There is a default value in nutch-default.xml.

/Rikard

2007/9/6, Smith Norton <[EMAIL PROTECTED]>:
>
> In the bin/generate command, if I omit the 'topN' argument, what is
> the behavior?
>
> Does it generate all possible URLs or does it assume a default topN value?
>
> I tried omitting topN value in my crawl script and I find that my
> crawl is running much faster. Earlier I had a -topN 2000 argument and
> it used to take 4-5 days to finish a crawl of depth 5.
>
> Now, without the topN argument, it finished a crawl of depth 5 in 6
> hours. Can anyone explain what's going on?
>
