According to http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch%20generate
the default value of topN is Long.MAX_VALUE, i.e. effectively no limit.

Did you run both tests under the same conditions? Or did you perhaps first run
the crawl with -topN 2000 and then run it again without the parameter on the
same crawl db? In that case there may simply not be much left to crawl ...
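
To illustrate the scenario, here is a toy model. It is only a sketch and not
Nutch's actual generator code: the generate() function, the layout of the
crawl_db dictionary, and the example.com URLs are all made up for
illustration. It assumes that the generate step picks up to topN not-yet-fetched
URLs by score, and that the default limit is Long.MAX_VALUE as stated on the
wiki page.

```python
LONG_MAX = 2**63 - 1  # numeric value of Java's Long.MAX_VALUE


def make_crawl_db(n):
    """Build a hypothetical crawl db with n unfetched URLs and descending scores."""
    return {
        f"http://example.com/page{i}": {"score": 1.0 / (i + 1), "fetched": False}
        for i in range(n)
    }


def generate(crawl_db, top_n=LONG_MAX):
    """Toy model of the generate step: up to top_n unfetched URLs, best score first."""
    due = [url for url, entry in crawl_db.items() if not entry["fetched"]]
    due.sort(key=lambda url: crawl_db[url]["score"], reverse=True)
    return due[:top_n]


def mark_fetched(crawl_db, urls):
    """Record that the given URLs have been fetched."""
    for url in urls:
        crawl_db[url]["fetched"] = True


if __name__ == "__main__":
    db = make_crawl_db(5)
    print(len(generate(db, top_n=2)))  # limited run on a fresh db
    print(len(generate(db)))           # no topN on a fresh db: everything is due

    # An earlier crawl has already fetched most of the db ...
    mark_fetched(db, generate(db, top_n=4))
    # ... so a later run without topN only finds what is left.
    print(generate(db))
```

In this model a run without -topN is only quick when the crawl db has little
left to fetch, which is why the state of the crawl db matters when comparing
the two runs.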

Regards,
Marcin


> I have not added any such thing in my nutch-site.xml and I have
> omitted -topN argument in bin/generate command.
> 
> So my question is what would be the effect in this case. I was
> expecting that it would be same as -topN <infinity>. So it should
> generate all possible URLs in the generate phase.
> 
> I tried omitting topN value in my crawl script and I find that my
> crawl is running much faster. Earlier I had a -topN 2000 argument and
> it used to take 4-5 days to finish a crawl of depth 5.
> 
> Now, without the topN argument, it finished a crawl of depth 5 in 6
> hours. How?
> 
> On 9/7/07, Rikard Lindner <[EMAIL PROTECTED]> wrote:
> > Now I'm getting a bit uncertain, but I think you can add crawl.topN in your
> > nutch-site.xml. I couldn't find it in nutch-default either, but I'm quite sure
> > it is set somewhere!
> >
> > /Rikard
> >
> > 2007/9/6, Smith Norton <[EMAIL PROTECTED]>:
> > >
> > > Thanks for the response. What is the property name for this default
> > > value of topN in nutch-default.xml?
> > >
> > > On 9/6/07, Rikard Lindner <[EMAIL PROTECTED]> wrote:
> > > > There is a default value in nutch-default.xml
> > > >
> > > > /Rikard
> > > >
> > > > 2007/9/6, Smith Norton <[EMAIL PROTECTED]>:
> > > > >
> > > > > In the bin/generate command, if I omit the 'topN' argument, what is
> > > > > the behavior?
> > > > >
> > > > > > Does it generate all possible URLs or does it assume a default topN value?
> > > > >
> > > > > I tried omitting topN value in my crawl script and I find that my
> > > > > crawl is running much faster. Earlier I had a -topN 2000 argument and
> > > > > it used to take 4-5 days to finish a crawl of depth 5.
> > > > >
> > > > > Now, without the topN argument, it finished a crawl of depth 5 in 6
> > > > > hours. Can anyone explain what's going on?
> > > > >
> > > >
> > >
> >
