Re: Effect of no topN argument in generate

Smith Norton Thu, 06 Sep 2007 11:59:11 -0700

I have not added any such property to my nutch-site.xml, and I have
omitted the -topN argument from the bin/generate command.

So my question is: what is the effect in this case? I expected it to
be the same as -topN <infinity>, so it should generate all possible
URLs in the generate phase.
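The expectation above ("no -topN" behaves like "-topN <infinity>") can be made concrete with a small model. This is a hypothetical sketch, not Nutch's Generator code. It assumes the generate step sorts the due URLs by score and keeps the first topN, and it models "no -topN given" as Long.MAX_VALUE. The class TopNSketch, the Candidate type and the selectTopN method are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a "keep the topN best-scoring URLs" step.
 * This is NOT Nutch's Generator code; all names are illustrative only.
 */
public class TopNSketch {

    /** A candidate URL with a score (hypothetical type). */
    static final class Candidate {
        final String url;
        final float score;

        Candidate(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    /**
     * Returns at most topN candidates, highest score first.
     * Long.MAX_VALUE stands in for "no -topN given", i.e. no limit.
     */
    static List<Candidate> selectTopN(List<Candidate> due, long topN) {
        List<Candidate> sorted = new ArrayList<>(due);
        // Sort by descending score.
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        // Clamp the limit to [0, number of candidates].
        int limit = (int) Math.max(0L, Math.min(topN, (long) sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Candidate> due = List.of(
                new Candidate("http://a.example/", 0.9f),
                new Candidate("http://b.example/", 0.2f),
                new Candidate("http://c.example/", 0.5f));

        System.out.println(selectTopN(due, 2).size());              // prints 2
        System.out.println(selectTopN(due, Long.MAX_VALUE).size()); // prints 3
        System.out.println(selectTopN(due, 1).get(0).url);          // prints http://a.example/
    }
}
```

Under this model, removing the limit can only increase the number of URLs selected per round, which is why a much shorter crawl without -topN is surprising.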

I tried omitting the topN value in my crawl script, and I find that my
crawl is running much faster. Earlier I had a -topN 2000 argument, and
it used to take 4-5 days to finish a crawl of depth 5.

Now, without the topN argument, it finished a crawl of depth 5 in 6
hours. How?

On 9/7/07, Rikard Lindner <[EMAIL PROTECTED]> wrote:
> Now I'm getting a bit uncertain, but I think you can add crawl.topN to your
> nutch-site.xml. I couldn't find it in nutch-default.xml either, but I'm quite sure
> it is set somewhere!
>
> /Rikard
>
> 2007/9/6, Smith Norton <[EMAIL PROTECTED]>:
> >
> > Thanks for the response. What is the property name for this default
> > value of topN in nutch-default.xml?
> >
> > On 9/6/07, Rikard Lindner <[EMAIL PROTECTED]> wrote:
> > > There is a default value in nutch-default.xml
> > >
> > > /Rikard
> > >
> > > 2007/9/6, Smith Norton <[EMAIL PROTECTED]>:
> > > >
> > > > In the bin/generate command, if I omit the 'topN' argument, what is
> > > > the behavior?
> > > >
> > > > Does it generate all possible URLs, or does it assume a default topN
> > > > value?
> > > >
> > > > I tried omitting the topN value in my crawl script and I find that my
> > > > crawl is running much faster. Earlier I had a -topN 2000 argument and
> > > > it used to take 4-5 days to finish a crawl of depth 5.
> > > >
> > > > Now, without the topN argument, it finished a crawl of depth 5 in 6
> > > > hours. Can anyone explain what's going on?
> > > >
> > >
> >
>
