Hello kind sirs,

>> Another question is about injecting urls into the webdb. I first inject
>> some seed urls into the webdb and then start fetching them, that's ok.
>> But the bin/nutch generate command creates a new segment,
>> with urls to fetch, from the webdb, right? What does the -topN parameter
>> do exactly? Does it get the N urls from the webdb that have the greatest
>> rate/value/score, or does it simply get N urls from the webdb that
>> have been pushed onto the "top" (is there a "top" in the webdb?) of the
>> webdb?
> Correct, -topN should use the best-scored urls from the db.

I have a question about -topN. Does -topN only work after having done a
round of fetching? What happens if I use -topN when generating a segment
to fetch before I've fetched anything? Will it just select a random subset
of N URLs, or will it use the first N URLs?
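
To make the question concrete, here is a minimal Python sketch of what I
imagine "take the N best-scored URLs" to mean. This is not Nutch's code: the
top_n function, the example URLs, and the idea that freshly injected URLs all
carry the same score of 1.0 are my own assumptions for illustration.

```python
def top_n(urls_with_scores, n):
    """Return the n highest-scored (url, score) pairs."""
    # sorted() is stable, so entries with equal scores keep their input order.
    return sorted(urls_with_scores, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical freshly injected seeds, assumed to share one default score.
seeds = [
    ("http://a.example/", 1.0),
    ("http://b.example/", 1.0),
    ("http://c.example/", 1.0),
]

# Hypothetical db after a fetch round, where scores differ.
scored = [("http://x.example/", 0.5), ("http://y.example/", 2.0), ("http://z.example/", 1.0)]

print(top_n(seeds, 2))
print(top_n(scored, 2))
```

In this sketch, when every score is equal the result is simply the first N
entries in input order, because the tie-breaking falls out of the sort. What I
would like to know is whether the real generate step behaves like that, or
whether the order among equal scores is effectively arbitrary.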

Many thanks,

-Shiwoong



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
