Hi,

> Is that top-scoring 1000 related to the number of
> pages? Since I have many thousands of pages, would it
> be better to increase this number?

If you want to index all pages from dmoz, you do not need to limit the number of URLs when you call bin/nutch generate db segments the very first time.

When you call generate a second time, you should limit the number of pages. You could measure how long it takes to crawl the first 5M pages and then decide how many pages you want to crawl in future runs.
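
As a rough sketch of that estimate (not something from Nutch itself): the helper name pages_for_budget and the timing figures below are made up for illustration, and the -topN option is assumed to be the limit the question refers to as "top scoring 1000". The idea is simply to take the measured crawl rate and scale it to the time you are willing to spend per round.

```python
def pages_for_budget(pages_crawled, hours_taken, budget_hours):
    """Estimate how many pages fit into budget_hours at the measured crawl rate."""
    if hours_taken <= 0:
        raise ValueError("hours_taken must be positive")
    rate = pages_crawled / hours_taken  # measured pages per hour
    return int(rate * budget_hours)


# Hypothetical measurement: the first 5M pages took 50 hours,
# and each later round should take about 8 hours.
top_n = pages_for_budget(5_000_000, 50, 8)

# Print the generate call with the estimated limit (assumes the -topN option).
print(f"bin/nutch generate db segments -topN {top_n}")
```

Plug in your own measured numbers; the result is only as good as the assumption that the crawl rate stays roughly constant.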

Bye

Matthias
--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
