Hi,
If you want to index all index pages from dmoz, you do not need to limit the number of URLs when you call bin/nutch generate db segments the very first time. Is that top scoring 1000 related to the number of pages? Since I have many thousands of pages, would it be better if I increased this number?
When you call generate a second time, you should limit the number of pages. Maybe measure how long it takes to crawl the first 5M pages, then decide how many pages you would like to crawl in each later round.
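A rough sketch of that measurement in Python, in case it helps. The function name and all numbers are made up for illustration and are not part of Nutch: time the first crawl, work out pages per hour, and choose a page limit for later generate calls that fits the time you want to spend per round.

```python
def pages_per_round(pages_crawled, hours_taken, hours_per_round):
    """Estimate how many pages fit into one crawl round of the given length."""
    if pages_crawled <= 0 or hours_taken <= 0:
        raise ValueError("need a positive page count and duration")
    # Pages per hour, as measured on the first (unlimited) crawl.
    throughput = pages_crawled / hours_taken
    return int(throughput * hours_per_round)

# Hypothetical numbers: the first 5M pages took 100 hours,
# and each later round should take about 24 hours.
limit = pages_per_round(5_000_000, 100, 24)
print(limit)
```

The result is only an estimate of the page limit to use on the next generate call. Real throughput will vary with the hosts you fetch from, so it is worth measuring again after a few rounds.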
Bye
Matthias -- http://www.eventax.com - eventax GmbH http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
