Thanks for your reply. The problem with the settings you suggested above is that the normalize phase of the Generator step takes too long after a few iterations (that's why I want to keep the crawldb at a reasonable size).
It seems that I can crawl and index about one million URLs in a 24-hour period starting from the first init. But this number drops sharply if I continue to crawl, because the normalize step can take up to an hour after some iterations, once the crawldb gets bigger. I don't see why the generator step takes so long. Surely selecting X URLs from a database of about 10 million URLs shouldn't take that much time?

Thanks,
James Ford

