Hello, I am currently running Nutch 0.9 on 4 nodes. One of them is the master and also acts as a slave.
I have injected 100k URLs into Nutch, all on the same host. I am running a generate/fetch/update cycle with topN set to 100k. However, each cycle fetches only between 2588 and 2914 URLs. I have run the cycle more than 8 times, always with the same result. I have tried both nutch fetch and nutch fetch2.

My hypothesis is that this is due to all URLs being on the same host (www.example.com/some/path). Do I need to set fetcher.threads.per.host to something higher than the default of 2? Is there something in the logs I should look for to determine the exact cause of this problem?

Thank you in advance for any assistance. If you need any additional information, please let me know and I'll send it.

Thanks!
JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia internet services
