You can decrease fetcher.server.delay. Another option is to split the storage table and run several instances of Nutch. However, if you do not own the server that hosts the crawled domain, you could be blocked, since frequent requests might be treated as a DoS attack.
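To get a feel for the trade-off, here is a rough back-of-the-envelope sketch in Python. It assumes the per-host delay dominates the fetch time (download time is ignored) and that each parallel instance honours the delay independently. The function names and the page counts, delays, and instance counts below are made up for illustration; they are not taken from Nutch.

```python
def estimated_crawl_hours(pages, delay_seconds, instances=1):
    """Rough lower bound on the hours needed to fetch `pages` URLs from one host.

    Assumes each instance waits `delay_seconds` between requests and that
    download time is negligible compared with the delay.
    """
    if pages < 0 or delay_seconds < 0 or instances < 1:
        raise ValueError("pages and delay must be >= 0, instances >= 1")
    return pages * delay_seconds / instances / 3600.0


def requests_per_second(delay_seconds, instances=1):
    """Approximate request rate the target server sees under the same assumptions."""
    if delay_seconds <= 0 or instances < 1:
        raise ValueError("delay must be > 0, instances >= 1")
    return instances / delay_seconds


if __name__ == "__main__":
    # Illustrative numbers only: one million pages on a single host.
    for delay, instances in [(5.0, 1), (1.0, 1), (1.0, 4)]:
        hours = estimated_crawl_hours(1_000_000, delay, instances)
        rate = requests_per_second(delay, instances)
        print(f"delay={delay}s instances={instances}: "
              f"{hours:.1f} h, {rate:.1f} req/s on the host")
```

Lowering the delay or adding instances shortens the crawl in proportion, but the request rate on the host goes up by the same factor, which is what can get you blocked.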
hth.
Alex.

-----Original Message-----
From: weishenyun <[email protected]>
To: user <[email protected]>
Sent: Mon, Jul 1, 2013 8:17 pm
Subject: Re: Running multiple nutch jobs to fetch a same site with millions of pages

Hi alxsss,

I have tried that. I have set -numTasks > 1 and set mapred.reduce.tasks > 1. But still only one reducer task tried to fetch all the pages from the same site.

--
View this message in context: http://lucene.472066.n3.nabble.com/Running-multiple-nutch-jobs-to-fetch-a-same-site-with-millions-of-pages-tp4074523p4074539.html
Sent from the Nutch - User mailing list archive at Nabble.com.

