The queue runs on a single datanode?
Rémy Amouroux wrote
> Hi
>
> The fetcher threads (10 by default, as configured in nutch-default.xml
> through fetcher.threads.fetch) take their jobs from a queue.
> By default, there is only one queue per host (defined by the property
> fetcher.queue.mode), and another property limits the number of threads
> allowed to access a queue (fetcher.threads.per.queue, default 1 in
> nutch-default.xml) in order to be polite to the crawled website.
>
> So, if you are crawling only one website, you have one queue, and only
> one thread is allowed to fetch at any given moment.
>
> By modifying fetcher.threads.per.queue in nutch-site.xml, you can have
> more threads fetching at the same time, capped by
> fetcher.threads.fetch.
>
> Regards
>
> PS: be careful and think about the impact of the new configuration on
> this website :-)
>
> RemyA
>
> On 24 May 2012, at 06:12, Dustine Rene Bernasor wrote:
>
>> I have a 3-slave Hadoop cluster and I am performing a crawl on a single
>> website. However, only 1 slave is performing fetching (though the other
>> slaves are still alive). Is this normal behavior if only 1 domain is
>> crawled? Is there any way to force the other slaves to fetch?
>>
>> Thanks.
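For reference, the override Rémy describes might look roughly like the fragment below in nutch-site.xml. This is only a sketch: the value 3 is an arbitrary illustration and not a recommendation from the thread, and the total number of concurrent fetches stays capped by fetcher.threads.fetch.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: overrides values from nutch-default.xml -->
<configuration>
  <property>
    <name>fetcher.threads.per.queue</name>
    <!-- Assumed example value; the default is 1. Higher values put
         more simultaneous load on the crawled host. -->
    <value>3</value>
  </property>
</configuration>
```

As the PS in the quoted message warns, raising this value makes the crawl less polite to the single site being fetched.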

