Hi Tejas,

The fetcher.threads.per.host property has been deprecated and replaced with fetcher.threads.per.queue. I am not sure fetcher.threads.per.queue will help the fetching, because the generator only generates the fetchlist from 2 or 3 domains. How can I tell the generator to create a fetchlist with an equal number of URLs from all domains?
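The idea of giving every domain a fair share of the fetchlist can be sketched as "take the highest-scoring URLs, but stop accepting URLs from a domain once it has its quota". This is a hypothetical Python illustration, not the Nutch Generator code. The function names, the (url, score) input format, and treating the host name as the "domain" are all assumptions. In Nutch itself, the corresponding settings are, as far as I recall, generate.max.count together with generate.count.mode (host or domain) in nutch-default.xml; please verify the names against your version.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    # Assumption: the host name stands in for the "domain".
    # This does not compute registered domains, so www.a.com and a.com
    # would be counted separately.
    return urlsplit(url).hostname or ""


def capped_fetchlist(scored_urls, top_n, max_per_domain):
    """Hypothetical helper: pick up to top_n URLs, highest score first,
    but never more than max_per_domain URLs from any single domain."""
    per_domain = defaultdict(int)
    selected = []
    ranked = sorted(scored_urls, key=lambda pair: pair[1], reverse=True)
    for url, _score in ranked:
        if len(selected) >= top_n:
            break
        d = domain_of(url)
        if per_domain[d] >= max_per_domain:
            # This domain already has its quota; leave room for the others.
            continue
        per_domain[d] += 1
        selected.append(url)
    return selected


if __name__ == "__main__":
    # Hypothetical scores: every a.com page outranks b.com and c.com.
    scored = [
        ("http://a.com/1", 0.9), ("http://a.com/2", 0.8), ("http://a.com/3", 0.7),
        ("http://b.com/1", 0.3), ("http://c.com/1", 0.1),
    ]
    print(capped_fetchlist(scored, top_n=3, max_per_domain=3))  # only a.com
    print(capped_fetchlist(scored, top_n=3, max_per_domain=1))  # one per domain
```

With a quota of 3 the list is all a.com, which matches the "only 2 or 3 domains get fetched" symptom; with a quota of 1 it becomes one URL each from a.com, b.com and c.com.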
I am sure there are URLs from the other domains, but I guess that because their URL score is lower, it fetches from only 2 domains. I will try increasing fetcher.threads.per.queue to 5, see whether the fetch speed improves, and let you know.

Tejas Patil wrote
> Hey Peter,
>
> I am guessing that you have just increased the global thread count. Have
> you also increased "fetcher.threads.per.host"? This will improve the crawl
> rate, as multiple threads can attack the same site. Don't make it too high,
> or else the system will get overloaded. The Nutch wiki has an article [0]
> about the potential reasons for slow crawls and some good suggestions.
>
> [0] : https://wiki.apache.org/nutch/OptimizingCrawls
>
> Thanks,
> Tejas Patil
>
>
> On Sun, Jan 27, 2013 at 8:08 PM, peterbarretto <peterbarretto08@> wrote:
>
>> I tried increasing the number of threads to 50, but the speed is not
>> affected.
>>
>> I tried changing the partition.url.mode value to byDomain and
>> fetcher.queue.mode to byDomain, but it still does not help the speed.
>> It seems to get URLs from 2 domains now, and the other domains are not
>> getting crawled. Is this due to the URL score? If so, how do I crawl URLs
>> from all the domains?
>>
>>
>> lewis john mcgibbney wrote
>> > Increase the number of threads when fetching.
>> > Also please see nutch-default.xml for partitioning of URLs; if you know
>> > your target domains, you may wish to adapt the policy.
>> > Lewis
>> >
>> > On Sunday, January 27, 2013, peterbarretto <peterbarretto08@> wrote:
>> >> I want to increase the number of URLs fetched at a time in Nutch. I have
>> >> around 10 websites to crawl, so how can I crawl all the sites at the same
>> >> time? Right now I am fetching 1 site with a fetch delay of 2 seconds, but
>> >> it is too slow. How can I fetch from different domains concurrently?
>> >>
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499.html
>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >
>> > --
>> > *Lewis*
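Why raising the thread count to 50 did not change the speed while only 2 domains were in the fetchlist can be shown with a back-of-the-envelope model. This is a simplified Python sketch, not Nutch's fetcher logic: it assumes one fetch queue per domain, that a queue never runs more than threads_per_queue requests at once, and that each busy thread downloads for fetch_time_s and then waits delay_s. The real fetcher's delay rules are more involved (for example when more than one thread per queue is allowed), so treat the result as a rough upper bound. The 0.5 s download time is a hypothetical number.

```python
def max_pages_per_second(total_threads, active_queues, threads_per_queue,
                         delay_s, fetch_time_s):
    """Rough upper bound on fetch rate under the simplified model above."""
    # Threads beyond active_queues * threads_per_queue have no queue slot
    # and sit idle, so adding more of them does not raise the rate.
    busy_threads = min(total_threads, active_queues * threads_per_queue)
    # Each busy thread completes one page per (download + politeness delay).
    return busy_threads / (delay_s + fetch_time_s)


if __name__ == "__main__":
    # 2 s delay as in the thread; 0.5 s per download is a hypothetical figure.
    for threads, queues in [(10, 2), (50, 2), (50, 10)]:
        rate = max_pages_per_second(threads, queues, 1, 2.0, 0.5)
        print(f"{threads} threads, {queues} domains -> {rate:.1f} pages/s")
```

Under these assumptions it prints 0.8 pages/s for both 10 and 50 threads over 2 domains, and 4.0 pages/s for 50 threads over 10 domains: spreading the fetchlist over more domains (or allowing more threads per queue) raises the ceiling, while extra global threads alone do not.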

