Hi Tejas,

I changed generate.count.mode to domain and generate.max.count to 100, but the queue mode still shows as byHost and not byDomain.
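For reference, here is a minimal sketch of how the properties discussed in this thread might be set in conf/nutch-site.xml. The generate.* values are the ones given above, byDomain is the value tried earlier in the thread, and 5 is the thread count proposed in the quoted message. Note that generate.count.mode and fetcher.queue.mode are separate properties. As far as I can tell, the queue mode reported by the fetcher comes from fetcher.queue.mode, but treat that as an assumption and check it against nutch-default.xml for your version.

```xml
<!-- conf/nutch-site.xml: a sketch only; values are those mentioned in this thread -->
<configuration>
  <!-- Generator: count URLs per domain and cap each domain at 100 per fetchlist -->
  <property>
    <name>generate.count.mode</name>
    <value>domain</value>
  </property>
  <property>
    <name>generate.max.count</name>
    <value>100</value>
  </property>

  <!-- Fetcher: queue URLs by domain (assumed to be what the fetcher reports as its queue mode) -->
  <property>
    <name>fetcher.queue.mode</name>
    <value>byDomain</value>
  </property>
  <property>
    <name>partition.url.mode</name>
    <value>byDomain</value>
  </property>

  <!-- Threads allowed to work on a single queue at the same time -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>5</value>
  </property>
</configuration>
```

Settings in nutch-site.xml override the defaults in nutch-default.xml, so only the properties being changed need to appear here.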

peterbarretto wrote
> Hi Tejas
>
> The fetcher.threads.per.host property has been deprecated and replaced
> with fetcher.threads.per.queue.
> I am not sure if fetcher.threads.per.queue will help the fetching, as the
> generator only generates the fetchlist from 2 or 3 domains. How can I tell
> the generator to create a fetchlist with an equal number of urls from all
> domains?
>
> I am sure there are urls from the other domains, but I guess since the url
> score is less it fetches from only 2 domains.
>
> I will try increasing fetcher.threads.per.queue to 5 and see if the fetch
> speed is increased and let you know.
> Tejas Patil wrote
>> Hey Peter,
>>
>> I am guessing that you have just increased the global thread count. Have
>> you even increased "fetcher.threads.per.host"? This will improve the crawl
>> rate as multiple threads can attack the same site. Don't make it too high or
>> else the system will get overloaded. The nutch wiki has an article [0]
>> about the potential reasons for slow crawls and some good suggestions.
>>
>> [0] : https://wiki.apache.org/nutch/OptimizingCrawls
>>
>> Thanks,
>> Tejas Patil
>>
>>
>> On Sun, Jan 27, 2013 at 8:08 PM, peterbarretto <peterbarretto08@> wrote:
>>
>>> I tried increasing the number of threads to 50 but the speed is not
>>> affected.
>>>
>>> I tried changing the partition.url.mode value to byDomain and
>>> fetcher.queue.mode to byDomain but still it does not help the speed.
>>> It seems to get urls from 2 domains now and the other domains are not
>>> getting crawled. Is this due to the url score? If so, how do I crawl urls
>>> from all the domains?
>>>
>>>
>>> lewis john mcgibbney wrote
>>> > Increase the number of threads when fetching.
>>> > Also please see nutch-default.xml for partitioning of urls; if you know
>>> > your target domains you may wish to adapt the policy.
>>> > Lewis
>>> >
>>> > On Sunday, January 27, 2013, peterbarretto <peterbarretto08@> wrote:
>>> >> I want to increase the number of urls fetched at a time in nutch. I have
>>> >> around 10 websites to crawl, so how can I crawl all the sites at a time?
>>> >> Right now I am fetching 1 site with a fetch delay of 2 seconds but it is
>>> >> too slow. How do I fetch concurrently from different domains?
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >> http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499.html
>>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>>> >
>>> > --
>>> > *Lewis*

--
View this message in context: http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499p4036976.html
Sent from the Nutch - User mailing list archive at Nabble.com.

