Hi Tejas

The fetcher.threads.per.host property has been deprecated and replaced with
fetcher.threads.per.queue.
I am not sure if fetcher.threads.per.queue will help the fetching, as the
generator only generates the fetchlist from 2 or 3 domains. How can I tell
the generator to create a fetchlist with an equal number of URLs from all
domains?

I am sure there are URLs from the other domains, but I guess that since
their URL scores are lower, it fetches from only 2 domains.
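
To make the guess above concrete, here is a minimal sketch in plain Python
(not Nutch code) of how picking the top N URLs purely by score can fill a
fetchlist from only two sites, and how a per-site cap spreads it out. The
URLs, scores, and function names are all hypothetical, and the sketch groups
by host name rather than by registered domain. Nutch's generator has a
generate.max.count property that appears to serve a similar purpose; check
its description in nutch-default.xml for your version before relying on it.

```python
from collections import defaultdict
from urllib.parse import urlparse


def top_n_by_score(scored_urls, top_n):
    """Pick the top_n URLs purely by score, with no per-site limit."""
    ranked = sorted(scored_urls, key=lambda pair: pair[1], reverse=True)
    return [url for url, _ in ranked[:top_n]]


def top_n_with_site_cap(scored_urls, top_n, max_per_site):
    """Pick up to top_n URLs by score, taking at most max_per_site per host."""
    ranked = sorted(scored_urls, key=lambda pair: pair[1], reverse=True)
    counts = defaultdict(int)
    selected = []
    for url, _ in ranked:
        host = urlparse(url).netloc
        if counts[host] >= max_per_site:
            continue  # this site already has its share of the fetchlist
        counts[host] += 1
        selected.append(url)
        if len(selected) == top_n:
            break
    return selected


def hosts(urls):
    """Return the sorted list of distinct hosts that appear in urls."""
    return sorted({urlparse(u).netloc for u in urls})


# Hypothetical crawl data: two high-scoring sites and two low-scoring ones.
scored = (
    [(f"http://a.example/p{i}", 0.9) for i in range(5)]
    + [(f"http://b.example/p{i}", 0.8) for i in range(5)]
    + [(f"http://c.example/p{i}", 0.1) for i in range(5)]
    + [(f"http://d.example/p{i}", 0.05) for i in range(5)]
)

uncapped = top_n_by_score(scored, 8)
capped = top_n_with_site_cap(scored, 8, 2)
print(hosts(uncapped))
print(hosts(capped))
```

With these made-up scores, the uncapped selection only ever reaches the two
high-scoring sites, while the capped one takes two URLs from each of the four
sites.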

I will try increasing fetcher.threads.per.queue to 5, see whether the fetch
speed increases, and let you know.


Tejas Patil wrote
> Hey Peter,
> 
> I am guessing that you have just increased the global thread count. Have
> you also increased "fetcher.threads.per.host"? This will improve the crawl
> rate, as multiple threads can attack the same site. Don't make it too high
> or else the system will get overloaded. The Nutch wiki has an article [0]
> about the potential reasons for slow crawls and some good suggestions.
> 
> [0] : https://wiki.apache.org/nutch/OptimizingCrawls
> 
> Thanks,
> Tejas Patil
> 
> 
> On Sun, Jan 27, 2013 at 8:08 PM, peterbarretto <peterbarretto08@> wrote:
> 
>> I tried increasing the number of threads to 50, but the speed is not
>> affected.
>>
>> I tried changing the partition.url.mode value to byDomain and
>> fetcher.queue.mode to byDomain, but it still does not help the speed.
>> It seems to get URLs from 2 domains now, and the other domains are not
>> getting crawled. Is this due to the URL score? If so, how do I crawl URLs
>> from all the domains?
>>
>>
>> lewis john mcgibbney wrote
>> > Increase the number of threads when fetching.
>> > Also, please see nutch-default.xml for partitioning of URLs; if you know
>> > your target domains, you may wish to adapt the policy.
>> > Lewis
>> >
>> > On Sunday, January 27, 2013, peterbarretto <peterbarretto08@> wrote:
>> >> I want to increase the number of URLs fetched at a time in Nutch. I have
>> >> around 10 websites to crawl, so how can I crawl all the sites at the
>> >> same time? Right now I am fetching 1 site with a fetch delay of 2
>> >> seconds, but it is too slow. How do I fetch concurrently from different
>> >> domains?
>> >>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499.html
>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >>
>> >
>> > --
>> > *Lewis*
>>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499p4036964.html
Sent from the Nutch - User mailing list archive at Nabble.com.
