Hi Tejas,

I changed generate.count.mode to domain and generate.max.count to 100,
but it still shows the queue mode as byHost, not byDomain.
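
A rough Python model of what generate.count.mode and generate.max.count are meant to do when the fetchlist is built. This is a simplified sketch, not Nutch's Generator code. URLs are taken in descending score order up to a topN limit, but no more than max_count per host or per domain, so two high-scoring domains cannot fill every slot. The names select_fetchlist, host_of and domain_of, the sample URLs and scores, and the "last two labels of the host name" domain rule are all assumptions made up for illustration. A negative max_count is treated as "no cap" here.

```python
# Simplified model of per-host / per-domain capping during fetchlist
# generation. Hypothetical helper names; NOT Nutch's actual implementation.
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Host name of a URL, e.g. 'www.a.com'."""
    return urlsplit(url).hostname or ""


def domain_of(url):
    """Hypothetical, simplified domain rule: keep the last two host labels.
    Real domain extraction must handle suffixes such as .co.uk; this does not."""
    return ".".join(host_of(url).split(".")[-2:])


def select_fetchlist(scored_urls, top_n, max_count, count_mode="domain"):
    """Take URLs in descending score order until top_n are chosen, skipping any
    URL whose host (count_mode="host") or domain (count_mode="domain") already
    has max_count URLs in the list. A negative max_count means no cap."""
    key_of = domain_of if count_mode == "domain" else host_of
    per_key = Counter()
    selected = []
    for url, _score in sorted(scored_urls, key=lambda pair: pair[1], reverse=True):
        if len(selected) >= top_n:
            break
        key = key_of(url)
        if max_count >= 0 and per_key[key] >= max_count:
            continue  # this host/domain is already at its cap
        per_key[key] += 1
        selected.append(url)
    return selected


if __name__ == "__main__":
    # Made-up sample: two high-scoring domains and one low-scoring domain.
    urls = (
        [(f"http://www.a.com/{i}", 1.0) for i in range(5)]
        + [(f"http://www.b.com/{i}", 0.9) for i in range(5)]
        + [(f"http://www.c.com/{i}", 0.1) for i in range(5)]
    )
    for cap in (-1, 2):
        chosen = select_fetchlist(urls, top_n=6, max_count=cap)
        print(cap, dict(Counter(domain_of(u) for u in chosen)))
    # Prints:
    # -1 {'a.com': 5, 'b.com': 1}
    # 2 {'a.com': 2, 'b.com': 2, 'c.com': 2}
```

In this model the cap does not force equal counts. It only stops the top domains from taking every slot, so a low-scoring domain still needs room under topN. It also covers generation only; how the fetcher groups URLs into queues is a separate setting (fetcher.queue.mode).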



peterbarretto wrote
> Hi Tejas
> 
> The fetcher.threads.per.host property has been deprecated and replaced
> with fetcher.threads.per.queue.
> I am not sure if fetcher.threads.per.queue will help the fetching, as the
> generator only generates the fetchlist from 2 or 3 domains. How can I tell
> the generator to create a fetchlist with an equal number of URLs from all
> domains?
> 
> I am sure there are URLs from the other domains, but I guess that because
> their URL scores are lower, it fetches from only 2 domains.
> 
> I will try increasing fetcher.threads.per.queue to 5, see if the fetch
> speed improves, and let you know.
> Tejas Patil wrote
>> Hey Peter,
>> 
>> I am guessing that you have just increased the global thread count. Have
>> you also increased "fetcher.threads.per.host"? This will improve the crawl
>> rate, as multiple threads can fetch from the same site. Don't make it too
>> high, or else the system will get overloaded. The Nutch wiki has an article
>> [0] about the potential reasons for slow crawls and some good suggestions.
>> 
>> [0] : https://wiki.apache.org/nutch/OptimizingCrawls
>> 
>> Thanks,
>> Tejas Patil
>> 
>> 
>> On Sun, Jan 27, 2013 at 8:08 PM, peterbarretto <peterbarretto08@> wrote:
>> 
>>> I tried increasing the number of threads to 50, but the speed is not
>>> affected.
>>>
>>>
>>> I tried changing partition.url.mode to byDomain and fetcher.queue.mode
>>> to byDomain, but it still does not help the speed.
>>> It now seems to get URLs from 2 domains, and the other domains are not
>>> getting crawled. Is this due to the URL score? If so, how do I crawl URLs
>>> from all the domains?
>>>
>>>
>>> Lewis John McGibbney wrote
>>> > Increase the number of threads when fetching.
>>> > Also, please see nutch-default.xml for partitioning of URLs; if you know
>>> > your target domains, you may wish to adapt the policy.
>>> > Lewis
>>> >
>>> > On Sunday, January 27, 2013, peterbarretto <peterbarretto08@> wrote:
>>> >> I want to increase the number of URLs fetched at a time in Nutch. I have
>>> >> around 10 websites to crawl, so how can I crawl all the sites at once?
>>> >> Right now I am fetching 1 site with a fetch delay of 2 seconds, but it is
>>> >> too slow. How can I fetch from different domains concurrently?
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >> http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499.html
>>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>>> >>
>>> >
>>> > --
>>> > *Lewis*
>>>
>>>
>>>
