Hi,

> 1. fetcher.threads.per.host: 10*3 = 30
Correct. But with 1000 hosts you would hardly
set it to 3000, see question 2.

Keep in mind that the property was renamed to
fetcher.threads.per.queue in Nutch 1.4!
A queue can be defined by host or IP, see fetcher.queue.mode.

> 2. fetcher.threads.fetch
fetcher.threads.fetch is the total number of fetcher threads,
shared across all hosts. If there are many hosts you would set
fetcher.threads.per.host to 1 (the default), and use
fetcher.threads.fetch to limit the load on your system
(esp. the network load).
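
To make the arithmetic from questions 1 and 2 concrete, here is a small
sketch of the upper bound on concurrently busy threads. It is a
simplified model, not Nutch code: the class and method names are made
up for illustration, and politeness delays are ignored.

```java
// Hypothetical helper, not part of Nutch: a simplified upper bound on
// how many fetcher threads can be busy at the same time.
final class FetchThreadEstimate {

    // threadsFetch    ~ fetcher.threads.fetch (total threads)
    // threadsPerQueue ~ fetcher.threads.per.host / fetcher.threads.per.queue
    // queues          ~ number of hosts (or IPs) being crawled
    static int maxBusyThreads(int threadsFetch, int threadsPerQueue, int queues) {
        // Use long for the product so large host counts cannot overflow.
        long perQueueCap = (long) threadsPerQueue * queues;
        return (int) Math.min(threadsFetch, perQueueCap);
    }

    public static void main(String[] args) {
        System.out.println(maxBusyThreads(30, 3, 10));   // 30: all 10*3 slots usable
        System.out.println(maxBusyThreads(10, 3, 10));   // 10: total threads are the limit
        System.out.println(maxBusyThreads(50, 1, 1000)); // 50: many hosts, load capped by total
    }
}
```

So with few hosts the per-host setting is the bottleneck, and with many
hosts the total thread count is.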

> 3. in distributed mode
All URLs from the same host are placed in the same partition.
This ensures that host-level blocking can be done in one single
JVM.
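
The idea can be sketched roughly as below. This is only an illustration
of hash-by-host partitioning, not Nutch's actual partitioner; the class
name, method name, and the fallback for URLs without a host are my
assumptions.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch, not Nutch code: assign URLs to partitions by host
// so that all URLs of one host end up in the same partition.
final class HostPartitionSketch {

    static int partitionFor(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        // Assumption: fall back to the whole URL if no host can be parsed.
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        // floorMod keeps the result in [0, numPartitions) even for
        // negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int a = partitionFor("http://example.com/page1", 4);
        int b = partitionFor("http://example.com/other?x=1", 4);
        System.out.println(a == b); // true: same host, same partition
    }
}
```

Because every URL of a host goes to one partition, a single fetcher task
sees all of them and can enforce the per-host limit locally.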

Sebastian


On 06/22/2014 05:51 PM, S.L wrote:
> Hi All,
> 
> I would like to know the relationship between the two config properties
> *fetcher.threads.fetch* and *fetcher.threads.per.host*.
> 
> 
>    1. If lets say I am crawling 10 hosts in my seed file and set the
>    fetcher.threads.per.host property to 3 , should I set the
>    fetcher.threads.fetch property to 10*3 i.e >=30 ?
>    2. I can understand the *fetcher.threads.per.host *property as it is
>    self explanatory , which means number to concurrent connections to a
>    particular host , however , I am not able to clearly follow what
>    *fetcher.threads.fetch* does.
>    3. Also I would like to know how the *fetcher.threads.per.host* property
>    comes into play in a distributed mode  ?
> 
> 
> 
> Thanks in advance.
> 
