Hi,

> 1. fetcher.threads.per.host: 10*3 = 30

Correct. But if there are 1000 hosts you would hardly set
fetcher.threads.fetch to 3000, see question 2.
Keep in mind that the property was renamed to fetcher.threads.per.queue
in Nutch 1.4! A queue can be defined by host or IP, see fetcher.queue.mode.

> 2. fetcher.threads.fetch

If there are many hosts you would set fetcher.threads.per.host to 1
(the default) and use fetcher.threads.fetch to limit the load on your
system (esp. the network load).

> 3. in distributed mode

All URLs from the same host are placed in the same partition. This ensures
that host-level blocking can be done in one single JVM.

Sebastian

On 06/22/2014 05:51 PM, S.L wrote:
> Hi All,
>
> I would like to know the relationship between the two config properties
> *fetcher.threads.fetch* and *fetcher.threads.per.host*.
>
>
>    1. If, let's say, I am crawling 10 hosts in my seed file and set the
>    fetcher.threads.per.host property to 3, should I set the
>    fetcher.threads.fetch property to 10*3, i.e. >= 30?
>    2. I can understand the *fetcher.threads.per.host* property, as it is
>    self-explanatory: it means the number of concurrent connections to a
>    particular host. However, I am not able to clearly follow what
>    *fetcher.threads.fetch* does.
>    3. Also, I would like to know how the *fetcher.threads.per.host* property
>    comes into play in distributed mode?
>
>
>
> Thanks in advance.
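
For illustration, the settings discussed in this thread could be written
roughly as follows in conf/nutch-site.xml. This is only a sketch for the
10-host scenario from the question, assuming Nutch 1.4 or later (hence
fetcher.threads.per.queue instead of fetcher.threads.per.host). The values
30 and 3 are taken from the question and are not a recommendation; with
many hosts you would keep fetcher.threads.per.queue at 1 and tune only
fetcher.threads.fetch.

```xml
<!-- Sketch only: values follow the 10-host example from the question. -->
<configuration>
  <!-- Total number of fetcher threads; this caps the overall load. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>30</value>
  </property>
  <!-- Maximum concurrent threads per queue (default is 1). -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>3</value>
  </property>
  <!-- How queues are formed; byHost gives one queue per host. -->
  <property>
    <name>fetcher.queue.mode</name>
    <value>byHost</value>
  </property>
</configuration>
```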

