Hi Dmitriy,

That's not possible yet, but it could be done by writing a custom MapReduce job that adds a metadata entry (e.g. fetcher.threads) to the entries of a fetchlist, given an input sequence file of [domain] + [feature name / value] pairs for the domains where you want non-default behaviour.
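
To make the idea concrete, here is a rough plain-Java sketch of the tagging step, plus the kind of lookup with a fallback to the default that would read the value back. It is not Nutch or Hadoop code: FetchListTagger, Entry, tag and threadsFor are hypothetical names invented for illustration, the key "fetcher.threads" is just the example name used above, and matching overrides by exact host name is an assumption. A real job would work on the fetchlist's own entry type rather than on these stand-ins.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java stand-in for the proposed job; none of these types are Nutch classes.
class FetchListTagger {

    // Reserved metadata key. The name is an assumption taken from the example above.
    static final String THREADS_KEY = "fetcher.threads";

    // Hypothetical fetchlist entry: a URL plus free-form metadata.
    static final class Entry {
        final String url;
        final Map<String, String> metadata = new HashMap<>();

        Entry(String url) {
            this.url = url;
        }
    }

    // Copies the per-host override, if there is one, into each entry's metadata.
    // Entries whose host has no override are left untagged.
    static List<Entry> tag(List<String> urls, Map<String, String> overridesByHost) {
        List<Entry> tagged = new ArrayList<>();
        for (String url : urls) {
            Entry entry = new Entry(url);
            String host = URI.create(url).getHost();
            String value = (host == null)
                    ? null
                    : overridesByHost.get(host.toLowerCase(Locale.ROOT));
            if (value != null) {
                entry.metadata.put(THREADS_KEY, value);
            }
            tagged.add(entry);
        }
        return tagged;
    }

    // Reads the override back, falling back to the default when the key is
    // absent, not a number, or not positive.
    static int threadsFor(Entry entry, int defaultThreads) {
        String raw = entry.metadata.get(THREADS_KEY);
        if (raw == null) {
            return defaultThreads;
        }
        try {
            int threads = Integer.parseInt(raw.trim());
            return threads > 0 ? threads : defaultThreads;
        } catch (NumberFormatException e) {
            return defaultThreads;
        }
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("fast.example.com", "10");

        List<Entry> entries = tag(
                Arrays.asList("http://fast.example.com/a", "http://slow.example.org/b"),
                overrides);

        for (Entry entry : entries) {
            System.out.println(entry.url + " -> " + threadsFor(entry, 2) + " thread(s)");
        }
    }
}
```

Keeping the fallback in a single method means untagged entries behave exactly as they do today, so only the hosts listed in the input file see any change.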
You would then modify the Fetcher so that it checks for that reserved feature name (e.g. fetcher.threads) in the metadata of each entry to fetch and uses it when present. This is likely to require modifying FetchItemQueue and FetchItem so that they do not rely on the default value only. This should not be too difficult and would be a very nice contribution.

We could also use the metadata injection to set the feature name / value at injection time and thus avoid rewriting the fetchlist with the custom MapReduce job. This would work only if you are interested in fetching ONLY the URLs from the seed list and not in following the links, although you could also write a custom scoring filter which would transfer the metadata to any outlinks belonging to the same domain as their originating page.

BTW: one of the plans for 2.0 is to have a separate table for storing host or domain information. This is definitely where you'd store the number of threads, for instance.

HTH

Julien

--
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com

On 23 June 2010 11:36, Dmitriy V. Kazimirov <[email protected]> wrote:

> Hi,
>
> Is it possible (likely not but still) to make Nutch use different values
> of fetcher.threads.per.host for different domains?
>
> i.e. low default value(1-3) and for domains which I _knew_ can handle load
> and it won't be problem - increase it
>
> If that's not possible could someone give advice where in code to look to
> make such modifications?
>
> With regards, Dmitriy Kazimirov

