> Hi Dmitry,
>
> That's not possible yet but could be done by writing a custom mapreduce
> job which would add metadata to the entries of a fetchlist (e.g.
> fetcher.threads) given an input sequence file of [domains] + [feature
> name / value] for the domains where you want a non-default behaviour.
>
> You would then modify the Fetcher so that it checks for that reserved
> feature name in the metadata of an entry to fetch (e.g. fetcher.threads?)
> and uses it. This is likely to require a modification of FetchItemQueue
> and FetchItem so that it does not rely on the default value only.

Thanks for your advice. Am I correct that this can also be used to modify other fetcher attributes? (I like DataParkSearch's ability to make almost everything that makes sense tunable per domain.)

> This should not be too difficult and would be a very nice contribution.

So it looks like I will have to implement this feature when I have a little more free time.

> We could also use the metadata injection to get the feature name / value
> at injection time and thus avoid rewriting the fetchlist with the custom
> map reduce job. This would work only if you are interested in fetching
> ONLY the urls from the seed list though and not following the links,
> although you could also write a custom scoring filter which would
> transfer the metadata to any outlinks belonging to the same domain as
> their originating page.

For my purposes this should apply to all URLs that are to be fetched. (I use domain filters because regex filters are too slow with 500+ domains, and that is already in the test phase.)

> BTW : one of the plans for 2.0 is to have a separate table for storing
> host or domain information. This is definitely where you'd store the
> number of threads for instance.

Is it correct that 2.0 will have only the Solr indexer (and not the 'regular' searcher)? What about a search web interface? (Currently what I see in Solr is an API for _making_ one, not a basic interface like the regular Nutch one.)

> HTH
>
> Julien
>
> --
> DigitalPebble Ltd
> Open Source Solutions for Text Engineering
> http://www.digitalpebble.com
>
> On 23 June 2010 11:36, Dmitriy V. Kazimirov <[email protected]> wrote:
>
> > Hi,
> >
> > Is it possible (likely not, but still) to make Nutch use different
> > values of fetcher.threads.per.host for different domains?
> >
> > i.e. a low default value (1-3), increased for domains which I _know_
> > can handle the load without problems.
> >
> > If that's not possible, could someone advise where in the code to
> > look to make such modifications?
> >
> > With regards, Dmitriy Kazimirov
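
To make the metadata idea discussed above concrete, here is a minimal sketch of the lookup with a fallback to the default. It does not use any Nutch classes: the class name `FetchThreadOverride`, the method `threadsFor`, and the plain `Map<String, String>` standing in for an entry's metadata are all hypothetical. The key `fetcher.threads` is only the example name suggested in the quoted reply, not an existing Nutch property.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Nutch. It only illustrates
// "use the per-entry value if present, otherwise the default".
class FetchThreadOverride {
    // Reserved metadata key; assumed name, taken from the example in the thread.
    static final String THREADS_KEY = "fetcher.threads";

    // Returns the thread limit stored in the entry metadata, or
    // defaultThreads when the key is missing, non-numeric, or not positive.
    static int threadsFor(Map<String, String> metadata, int defaultThreads) {
        String raw = metadata.get(THREADS_KEY);
        if (raw == null) {
            return defaultThreads;
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return n > 0 ? n : defaultThreads;
        } catch (NumberFormatException e) {
            return defaultThreads;
        }
    }

    public static void main(String[] args) {
        Map<String, String> tuned = new HashMap<>();
        tuned.put(THREADS_KEY, "8");          // domain known to handle load
        Map<String, String> plain = new HashMap<>(); // no override set

        System.out.println(threadsFor(tuned, 2));
        System.out.println(threadsFor(plain, 2));
    }
}
```

In a real change the same check would presumably live wherever the fetcher decides how many threads a queue may use (the reply points at FetchItemQueue and FetchItem), and invalid values fall back to the default here so that a bad entry cannot stall or overload a host.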

