> 
> Hi Dmitry,
> 
> That's not possible yet but could be done by writing a custom mapreduce
> job which would add a metadata to the entries of a fetchlist (e.g.
> fetcher.threads) given an input sequence file of [domains] + [feature
> name / value] for the domains where you want a non default behaviour.
> 
> You would then modify the Fetcher so that it checks for that reserved
> feature name in the metadata of an entry to fetch (e.g.
> fetcher.threads?) and use it. This is likely to require a modification
> of FetchItemQueue and FetchItem so that it does not rely on the default
> value only.
> 
Thanks for your advice.
Am I correct that this can be used to modify other fetcher attributes too?
(I like DataParkSearch's ability to make almost everything that makes
sense per domain tunable.)
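
To check my understanding of the approach described above, here is a
minimal sketch of the logic in plain Python. It is not Nutch code: the
dict-based entries, the function names annotate_fetchlist and threads_for,
and the reuse of fetcher.threads.per.host as the metadata key are all
assumptions made for illustration. The first function stands in for the
custom job that tags fetchlist entries, the second for the lookup the
Fetcher would do instead of relying on the default only.

```python
from urllib.parse import urlsplit

# Assumed global default, standing in for the configured value.
DEFAULT_THREADS_PER_HOST = 1

# Hypothetical reserved metadata key (reuses the property name from the question).
THREADS_KEY = "fetcher.threads.per.host"


def annotate_fetchlist(urls, domain_features):
    """Attach per-domain feature values to each fetchlist entry as metadata.

    domain_features maps a host name to a dict of feature name -> value.
    """
    entries = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        # Copy so that entries never share one mutable dict.
        metadata = dict(domain_features.get(host, {}))
        entries.append({"url": url, "metadata": metadata})
    return entries


def threads_for(entry, default=DEFAULT_THREADS_PER_HOST):
    """Return the per-entry override if present and valid, else the default."""
    raw = entry["metadata"].get(THREADS_KEY)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Missing or malformed override: fall back to the default.
        return default
    return value if value > 0 else default
```

The same lookup pattern would apply to any other attribute stored under
its own reserved key, which is why I expect it to generalise.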

> This should not be too difficult and would be a very nice contribution.
So it looks like I will have to implement this feature when I have a
little more free time.

> We could also use the metadata injection to get the feature name /
> value at injection time and thus avoid rewriting the fetchlist with the
> custom map reduce job. This would work only if you are interested in
> fetching ONLY the urls from the seed list though and not following the
> links, although you could also write a custom scoring filter which
> would transfer the metadata to any outlinks belonging to the same
> domain as their originating page.
> 
For my purposes this should apply to all URLs that are to be fetched. (I
use domain filters because regex filters are already too slow on 500+
domains in the test phase.)
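
The idea of passing metadata on to outlinks of the same domain could look
roughly like the sketch below. Again this is plain Python and not the
Nutch scoring filter API; the function name propagate_metadata and its
arguments are hypothetical. It also compares full host names rather than
registered domains, which is a simplification, and it assumes all links
are absolute URLs.

```python
from urllib.parse import urlsplit


def propagate_metadata(parent_url, parent_metadata, outlinks, keys):
    """Copy selected metadata keys from a page to its same-host outlinks.

    Returns a dict mapping each outlink to the metadata it should carry.
    Outlinks on a different host receive an empty dict.
    """
    parent_host = urlsplit(parent_url).hostname
    # Only the reserved keys are inherited, not all of the parent's metadata.
    inherited = {k: parent_metadata[k] for k in keys if k in parent_metadata}
    result = {}
    for link in outlinks:
        same_host = urlsplit(link).hostname == parent_host
        # Copy per link so later changes to one entry do not affect others.
        result[link] = dict(inherited) if same_host else {}
    return result
```

With something like this in place, a value set on a seed URL would follow
the crawl through every page on that host.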



> BTW : one of the plans for 2.0 is to have a separate table for storing
> host or domain information. This is definitely where you'd store the
> number of threads for instance.
> 
Is it correct that 2.0 will have only the Solr indexer (and not the
'regular' searcher)?
What about a web search interface? (Currently what I see in Solr is an API
for _making_ one, not a basic interface like the regular Nutch one.)
> HTH
> 
> Julien
> 
> --
> DigitalPebble Ltd
> 
> Open Source Solutions for Text Engineering
> http://www.digitalpebble.com
> 
> On 23 June 2010 11:36, Dmitriy V. Kazimirov <[email protected]> wrote:
> 
> > Hi,
> >
> > Is it possible (likely not but still) to make Nutch use different
> > values of fetcher.threads.per.host for different domains?
> >
> > i.e. low default value(1-3) and for domains which I _knew_ can handle
> > load and it won't be problem - increase it
> >
> >
> >
> > If that's not possible could someone give advice where in code to
> > look to make such modifications?
> >
> >
> >
> >
> >
> > With regards, Dmitriy Kazimirov
> >
> >
