Well spotted! Should have checked the name indeed. Anyway, Claudio, let us know how you are getting on.
Julien

On 6 July 2010 17:23, Andrzej Bialecki wrote:

> On 2010-07-06 16:22, Julien Nioche wrote:
>
>>> i'm trying to reproduce the problem outside of the "crawl" command,
>>> through multi-step script approach.
>>> the problem happens again before the parse command.
>>
>> you specified -noparse on the fetch command line, didn't you?
>>
>>> i guess the problem
>>> is indeed in the protocol-httpclient. Though i can't understand what's
>>> happening. Why, in the fetching phase, the tika parser is called for
>>> TXT? The parser is called on the content in the Fetcher output() method.
>>
>> the parser should not be called at all if you specify -noparse for the
>> fetch
>>
>> as for TXT the parser is used to find outlinks
>
> Careful here - the option is called -noParsing. -noparse won't work, in
> such case Fetcher will default to whatever was set in
> nutch-site/nutch-default.xml (which often is set to parsing).
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

--
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com
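Andrzej's point about the option name can be illustrated with a small model of how a fetch step might resolve its "parse or not" setting. This is a hypothetical sketch, not Nutch source: it assumes the fetcher starts from the configured value (the `fetcher.parse` property from nutch-site.xml / nutch-default.xml), only recognises the exact token `-noParsing`, and silently skips any other unknown token such as `-noparse`. The function name, the `conf` dict, and the fallback defaults are illustrative assumptions.

```python
def resolve_fetch_parsing(args, conf):
    """Hypothetical model of fetch option handling (not the actual Nutch code).

    args[0] is the segment path; the remaining items are command-line options.
    conf is a plain dict standing in for nutch-site.xml / nutch-default.xml.
    Returns (parsing, threads).
    """
    # Start from the configured default; True here is an assumed fallback.
    parsing = conf.get("fetcher.parse", True)
    # Assumed fallback thread count, only so -threads has something to override.
    threads = conf.get("fetcher.threads.fetch", 10)
    i = 1
    while i < len(args):
        if args[i] == "-threads":
            if i + 1 >= len(args):
                raise ValueError("-threads needs a value")
            i += 1
            threads = int(args[i])
        elif args[i] == "-noParsing":
            # Only the exact spelling switches parsing off.
            parsing = False
        # Any other token (e.g. "-noparse") is ignored, so the config value wins.
        i += 1
    return parsing, threads
```

With parsing enabled in the configuration, `resolve_fetch_parsing(["crawl/segments/20100706", "-noparse"], {"fetcher.parse": True})` still reports parsing as on, while the same call with `-noParsing` turns it off, which matches the symptom Claudio saw (the parser running during the fetch phase).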

