Re: Hangup of fetcher threads

Andrzej Bialecki Tue, 06 Jul 2010 09:24:19 -0700

On 2010-07-06 16:22, Julien Nioche wrote:

I'm trying to reproduce the problem outside of the "crawl" command,
through a multi-step script approach.
The problem happens again, before the parse command.



you specified -noparse on the fetch command line, didn't you?

I guess the problem
is indeed in protocol-httpclient, though I can't understand what's
happening. Why, in the fetching phase, is the tika parser called for
TXT? The parser is called on the content in the Fetcher output() method.


the parser should not be called at all if you specify -noparse for the fetch

as for TXT the parser is used to find outlinks

Careful here - the option is called -noParsing. -noparse won't work; in such a case Fetcher will default to whatever was set in nutch-site/nutch-default.xml (which is often set to parsing).
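
To illustrate why a misspelled flag fails silently, here is a minimal sketch of this kind of argument handling. It is not the actual Fetcher source: the class name, the method name, the example segment path, and the configuredDefault parameter (standing in for the value read from nutch-site/nutch-default.xml) are all hypothetical.

```java
public class FetchArgsSketch {

    // Hypothetical helper: decides whether parsing stays enabled.
    // configuredDefault stands in for the value from the Nutch config files.
    static boolean resolveParsing(String[] args, boolean configuredDefault) {
        boolean parsing = configuredDefault;
        // args[0] is assumed to be the segment path, so options start at index 1.
        for (int i = 1; i < args.length; i++) {
            if (args[i].equals("-noParsing")) {
                parsing = false;
            }
            // Any other token, such as "-noparse", matches nothing and is
            // skipped without an error, so the configured default survives.
        }
        return parsing;
    }

    public static void main(String[] args) {
        String[] misspelled = {"crawl/segments/example", "-noparse"};
        String[] correct = {"crawl/segments/example", "-noParsing"};
        System.out.println(resolveParsing(misspelled, true)); // parsing stays on
        System.out.println(resolveParsing(correct, true));    // parsing turned off
    }
}
```

With a configured default of true, the misspelled flag leaves parsing enabled, and only the exact -noParsing spelling turns it off.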



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
