> i'm trying to reproduce the problem outside of the "crawl" command,
> through multi-step script approach.
> the problem happens again before the parse command.


you specified -noparse on the fetch command line, didn't you?


> i guess the problem
> is indeed in the protocol-httpclient. Though i can't understand what's
> happening. Why, in the fetching phase, the tika parser is called for
> TXT? The parser is called on the content in the Fetcher output() method.
>

the parser should not be called at all if you specify -noparse for the fetch.

as for TXT: when parsing is enabled, the parser is run on text content too, in order to find outlinks.
