> i'm trying to reproduce the problem outside of the "crawl" command,
> through multi-step script approach.
> the problem happens again before the parse command.

You specified -noparse on the fetch command line, didn't you?

> i guess the problem
> is indeed in the protocol-httpclient. Though i can't understand what's
> happening. Why, in the fetching phase, the tika parser is called for
> TXT?

The parser is called on the content in the Fetcher output() method.

> the parser should not be called at all

if you specify -noparse for the fetch.

As for TXT, the parser is used to find outlinks.

