On Wed, 23 Mar 2016, Thamme Gowda N. wrote:
Question: How do I enable multiple parsers for specific MIME types?

I am using Tika to parse HTML pages.

My requirement is that both *NamedEntityParser* and *HtmlParser* have to be
enabled for specific web-related MIME types such as *text/html* and
*application/xhtml+xml*.

This is not currently supported.

See http://wiki.apache.org/tika/CompositeParserDiscussion for the discussion on it. If you have ideas on how we can solve the issue of multiple parsers needing to output to the same write-once SAX stream, including for the fallback case, please shout!

(You can chain multiple content handlers together, so one option might be to have the named entity logic enrich the HTML SAX event stream, rather than running it as a standalone parser.)
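
To make the handler-chaining idea a bit more concrete, here is a rough sketch using only the JDK's SAX classes, not Tika itself. It wraps a downstream handler, passes every event through, and adds an extra element for each recognised name. The class name EntityEnrichingHandler, the <entity> element and the KNOWN_ENTITIES lookup are all made up for illustration; a real recogniser would replace the lookup, and it would also need to cope with SAX delivering text in several chunks. In Tika you would hand a wrapper like this to the HTML parser as its ContentHandler; org.apache.tika.sax.ContentHandlerDecorator and TeeContentHandler exist for this kind of wrapping.

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical decorator: forwards all SAX events and adds <entity> elements.
class EntityEnrichingHandler extends XMLFilterImpl {
    // Assumption: a fixed lookup stands in for a real named entity recogniser.
    private static final Set<String> KNOWN_ENTITIES = Set.of("Apache", "Tika");

    EntityEnrichingHandler(ContentHandler downstream) {
        setContentHandler(downstream);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Pass the original text through unchanged.
        super.characters(ch, start, length);
        // Then emit one empty <entity name="..."> element per recognised token.
        for (String token : new String(ch, start, length).split("\\W+")) {
            if (KNOWN_ENTITIES.contains(token)) {
                AttributesImpl attrs = new AttributesImpl();
                attrs.addAttribute("", "name", "name", "CDATA", token);
                super.startElement("", "entity", "entity", attrs);
                super.endElement("", "entity", "entity");
            }
        }
    }

    // Drives a few SAX events by hand and returns what the downstream handler saw.
    static String demo() {
        StringBuilder out = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                out.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    out.append(' ').append(atts.getQName(i))
                       .append("=\"").append(atts.getValue(i)).append('"');
                }
                out.append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                out.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };

        EntityEnrichingHandler handler = new EntityEnrichingHandler(recorder);
        try {
            char[] text = "Apache Tika parses html".toCharArray();
            handler.startDocument();
            handler.startElement("", "p", "p", new AttributesImpl());
            handler.characters(text, 0, text.length);
            handler.endElement("", "p", "p");
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the enrichment happens inside the single event stream, the write-once constraint is not violated: only one parser writes, and the wrapper just adds events on the way through.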

Nick
