Thanks for clarifying, Nick. This will be a nice feature to have. I will look through the past discussions before proceeding.
- Thamme

On Wed, Mar 23, 2016 at 3:01 PM, Nick Burch <[email protected]> wrote:

> On Wed, 23 Mar 2016, Thamme Gowda N. wrote:
>
>> Question : How to enable multiple parsers for specific mimetypes?
>>
>> I am using tika to parse html pages.
>>
>> My requirement is that both *NamedEntityParser* and *HtmlParser* has to be
>> enabled for specific web related MIME types like *text/html, *
>> *application/xhtml+xml*.
>>
>
> This is not currently supported.
>
> See http://wiki.apache.org/tika/CompositeParserDiscussion for the
> discussion on it. If you have ideas on how we can solve the issue of
> multiple parsers needing to output to the same write-once SAX stream,
> including for the fallback case, please shout!
>
> (You can chain multiple content handlers together, so one option might be
> to try to get the named entity stuff to enrich the html sax events stream
> rather than needing to be a standalone parser)
>
> Nick
>
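Nick's suggestion of chaining content handlers can be sketched with nothing but the JDK's SAX API. Tika ships `org.apache.tika.sax.TeeContentHandler` for fanning one event stream out to several handlers. The `TeeHandler` below is a hand-rolled stand-in, so the example runs without Tika on the classpath. `NaiveEntityTagger` is a hypothetical placeholder that treats capitalized words as entities. It is not the real NamedEntityParser, just a way to show a second consumer reading the same single pass of HTML SAX events that a text extractor sees.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Forwards every SAX event to several handlers (same idea as Tika's TeeContentHandler).
class TeeHandler extends DefaultHandler {
    private final ContentHandler[] handlers;

    TeeHandler(ContentHandler... handlers) { this.handlers = handlers; }

    @Override public void startDocument() throws SAXException {
        for (ContentHandler h : handlers) h.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        for (ContentHandler h : handlers) h.endDocument();
    }
    @Override public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        for (ContentHandler h : handlers) h.startElement(uri, localName, qName, atts);
    }
    @Override public void endElement(String uri, String localName, String qName) throws SAXException {
        for (ContentHandler h : handlers) h.endElement(uri, localName, qName);
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler h : handlers) h.characters(ch, start, length);
    }
}

// Collects plain text, roughly what a body-text handler would produce.
class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();
    @Override public void characters(char[] ch, int start, int length) { text.append(ch, start, length); }
}

// HYPOTHETICAL stand-in for a named-entity model: capitalized words count as "entities".
class NaiveEntityTagger extends DefaultHandler {
    final List<String> entities = new ArrayList<>();
    private final StringBuilder buf = new StringBuilder();

    @Override public void characters(char[] ch, int start, int length) { buf.append(ch, start, length); }

    @Override public void endElement(String uri, String localName, String qName) {
        // Text may arrive in several characters() calls, so tokenize once per closing tag.
        for (String token : buf.toString().split("[^\\p{L}]+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) entities.add(token);
        }
        buf.setLength(0);
    }
}

public class TeeDemo {
    // One parse pass; every handler sees the same write-once event stream.
    static void parse(String xhtml, ContentHandler... handlers) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)),
                new TeeHandler(handlers));
    }

    public static void main(String[] args) throws Exception {
        TextCollector text = new TextCollector();
        NaiveEntityTagger tagger = new NaiveEntityTagger();
        parse("<html><body><p>Nick Burch maintains Apache Tika.</p></body></html>", text, tagger);
        System.out.println(text.text);       // Nick Burch maintains Apache Tika.
        System.out.println(tagger.entities); // [Nick, Burch, Apache, Tika]
    }
}
```

In a Tika setting, the same tee would be handed to the HTML parser's `parse` method in place of the JDK `SAXParser`. The entity logic then enriches the HTML event stream instead of acting as a second standalone parser, which avoids the write-once conflict described above.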
