Thanks for clarifying, Nick,

This would be a nice feature to have.
I will have a look at the past discussions before proceeding.

-
Thamme

On Wed, Mar 23, 2016 at 3:01 PM, Nick Burch <[email protected]> wrote:

> On Wed, 23 Mar 2016, Thamme Gowda N. wrote:
>
>> Question: How do I enable multiple parsers for specific MIME types?
>>
>> I am using Tika to parse HTML pages.
>>
>> My requirement is that both *NamedEntityParser* and *HtmlParser* have to be
>> enabled for specific web-related MIME types like *text/html* and
>> *application/xhtml+xml*.
>>
>
> This is not currently supported.
>
> See http://wiki.apache.org/tika/CompositeParserDiscussion for the
> discussion on it. If you have ideas on how we can solve the issue of
> multiple parsers needing to output to the same write-once SAX stream,
> including for the fallback case, please shout!
>
> (You can chain multiple content handlers together, so one option might be
> to get the named entity code to enrich the HTML SAX event stream
> rather than needing to be a standalone parser.)
>
> Nick
>
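
To make the content-handler chaining idea above concrete, here is a minimal sketch, not Tika code. It uses only the JDK's SAX classes (XMLFilterImpl, DefaultHandler). The class names, the <entity> element, and the fixed KNOWN_NAMES set are all hypothetical stand-ins for a real named-entity recogniser, and the SAX events are fired by hand where an HTML parser would normally produce them.

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class EnrichingHandlerSketch {

    // Hypothetical stand-in for a real named-entity recogniser.
    static final Set<String> KNOWN_NAMES = Set.of("Apache", "Tika");

    // Sits in the middle of the chain: forwards every SAX event downstream
    // and wraps recognised names in a (made-up) <entity> element.
    static class EntityEnricher extends XMLFilterImpl {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            String text = new String(ch, start, length);
            // Split around whitespace but keep the whitespace tokens,
            // so the original text is reproduced exactly.
            for (String token : text.split("(?<=\\s)|(?=\\s)")) {
                if (KNOWN_NAMES.contains(token)) {
                    super.startElement("", "entity", "entity", new AttributesImpl());
                    emit(token);
                    super.endElement("", "entity", "entity");
                } else {
                    emit(token);
                }
            }
        }

        private void emit(String s) throws SAXException {
            char[] chars = s.toCharArray();
            super.characters(chars, 0, chars.length);
        }
    }

    // End of the chain: records the events it receives as a simple string.
    static class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    // Fires the events an HTML parser might emit for <p>text</p>
    // through the chain: enricher -> recorder.
    static String enrich(String paragraph) throws SAXException {
        Recorder recorder = new Recorder();
        EntityEnricher enricher = new EntityEnricher();
        enricher.setContentHandler(recorder);

        enricher.startDocument();
        enricher.startElement("", "p", "p", new AttributesImpl());
        char[] chars = paragraph.toCharArray();
        enricher.characters(chars, 0, chars.length);
        enricher.endElement("", "p", "p");
        enricher.endDocument();
        return recorder.out.toString();
    }

    public static void main(String[] args) throws SAXException {
        // prints <p><entity>Apache</entity> <entity>Tika</entity> parses HTML</p>
        System.out.println(enrich("Apache Tika parses HTML"));
    }
}
```

The point of the design is that only one parser writes to the SAX stream, while the enrichment happens in a handler placed between that parser and the final consumer. A real version would also have to cope with text that arrives split across several characters() calls, which this sketch ignores.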
