Hi,
I'm building a new ContentHandler that needs to do some work on script elements
as well, but they are not reported to my startElement method. The context has
the IdentityHtmlMapper set, and script is not discarded by Tika's own
HtmlHandler: the script element is reported in HtmlHandler but never reaches
my custom handler.
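For illustration, here is a minimal sketch of the kind of handler in question. It is not the real code: the class name ScriptCountingHandler and the run helper are hypothetical, and the JDK's own SAX parser stands in for Tika's HtmlParser so that the example is self-contained. It only shows which callbacks are expected to fire for script, not the problem itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts script elements and collects their text.
public class ScriptCountingHandler extends DefaultHandler {
    private int scriptElements = 0;
    private boolean inScript = false;
    private final StringBuilder scriptText = new StringBuilder();

    // localName is empty when the parser is not namespace-aware, so fall back to qName.
    private static boolean isScript(String localName, String qName) {
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        return "script".equalsIgnoreCase(name);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isScript(localName, qName)) {
            scriptElements++;
            inScript = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isScript(localName, qName)) {
            inScript = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only keep character data that appears inside a script element.
        if (inScript) {
            scriptText.append(ch, start, length);
        }
    }

    public int getScriptElements() {
        return scriptElements;
    }

    public String getScriptText() {
        return scriptText.toString();
    }

    // Assumption: a plain JDK SAX parser and well-formed input replace Tika here.
    public static ScriptCountingHandler run(String xml) {
        try {
            ScriptCountingHandler handler = new ScriptCountingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

With Tika, the same handler would be passed to HtmlParser.parse together with a ParseContext that has the IdentityHtmlMapper set; in that setup startElement is never called for script.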
The confusing thing is that I am able to get it in my handler when adding the
script element to TagSoup inside HtmlParser's constructor:
HTML_SCHEMA.elementType("script", HTMLSchema.M_EMPTY, 65535, 0);
Without this, script and its characters are only reported inside HtmlHandler,
never in custom handlers.
I must be doing something wrong here. Any hints?
Thanks,
Markus