Hi,

On Thu, Aug 30, 2012 at 2:05 PM, Markus Jelsma
<[email protected]> wrote:
> The issue is with TagSoup's schema where some HTML5 elements are missing.
> I fixed it for now by adding some elements to the schema in the (newly added)
> constructor of Tika's HtmlParser.

Looks like a reasonable workaround. Can you file a TIKA issue for this
and attach a patch with your changes?

> I used 255 as the memberOf value because the group constants are not defined
> in the schema and I couldn't find their integer representation in the
> html.tssl file in TagSoup.
> This is not a very elegant solution, so how should it be solved?

I think the ideal solution would be to have these changes included
directly in TagSoup.
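
In the meantime, here is a rough sketch of why 255 behaves like a
catch-all. It assumes memberOf is treated as a bit mask of content groups
and that a parent may contain a child when the parent's model shares at
least one bit with the child's memberOf. The MemberOfSketch class and the
two GROUP_* constants are made up for illustration; they are not TagSoup's
API and the real group values may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only; this is NOT TagSoup's Schema class.
final class MemberOfSketch {
    // Hypothetical group bits, chosen for the example.
    static final int GROUP_INLINE = 1 << 0;
    static final int GROUP_BLOCK = 1 << 1;

    // Element name -> { model (groups it may contain), memberOf (groups it belongs to) }
    private final Map<String, int[]> types = new HashMap<>();

    void elementType(String name, int model, int memberOf) {
        types.put(name, new int[] { model, memberOf });
    }

    // A parent can contain a child if the masks overlap in at least one bit.
    boolean canContain(String parent, String child) {
        int[] p = types.get(parent);
        int[] c = types.get(child);
        if (p == null || c == null) {
            return false; // unknown element: nothing is known about nesting
        }
        return (p[0] & c[1]) != 0;
    }

    // Builds a tiny schema with one HTML5 element registered with memberOf = 255.
    static MemberOfSketch demo() {
        MemberOfSketch s = new MemberOfSketch();
        s.elementType("div", GROUP_BLOCK | GROUP_INLINE, GROUP_BLOCK);
        s.elementType("span", GROUP_INLINE, GROUP_INLINE);
        // 255 == 0xFF sets the low eight bits, so "section" joins both groups above.
        s.elementType("section", GROUP_BLOCK | GROUP_INLINE, 255);
        return s;
    }

    public static void main(String[] args) {
        MemberOfSketch s = demo();
        System.out.println("div > section:  " + s.canContain("div", "section"));
        System.out.println("span > section: " + s.canContain("span", "section"));
        System.out.println("span > div:     " + s.canContain("span", "div"));
    }
}
```

Under these assumptions the downside of 255 is visible in the second
case: an inline-only parent would also accept the new element, which is
why proper group constants in TagSoup itself would be preferable.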

BR,

Jukka Zitting
