Hi,

On Thu, Aug 30, 2012 at 2:05 PM, Markus Jelsma <[email protected]> wrote:
> The issue is with TagSoup's schema where some HTML5 elements are missing.
> I fixed it for now by adding some elements to the schema in the (newly added)
> constructor of Tika's HtmlParser.

Looks like a reasonable workaround. Can you file a TIKA issue for this and
attach a patch with your changes?

> I used 255 as memberOf value because the group constants are not defined in
> the schema and i couldn't find their integer repr. in the html.tssl file in
> TagSoup.
> This is not a very elegant solution so how should it be solved?

I think the ideal solution would be to have these changes included directly
in TagSoup.

BR,

Jukka Zitting
