Jira issue:
https://issues.apache.org/jira/browse/TIKA-985 
 
-----Original message-----
> From: Jukka Zitting <[email protected]>
> Sent: Thu 30-Aug-2012 14:09
> To: [email protected]
> Subject: Re: Article and section tags
> 
> Hi,
> 
> On Thu, Aug 30, 2012 at 2:05 PM, Markus Jelsma
> <[email protected]> wrote:
> > The issue is with TagSoup's schema, where some HTML5 elements are missing.
> > I fixed it for now by adding some elements to the schema in the (newly added)
> > constructor of Tika's HtmlParser.
> 
> Looks like a reasonable workaround. Can you file a TIKA issue for this
> and attach a patch with your changes?
> 
> > I used 255 as the memberOf value because the group constants are not defined
> > in the schema and I couldn't find their integer representation in the
> > html.tssl file in TagSoup.
> > This is not a very elegant solution, so how should it be solved?
> 
> I think the ideal solution would be to have these changes included
> directly in TagSoup.
> 
> BR,
> 
> Jukka Zitting
> 
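
For reference, here is a rough sketch of why a memberOf value of 255 can act
as a catch-all. It assumes that the schema decides whether a parent may
contain a child by AND-ing the parent's content model with the child's
memberOf bitmask; that is an assumption about TagSoup's internals, not
something confirmed in this thread. The class, the group names and the bit
positions are all hypothetical and are not taken from html.tssl.

```java
// Illustrative model only: it mimics a bitmask containment check and does
// not use TagSoup's classes. Group names and bit positions are hypothetical.
final class MemberOfSketch {
    static final int GROUP_BLOCK  = 1 << 2;   // hypothetical bit
    static final int GROUP_INLINE = 1 << 5;   // hypothetical bit
    static final int GROUP_TABLE  = 1 << 12;  // hypothetical bit above the low eight

    // The value from the quoted message: 255 == 0xFF, i.e. bits 0 to 7 set.
    static final int CATCH_ALL_MEMBER_OF = 255;

    // A parent may contain a child when the parent's content model shares
    // at least one group bit with the child's memberOf mask.
    static boolean canContain(int parentModel, int childMemberOf) {
        return (parentModel & childMemberOf) != 0;
    }
}
```

Under these assumptions, MemberOfSketch.canContain(GROUP_BLOCK, 255) and
canContain(GROUP_INLINE, 255) both hold, while canContain(GROUP_TABLE, 255)
does not, because that group's bit lies outside the low eight. So 255 only
covers groups whose bits fall in that range, which is one reason named group
constants would be the cleaner fix.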
