[ https://issues.apache.org/jira/browse/TIKA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benson Margulies updated TIKA-304:
----------------------------------

    Description:
It would be nice if one could subclass HtmlParser to change what it passes along, instead of having to copy it. I'll attach a first effort.

It would also be good if attributes could be preserved (particularly id attributes) but let's see how you like my first patch.

  was:
It would be nice if one could subclass HtmlParser to change what it passes along, instead of having to copy it. I'll attach a first effort.

> HtmlParser could be easier to subclass
> --------------------------------------
>
>                 Key: TIKA-304
>                 URL: https://issues.apache.org/jira/browse/TIKA-304
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4, 0.5
>            Reporter: Benson Margulies
>         Attachments: html-parser-subclass.diff
>
>
> It would be nice if one could subclass HtmlParser to change what it passes
> along, instead of having to copy it. I'll attach a first effort.
> It would also be good if attributes could be preserved (particularly id
> attributes) but let's see how you like my first patch.
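
For illustration only, here is a rough, self-contained sketch of the kind of extension point the description asks for. It is not the attached patch and not Tika's API: the names ElementFilter, IdPreservingFilter, mapElement, keepAttribute and pass are all hypothetical. The idea is that the base class keeps the driver logic fixed and exposes small protected hooks, so a subclass can change which elements and attributes are passed along without copying the class.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SubclassableFilterSketch {

    /** Hypothetical stand-in for a parser's element filtering; not Tika's API. */
    static class ElementFilter {
        // Assumed default set of elements that are passed along (requires Java 9+).
        private static final Set<String> SAFE_ELEMENTS = Set.of("p", "h1", "a");

        /** Hook: return the output name for an element, or null to drop it. */
        protected String mapElement(String name) {
            String lower = name.toLowerCase(Locale.ROOT);
            return SAFE_ELEMENTS.contains(lower) ? lower : null;
        }

        /** Hook: decide whether an attribute survives. The default drops all. */
        protected boolean keepAttribute(String element, String attribute) {
            return false;
        }

        /** Fixed driver: renders the start tag that would be passed along, or null. */
        public final String pass(String name, Map<String, String> attributes) {
            String mapped = mapElement(name);
            if (mapped == null) {
                return null;
            }
            StringBuilder out = new StringBuilder("<" + mapped);
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                if (keepAttribute(mapped, e.getKey())) {
                    // No escaping of values here; this is only a sketch.
                    out.append(' ').append(e.getKey())
                       .append("=\"").append(e.getValue()).append('"');
                }
            }
            return out.append('>').toString();
        }
    }

    /** A subclass changes what is passed along without copying the driver. */
    static class IdPreservingFilter extends ElementFilter {
        @Override
        protected String mapElement(String name) {
            // Also let <div> through, in addition to the default set.
            return "div".equalsIgnoreCase(name) ? "div" : super.mapElement(name);
        }

        @Override
        protected boolean keepAttribute(String element, String attribute) {
            // Preserve id attributes only.
            return "id".equalsIgnoreCase(attribute);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "intro");
        attrs.put("class", "lead");
        System.out.println(new ElementFilter().pass("DIV", attrs));      // null
        System.out.println(new IdPreservingFilter().pass("DIV", attrs)); // <div id="intro">
        System.out.println(new IdPreservingFilter().pass("P", attrs));   // <p id="intro">
    }
}
```

With hooks like these, the default behaviour stays in one place, and preserving id attributes becomes a one-method override rather than a copy of the whole parser.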