[ https://issues.apache.org/jira/browse/TIKA-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-347.
--------------------------------

    Resolution: Fixed

Implemented in revision 890117.

> Make HtmlParser customizable through ParseContext
> -------------------------------------------------
>
>                 Key: TIKA-347
>                 URL: https://issues.apache.org/jira/browse/TIKA-347
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.6
>
>
> In TIKA-304 we added the mapSafeElement() and isDiscardElement() methods to
> HtmlParser so that subclasses could better customize how incoming HTML
> elements get mapped to the XHTML output from Tika. This works fairly well, but
> it requires you either to modify the Tika configuration file or to explicitly
> inject a custom HtmlParser subclass instance into the CompositeParser instance
> you're using (AutoDetectParser, etc.).
> Now that we have the ParseContext mechanism available to simplify such
> customization, it would be nice to allow you to provide a custom "HTML
> mapper" instance through the parse context and have HtmlParser call that
> mapper (if available) for the mapSafeElement() and isDiscardElement()
> operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
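
The lookup described in the issue can be sketched roughly as follows. This is a self-contained illustration, not the code from revision 890117: the HtmlMapper, ParseContext, DefaultHtmlMapper, ArticleHtmlMapper and SketchHtmlParser types below are simplified, hypothetical stand-ins written for this example, and the element lists are made up. It also assumes that mapSafeElement() returns the XHTML element name to emit (or null when the tag itself should be dropped) and that isDiscardElement() marks elements whose whole content is thrown away.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the "HTML mapper" proposed in the issue.
interface HtmlMapper {
    // Assumed contract: XHTML element name to emit, or null to drop the tag.
    String mapSafeElement(String name);

    // Assumed contract: true if the element and its content are discarded.
    boolean isDiscardElement(String name);
}

// Simplified stand-in for a parse context: optional helpers keyed by type.
final class ParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value != null ? key.cast(value) : defaultValue;
    }
}

// Default behaviour, used when the caller supplies no mapper (made-up lists).
class DefaultHtmlMapper implements HtmlMapper {
    private static final Set<String> SAFE = Set.of("p", "h1", "ul", "li");
    private static final Set<String> DISCARD = Set.of("script", "style");

    @Override
    public String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return SAFE.contains(lower) ? lower : null;
    }

    @Override
    public boolean isDiscardElement(String name) {
        return DISCARD.contains(name.toLowerCase(Locale.ROOT));
    }
}

// Example customization: keep <article> as <div>, discard <nav> entirely.
class ArticleHtmlMapper extends DefaultHtmlMapper {
    @Override
    public String mapSafeElement(String name) {
        return "article".equalsIgnoreCase(name) ? "div" : super.mapSafeElement(name);
    }

    @Override
    public boolean isDiscardElement(String name) {
        return "nav".equalsIgnoreCase(name) || super.isDiscardElement(name);
    }
}

// Stand-in parser: asks the context for a mapper and falls back to the default.
final class SketchHtmlParser {
    private static final HtmlMapper DEFAULT = new DefaultHtmlMapper();

    String handle(String element, ParseContext context) {
        HtmlMapper mapper = context.get(HtmlMapper.class, DEFAULT);
        if (mapper.isDiscardElement(element)) {
            return "discard";
        }
        String mapped = mapper.mapSafeElement(element);
        return mapped != null ? "emit <" + mapped + ">" : "drop tag, keep text";
    }
}

class HtmlMapperDemo {
    public static void main(String[] args) {
        SketchHtmlParser parser = new SketchHtmlParser();

        ParseContext plain = new ParseContext();
        ParseContext custom = new ParseContext();
        custom.set(HtmlMapper.class, new ArticleHtmlMapper());

        for (String element : new String[] {"p", "article", "nav", "script"}) {
            System.out.println(element + ": default=" + parser.handle(element, plain)
                    + ", custom=" + parser.handle(element, custom));
        }
    }
}
```

The point of the design is that the caller only puts a mapper into the context it already passes to parse; no HtmlParser subclass and no change to the Tika configuration file is needed, and a parser that finds no mapper in the context behaves exactly as before.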