[ https://issues.apache.org/jira/browse/TIKA-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-347.
--------------------------------

    Resolution: Fixed

Implemented in revision 890117.

> Make HtmlParser customizable through ParseContext
> -------------------------------------------------
>
>                 Key: TIKA-347
>                 URL: https://issues.apache.org/jira/browse/TIKA-347
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.6
>
>
> In TIKA-304 we added the mapSafeElement() and isDiscardElement() methods to 
> HtmlParser so that subclasses could better customize how incoming HTML 
> elements get mapped to the XHTML output from Tika. This works fairly well but 
> requires you to modify the Tika configuration file or to explicitly inject a 
> custom HtmlParser subclass instance into the CompositeParser instance you're 
> using (AutoDetectParser, etc.).
> Now that we have the ParseContext mechanism available to simplify such 
> customization, it would be nice to allow you to provide a custom "HTML 
> mapper" instance through the parse context and have HtmlParser call that 
> mapper (if available) for the mapSafeElement() and isDiscardElement() 
> operations.
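
The lookup described in the issue could be sketched roughly as follows. This
is only an illustration under stated assumptions: the HtmlMapper interface,
the Context class, DEFAULT_MAPPER, resolveMapper() and boldAsStrong() are
hypothetical stand-ins written for this example, not the classes committed in
the revision mentioned above. Only the mapSafeElement() and
isDiscardElement() method names come from the issue text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins only; these are NOT Tika's actual classes.
class HtmlMapperSketch {

    /** Hypothetical mapper interface with the two operations named in the issue. */
    interface HtmlMapper {
        /** Returns the output element name, or null if there is no safe mapping. */
        String mapSafeElement(String name);

        /** Returns true if the element and its content should be dropped. */
        boolean isDiscardElement(String name);
    }

    /** Minimal type-keyed context, assumed to resemble a parse context. */
    static class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key, T defaultValue) {
            Object value = entries.get(key);
            return value != null ? key.cast(value) : defaultValue;
        }
    }

    /** Assumed built-in behaviour, used when the context carries no mapper. */
    static final HtmlMapper DEFAULT_MAPPER = new HtmlMapper() {
        public String mapSafeElement(String name) {
            String lower = name.toLowerCase(Locale.ROOT);
            return (lower.equals("p") || lower.equals("h1")) ? lower : null;
        }

        public boolean isDiscardElement(String name) {
            return name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style");
        }
    };

    /** The parser side: use the caller's mapper if one was supplied, else the default. */
    static HtmlMapper resolveMapper(Context context) {
        return context.get(HtmlMapper.class, DEFAULT_MAPPER);
    }

    /** Example custom mapper: maps <b> to <strong>, otherwise defers to the default. */
    static HtmlMapper boldAsStrong() {
        return new HtmlMapper() {
            public String mapSafeElement(String name) {
                return name.equalsIgnoreCase("b")
                        ? "strong" : DEFAULT_MAPPER.mapSafeElement(name);
            }

            public boolean isDiscardElement(String name) {
                return DEFAULT_MAPPER.isDiscardElement(name);
            }
        };
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.set(HtmlMapper.class, boldAsStrong());
        HtmlMapper mapper = resolveMapper(context);
        System.out.println(mapper.mapSafeElement("b"));
        System.out.println(mapper.isDiscardElement("script"));
    }
}
```

The point of the design is that the caller only populates the context; no
parser subclass and no configuration file change is involved, and a parser
that finds no mapper in the context keeps its built-in behaviour.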

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
