Hi,

See TIKA-347 for a nice alternative to the earlier TIKA-304 approach to customizing the way Tika maps incoming HTML to XHTML.
You can now inject a custom mapping strategy through the parse context, like this:

    Parser parser = ...;
    ParseContext context = new ParseContext();
    context.set(HtmlMapper.class, new MyCustomHtmlMapper());
    parser.parse(..., context);

The new HtmlMapper interface declares the same mapSafeElement() and isDiscardElement() method signatures that we already used for the overridable HtmlParser methods in TIKA-304. If no custom HtmlMapper instance is found in the parse context, the existing TIKA-304 mechanism is used.

BR,

Jukka Zitting
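PS. For anyone wondering what a MyCustomHtmlMapper might look like, here is a rough sketch. It is not the code from TIKA-347. The HtmlMapper interface below is a local stand-in so the example compiles without Tika on the classpath, and its String-based signatures are an assumption, as is the convention that mapSafeElement() returns null for elements that are not considered safe. The particular element names are arbitrary illustrations.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Local stand-in for Tika's HtmlMapper (assumption: both methods take
// the incoming HTML element name as a String).
interface HtmlMapper {
    // Returns the XHTML element name to emit, or null if the element is not "safe".
    String mapSafeElement(String name);

    // Returns true if the element and all of its content should be dropped.
    boolean isDiscardElement(String name);
}

// Hypothetical custom mapper: keeps a few inline and paragraph elements,
// normalizes strong/em to b/i, and discards script and style content.
class MyCustomHtmlMapper implements HtmlMapper {
    private static final Map<String, String> SAFE = Map.of(
            "p", "p",
            "b", "b",
            "strong", "b",
            "i", "i",
            "em", "i");

    private static final Set<String> DISCARD = Set.of("script", "style");

    @Override
    public String mapSafeElement(String name) {
        // Incoming HTML element names may be in any case, so normalize first.
        return SAFE.get(name.toLowerCase(Locale.ROOT));
    }

    @Override
    public boolean isDiscardElement(String name) {
        return DISCARD.contains(name.toLowerCase(Locale.ROOT));
    }
}
```

With the real interface, an instance of such a class would be the one passed to context.set(HtmlMapper.class, ...) as shown earlier.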