Hi,

We need div elements in the output when we pass a stream through Boilerpipe from
Nutch. We enable includeMarkup so that markup is returned at all, but divs are
still missing. In the ParseContext we call context.set(HtmlMapper.class,
IdentityHtmlMapper.INSTANCE), but this setting is not honored.

DefaultHtmlMapper still seems to be used behind the scenes. We know this
because divs are returned once we add DIV,div to its SAFE_ELEMENTS map.
That is not a good solution: we prefer not to modify this parser class, and
the unit test
testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) fails if div is
added to DefaultHtmlMapper.SAFE_ELEMENTS.
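
To make the symptom concrete, here is a minimal stand-alone sketch of what we
assume is happening. It does not use Tika itself: MiniHtmlMapper,
MiniDefaultMapper, MiniIdentityMapper, MiniParseContext and MapperLookupSketch
are hypothetical stand-ins that only mimic a class-keyed context lookup with a
default fallback. If the component that filters elements never sees our context
entry, it ends up with the whitelist mapper and div is dropped.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an HTML mapper; not the Tika interface.
interface MiniHtmlMapper {
    // Returns the element name to emit, or null to drop the element.
    String mapSafeElement(String name);
}

// Stand-in for a whitelist mapper: only listed elements pass through.
final class MiniDefaultMapper implements MiniHtmlMapper {
    private static final Map<String, String> SAFE_ELEMENTS = new HashMap<>();
    static {
        SAFE_ELEMENTS.put("P", "p");
        SAFE_ELEMENTS.put("H1", "h1");
        // "DIV" is deliberately absent, as in the situation described above.
    }

    public String mapSafeElement(String name) {
        return SAFE_ELEMENTS.get(name.toUpperCase(Locale.ROOT));
    }
}

// Stand-in for an identity mapper: every element passes through.
final class MiniIdentityMapper implements MiniHtmlMapper {
    public String mapSafeElement(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}

// Stand-in for a class-keyed parse context with a default fallback.
final class MiniParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value != null ? key.cast(value) : defaultValue;
    }
}

final class MapperLookupSketch {
    // Resolves the mapper from the context and applies it to one element name.
    static String emitted(MiniParseContext context, String element) {
        MiniHtmlMapper mapper =
                context.get(MiniHtmlMapper.class, new MiniDefaultMapper());
        return mapper.mapSafeElement(element);
    }

    public static void main(String[] args) {
        MiniParseContext empty = new MiniParseContext();
        System.out.println("div, no mapper set: " + emitted(empty, "div"));

        MiniParseContext configured = new MiniParseContext();
        configured.set(MiniHtmlMapper.class, new MiniIdentityMapper());
        System.out.println("div, identity mapper set: " + emitted(configured, "div"));
    }
}
```

With the empty context the lookup falls back to the whitelist mapper and div is
dropped; with the identity mapper registered it is kept. What we see looks like
the first case even though we configure the second.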

Any ideas on how we can force IdentityHtmlMapper to be used instead?

Thanks,
Markus
