Hi,

We need div elements to be returned when we pass a stream through Boilerpipe from Nutch. We enable includeMarkup so that markup is returned at all, but divs are still missing from the output. In the ParseContext we call context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE), but this setting is not honored.
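To make the expectation concrete, here is a minimal, self-contained model of the behavior we assume: the parser asks the context for a mapper and falls back to a whitelist-based default only when none was registered. This is not Tika code. Mapper, WhitelistMapper, IdentityMapper, Context and MapperLookupSketch are hypothetical stand-ins for HtmlMapper, DefaultHtmlMapper, IdentityHtmlMapper and ParseContext, and the lookup in emit() is our assumption about what happens internally.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an HTML mapper; only the method relevant here.
interface Mapper {
    // Returns the normalised element name to emit, or null to drop the element.
    String mapSafeElement(String name);
}

// Models a whitelist-based default mapper: unlisted elements are dropped.
class WhitelistMapper implements Mapper {
    private final Map<String, String> safeElements = new HashMap<>();

    WhitelistMapper() {
        safeElements.put("P", "p");
        safeElements.put("H1", "h1");
        // Deliberately no DIV entry, so div elements are dropped.
    }

    public String mapSafeElement(String name) {
        return safeElements.get(name.toUpperCase(Locale.ENGLISH));
    }
}

// Models an identity mapper: every element is passed through, lower-cased.
class IdentityMapper implements Mapper {
    public String mapSafeElement(String name) {
        return name.toLowerCase(Locale.ENGLISH);
    }
}

// Models a typed context: values are registered per class, with a fallback on lookup.
class Context {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T fallback) {
        Object value = entries.get(key);
        return value != null ? key.cast(value) : fallback;
    }
}

class MapperLookupSketch {
    // Assumed parser behavior: use the registered mapper, else the default one.
    static String emit(Context context, String element) {
        Mapper mapper = context.get(Mapper.class, new WhitelistMapper());
        return mapper.mapSafeElement(element);
    }

    public static void main(String[] args) {
        Context empty = new Context();
        Context configured = new Context();
        configured.set(Mapper.class, new IdentityMapper());

        System.out.println(emit(empty, "div"));      // prints null (element dropped)
        System.out.println(emit(configured, "div")); // prints div
    }
}
```

In this model the second case is what we expect to see. What we actually observe matches the first case, as if the mapper we registered never reaches the component that does the lookup.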
DefaultHtmlMapper appears to be used behind the scenes instead. We know this because we do get divs returned if we add the entry DIV,div to its SAFE_ELEMENTS map. That workaround is not acceptable for two reasons: we prefer not to modify this parser class, and the unit test testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) fails once div is added to DefaultHtmlMapper.SAFE_ELEMENTS. Any ideas on how we can force IdentityHtmlMapper to be used instead?

Thanks,
Markus
