unsubscribe

On Fri, Mar 1, 2013 at 7:35 AM, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> We need div elements returned when we pass the stream through Boilerpipe
> from Nutch. We enable includeMarkup to get markup returned in the first
> place, but divs are not returned. In the ParseContext we set
> context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE) but this is not
> honored for some reason.
>
> DefaultHtmlMapper appears to be used behind the scenes instead. We know
> this because divs are returned once we add DIV,div to the SAFE_ELEMENTS
> map. That is not a good fix: we prefer not to modify this parser class,
> and the unit test
> testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) fails if
> div is added to DefaultHtmlMapper.SAFE_ELEMENTS.
>
> Any ideas on how we can force the IdentityHtmlMapper to be used instead?
>
> Thanks,
> Markus
>
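
For illustration only, here is a simplified, self-contained model of how a
class-keyed context lookup with a default fallback could produce the behaviour
described above. This is not Tika's code: MapperLookupSketch, its nested
Context and HtmlMapper types, the DEFAULT and IDENTITY constants, and the
small safe-element set are all hypothetical stand-ins. The assumption being
sketched is that the mapper is fetched from the context by class, so a context
instance that never had the mapper set on it falls back to the default and
drops div.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Tika source: a class-keyed context with a fallback.
class MapperLookupSketch {

    // Stand-in for a mapper that decides which element names survive.
    interface HtmlMapper {
        String mapSafeElement(String name);
    }

    // Assumed, deliberately small safe-element set that does not contain "div".
    private static final Set<String> SAFE =
            new HashSet<>(Arrays.asList("p", "a", "h1"));

    // Default-style mapper: returns null for anything outside the safe set.
    static final HtmlMapper DEFAULT = name -> {
        String lower = name.toLowerCase(Locale.ENGLISH);
        return SAFE.contains(lower) ? lower : null;
    };

    // Identity-style mapper: keeps every element, lower-cased.
    static final HtmlMapper IDENTITY = name -> name.toLowerCase(Locale.ENGLISH);

    // Stand-in for a parse context: values are stored and fetched by class.
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key, T fallback) {
            Object value = entries.get(key);
            return value != null ? key.cast(value) : fallback;
        }
    }

    // What a parser might do: look up the mapper, falling back to the default.
    static String mapElement(Context context, String name) {
        return context.get(HtmlMapper.class, DEFAULT).mapSafeElement(name);
    }

    public static void main(String[] args) {
        Context configured = new Context();
        configured.set(HtmlMapper.class, IDENTITY);
        System.out.println(mapElement(configured, "DIV"));

        // A different context instance never saw the set() call above.
        Context fresh = new Context();
        System.out.println(mapElement(fresh, "DIV"));
    }
}
```

In this model the configured context keeps div while the fresh one drops it.
If something similar happens in the real pipeline, one thing worth checking is
whether the ParseContext that reaches the HTML parser is the same instance the
mapper was set on.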



-- 
Dan Klueter
