unsubscribe
On Fri, Mar 1, 2013 at 7:35 AM, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> We need div elements returned when we pass the stream through Boilerpipe
> from Nutch. We enable includeMarkup to get markup returned in the first
> place, but divs are not returned. In the ParseContext we set
> context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE) but this is not
> honored for some reason.
>
> For some reason in the background DefaultHtmlMapper is being used, we know
> this because we do get divs returned if we add DIV,div to the SAFE_ELEMENTS
> Map. This is not very good because we prefer not to modify this parser
> class and because the unit test
> testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) fails if the
> div is added to the DefaultHtmlMapper.SAFE_ELEMENTS.
>
> Any ideas on how we can force the IdentityMapper to be used instead?
>
> Thanks,
> Markus
>

--
Dan Klueter
