RE: Prevent parsers from stripping html tags

Markus Jelsma Mon, 08 May 2017 10:57:54 -0700

Hi - you need an identity mapper for Tika, if I remember correctly:

<property>
  <name>tika.htmlmapper.classname</name>
  <value>org.apache.tika.parser.html.IdentityHtmlMapper</value>
  <description>Classname of Tika HTMLMapper to use. Influences the elements
  included in the DOM and hence the behavior of the HTMLParseFilters.
  </description>
</property>
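
As a rough sketch of where this goes, assuming a standard Nutch layout in which local overrides live in conf/nutch-site.xml and the Tika parser plugin (parse-tika) is the one handling your HTML, the property sits inside the top-level <configuration> element:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml (file location assumed) -->
<configuration>
  <!-- Ask Tika to keep all HTML elements instead of mapping them to a reduced set -->
  <property>
    <name>tika.htmlmapper.classname</name>
    <value>org.apache.tika.parser.html.IdentityHtmlMapper</value>
  </property>
</configuration>
```

As far as I recall, only parse-tika reads this property, so the parse-html plugin would not be affected by it.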


Regards,
Markus

 
 
-----Original message-----
> From:Matt Rutherford <[email protected]>
> Sent: Monday 8th May 2017 19:45
> To: [email protected]
> Subject: Prevent parsers from stripping html tags
> 
> I would like to maintain the html tags during the parsing stage so they
> also get indexed. How can I accomplish this?
> 
> I tried removing the parser plugins (html and tika in my case) but it seems
> you need at least one and enabling either of these strips the markup from
> the docs.
>
