RE: Prevent parsers from stripping html tags

Matt Rutherford Mon, 08 May 2017 11:31:43 -0700

I uncommented this property and enabled the parse-tika plugin in
plugin.includes, but the tags were still removed when indexing.
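
For reference, a minimal sketch of what the two settings described above
might look like in conf/nutch-site.xml. The plugin list is an assumption
modelled on a typical Nutch 1.x setup, not the exact value used here; only
the parse-tika entry and the tika.htmlmapper.classname property come from
this thread.

```xml
<!-- conf/nutch-site.xml (sketch; the plugin list is illustrative only) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumed list: parse-tika is the only parser enabled here -->
    <value>protocol-http|urlfilter-regex|parse-tika|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>tika.htmlmapper.classname</name>
    <!-- Identity mapper: keeps HTML elements in the DOM instead of dropping them -->
    <value>org.apache.tika.parser.html.IdentityHtmlMapper</value>
  </property>
</configuration>
```

Two things that may be worth checking with a setup like this. If parse-html
is enabled as well, conf/parse-plugins.xml (as far as I recall) routes
text/html to parse-html first, in which case the Tika mapper would not come
into play. And going by the property description, the mapper changes the DOM
handed to the HTMLParseFilters; whether the indexed content field itself
keeps the markup is a separate question.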


On 8 May 2017 6:57 p.m., "Markus Jelsma" <[email protected]> wrote:

> Hi - you need an identity mapper for Tika, if I remember correctly:
>
> <property>
>   <name>tika.htmlmapper.classname</name>
>   <value>org.apache.tika.parser.html.IdentityHtmlMapper</value>
>   <description>Classname of Tika HTMLMapper to use. Influences the
>   elements included in the DOM and hence the behavior of the
>   HTMLParseFilters.
>   </description>
> </property>
>
> Regards,
> Markus
>
>
>
> -----Original message-----
> > From: Matt Rutherford <[email protected]>
> > Sent: Monday 8th May 2017 19:45
> > To: [email protected]
> > Subject: Prevent parsers from stripping html tags
> >
> > I would like to maintain the html tags during the parsing stage so they
> > also get indexed. How can I accomplish this?
> >
> > I tried removing the parser plugins (html and tika in my case) but it
> > seems you need at least one and enabling either of these strips the
> > markup from the docs.
> >
>
