The HTMLStripCharFilter strips the HTML from the *indexed* terms only; it does not affect the *stored* field.

If you don't want HTML in the stored field, can you just strip it out before passing it to Solr?
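
A minimal sketch of stripping on the client side, assuming a simple regex is good enough for your markup (it does not decode entities and does not handle comments, CDATA, or script blocks). The class and method names here are made up for illustration and are not part of SolrJ:

```java
import java.util.regex.Pattern;

class HtmlStripSketch {
    // Naive assumption: a tag is anything between '<' and the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Replace each tag with a space so adjacent words do not run together,
    // then collapse runs of whitespace and trim the ends.
    static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "<center>content</center>";
        String clean = stripTags(raw);
        System.out.println(clean);
        // With SolrJ you would then add the cleaned value to the document,
        // e.g. doc.addField("text", clean) on a SolrInputDocument.
    }
}
```

With this approach the stored field holds the stripped text too, since Solr never sees the markup.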


On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:

Hey Guys,
How do I add HTML/XML documents using SolrJ such that they do not
bypass the HTML char filter?

SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added
using SolrJ to a field with HTMLStripCharFilter, is not stripped of
its center tags. But if I check in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:
<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks.

--
Aseem
