add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread aseem cheema
Hey Guys,
How do I add HTML/XML documents using SolrJ such that it does not bypass
the HTML char filter?

SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added
using SolrJ to a field with HTMLStripCharFilter, is not
stripped of its center tags. But if I check in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:
<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks.

-- 
Aseem
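
A note on the escaping in that feed: the &lt; and &gt; entities are ordinary XML transport encoding, and any XML parser turns them back into the original characters when it reads the message. The sketch below uses only the JDK's DOM parser, not Solr itself, to read a feed shaped like the one quoted above. The class name, the helper name, and the exact feed string are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class: shows that XML escaping is undone on parsing.
final class EscapedFieldDemo {

    // Returns the text of the <field> whose name attribute matches, or null if absent.
    static String fieldValue(String xml, String fieldName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (fieldName.equals(field.getAttribute("name"))) {
                    // getTextContent() returns the text with entities already resolved.
                    return field.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the update message from the question.
        String feed = "<add><doc boost=\"1.0\">"
                + "<field name=\"id\">http://haha.com</field>"
                + "<field name=\"text\">&lt;center&gt;content&lt;/center&gt;</field>"
                + "</doc></add>";
        System.out.println(fieldValue(feed, "text"));
    }
}
```

The value that comes out of the parser is the raw markup again, so the escaping on the wire is not by itself a reason for the tags to survive.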


Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread Ryan McKinley
The HTMLStripCharFilter will strip the HTML from the *indexed* terms;
it does not affect the *stored* field.

If you don't want HTML in the stored field, can you just strip it out
before passing it to Solr?
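
One way to do that stripping on the client side is sketched below. This is a deliberately naive, regex-based helper, not what HTMLStripCharFilter does internally: it does not decode entities and does not treat comments or script blocks specially, so a real HTML parser is the safer choice for arbitrary input. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes tags from a value before it is handed to SolrJ.
final class HtmlPreStripper {

    // Anything from '<' up to the next '>' is treated as a tag (naive assumption).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        // Replace each tag with a space so adjacent words do not run together,
        // then collapse the resulting whitespace.
        String noTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String cleaned = stripTags("<center>content</center>");
        System.out.println(cleaned);
        // With SolrJ on the classpath, the cleaned value would then be added
        // to the document, e.g. doc.addField("text", cleaned) on a
        // SolrInputDocument, so the stored field holds plain text.
    }
}
```

With this approach both the indexed terms and the stored value are tag-free, at the cost of losing the original markup in Solr.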




Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread aseem cheema
Ohhh... you are a life saver... thank you so much.. it makes sense.

Aseem

On Wed, Nov 11, 2009 at 7:40 PM, Ryan McKinley <ryan...@gmail.com> wrote:
> The HTMLStripCharFilter will strip the HTML from the *indexed* terms; it does
> not affect the *stored* field.
>
> If you don't want HTML in the stored field, can you just strip it out before
> passing it to Solr?



-- 
Aseem