All,

 

I have just installed Solr 3.1 running on Tomcat 7, and I am noticing a possible 
issue with highlighting.  I have a field in my index called "story".  In the Solr 
document that I am testing with, the "story" field starts with the following 
snippet (the rest of the field's data is omitted to keep things simple):

 

<p><a idref="0" /></p><p>EN AMÉRICA LATINA, 

 

I am using the "ASCIIFoldingFilterFactory" to make my searches accent-insensitive.  
When I search for "america" with highlighting enabled on the "story" field, 
here is what I get in the "highlighting" section of the response:

 

<lst name="highlighting"><lst name="2011_May_13_ _1c77033a"><arr 
name="story"><str>&lt;p&gt;&lt;a idref=&quot;0&quot; /&gt;&lt;/p&gt;&lt;p&gt;EN 
<em>AM&#201;RICA</em> LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA 
PROTECCI&#211;N</str></arr></lst>

The problem is that the HTML tags before the <em> come back entity-encoded, so 
they show up as literal tag text on my search results page instead of being 
rendered.  To be clear, I do want the HTML to be interpreted as HTML, not as 
text.  In this particular situation I am not worried about the dangers of 
allowing such behavior.
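
To make the symptom concrete, here is a minimal Python sketch that decodes the entity-encoded part of the snippet above on the client side. It only illustrates the difference between what comes back and what I would like to render; it is not a Solr setting, and the variable names are just for illustration.

```python
import html

# Start of the highlighted snippet as returned in the response above
# (shortened after "LATINA" for the example).
snippet = (
    '&lt;p&gt;&lt;a idref=&quot;0&quot; /&gt;&lt;/p&gt;&lt;p&gt;EN '
    '<em>AM&#201;RICA</em> LATINA'
)

# html.unescape turns the entities back into the characters they stand for,
# so the original markup reappears around the <em> highlight tags.
decoded = html.unescape(snippet)
print(decoded)
```

Decoding everything like this also decodes any entities that were part of the stored text itself, so it is only reasonable when the field content is trusted, as it is in my case.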

 

The same test against the same data in a Solr 1.4.1 index does not exhibit 
this behavior.
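
For reference, a highlighting request of the kind described above can be sketched as follows. The host, port, and URL path are placeholder assumptions and not my actual setup; only the q, hl, and hl.fl parameters correspond to the test described, and the sketch assumes the default search field covers "story".

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real host, port, and webapp path are assumptions.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"


def build_highlight_query(term, field="story"):
    """Return a select URL that searches for `term` with highlighting on `field`."""
    params = {
        "q": term,       # search term, matched against the default search field
        "hl": "true",    # turn highlighting on
        "hl.fl": field,  # field to generate highlighted snippets for
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_highlight_query("america")
print(url)
```

Sending the same URL to the 3.1 and the 1.4.1 instance is enough to compare the two "highlighting" sections.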

 

Any help is appreciated.  Please let me know if I need to post the field type 
definitions (index and query analyzers) for the "story" field from my schema.xml.

 

Thanks in advance

 

Raj
