Hmmm, looks like the highlighting code changed.  Using the example
doc, with Solr 1.4 I get:

http://localhost:8983/solr/select?q=features:circumflexes&hl=true&hl.fl=features&wt=json&indent=true

 "highlighting":{
  "UTF8TEST":{
        "features":["eaiou with <em>circumflexes</em>: êâîôû"]}}}

With 3.1, the same query now returns:

  "highlighting":{
    "UTF8TEST":{
      "features":["eaiou with <em>circumflexes</em>: &#234;&#226;&#238;&#244;&#251;"]}}}

So the highlighter has started to emit XML entities (numeric character
references) even when the response is not in XML format.  Even worse,
with the XML format those entities are treated as literal data and
escaped a second time:
      <str>eaiou with &lt;em&gt;circumflexes&lt;/em&gt;: &amp;#234;&amp;#226;&amp;#238;&amp;#244;&amp;#251;</str>
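
To make the double escaping concrete, here is a minimal Python sketch
that mimics the symptom. It is not Solr's code (Solr is Java); the
helper name encode_non_ascii is hypothetical and only imitates what the
3.1 highlighter output looks like, and the XML step assumes the
response writer applies ordinary XML escaping to the string it is given.

```python
from html import unescape
from xml.sax.saxutils import escape


def encode_non_ascii(text):
    """Hypothetical stand-in for the 3.1 behaviour: replace every
    non-ASCII character with a decimal numeric character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


fragment = "eaiou with <em>circumflexes</em>: êâîôû"

# Step 1: the fragment as it shows up in the 3.1 JSON response.
encoded = encode_non_ascii(fragment)
print(encoded)

# Step 2: generic XML escaping of that string. The '&' of each
# reference becomes '&amp;', so the reference turns into literal text.
xml_text = escape(encoded)
print(xml_text)

# Client-side stopgap (an assumption, not a fix): decode the references
# after parsing the JSON response.
print(unescape(encoded))
```

Note that html.unescape also decodes named references such as &lt;, so
this stopgap is only safe if the field cannot legitimately contain
them.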

I don't know if this covers all of your problems, but things are
definitely a bit wonky in highlighter-land.
Could you open a JIRA issue for this?

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco




2011/4/14 getagrip <getag...@web.de>:
> After updating from 1.4.1 to 3.1.0, some documents are no longer parsed
> correctly:
>
> 1. Neither the result's id field nor the highlighting header displays
> special characters (e.g. German umlauts) anymore.
>
> 2. The highlighting section is messed up: words appear in random order
> instead of forming readable sentences.
>
> Please see both versions (3.1.0 & 1.4.1) below:
>
> ###
> query =>
>
> <http://localhost:8983/solr/select/?q=attr_content:der*&rows=2&start=0&sort=id+asc&hl.fl=attr_content&fl=id&hl=true>
>
>
> ### solr 3.1.0 (not working) =>
>
>
> <lst name="highlighting">
> <lst name="/master/Netzqualit?tsmessungen gem?? Klasse A +.pdf">
> <arr name="attr_content">
> <str>
>  Netzqualitätsmessungen gemäß Klasse ASeit Einführung <em>der</em> Norm IEC
> </str>
> </arr>
> </lst>
> <lst name="/master/schleifenimpedanz.pdf">
> <arr name="attr_content">
> <str>
>  <em>derAnwendungsberichtSchleifenimpedanzDie</em> Messung <em>der</em>
> Erdschleifenimpedanz und die Bestimmung desunbeeinflussten Kurzschlussstroms
> (PFC
> </str>
> </arr>
> </lst>
> </lst>
>
>
> ### solr 1.4.1 (works well) =>
>
>
>
> <lst name="highlighting">
> <lst name="/master/Netzqualitätsmessungen gemäß Klasse A +.pdf">
> <arr name="attr_content">
> <str>
>  die elektrische Anlage eines Unternehmens den zuverlässigen Betrieb
> <em>der</em> Ver- braucher gewährleistet
> </str>
> </arr>
> </lst>
> <lst name="/master/schleifenimpedanz.pdf">
> <arr name="attr_content">
> <str>
>  wurde die Messung oft gar nicht erst durchgeführt aus Angst <em>der</em> FI
> könnte auslösen. Diese Befürch- tung
> </str>
> </arr>
> </lst>
> </lst>
>
