Hmmm, looks like the highlighting code changed. Using the example doc, with 1.4 I get
http://localhost:8983/solr/select?q=features:circumflexes&hl=true&hl.fl=features&wt=json&indent=true

"highlighting":{
  "UTF8TEST":{
    "features":["eaiou with <em>circumflexes</em>: êâîôû"]}}}

With 3.1, this now looks like

"highlighting":{
  "UTF8TEST":{
    "features":["eaiou with <em>circumflexes</em>: &#234;&#226;&#238;&#244;&#251;"]}}}

So it's started to produce XML entities even when not in XML format.
And even worse, when one is using the XML format, those entities are
treated as literal data and escaped:

<str>eaiou with <em>circumflexes</em>: &#234;&#226;&#238;&#244;&#251;</str>

I don't know if this covers all of your problems, but things are
definitely a bit wonky in highlighter-land.
Could you open a JIRA issue for this?

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco

2011/4/14 getagrip <getag...@web.de>:
> Having updated from 1.4.1 to 3.1.0 some documents are not parsed correctly
> anymore:
>
> 1. Both the result's id field and the highlighting's header do not display
> special-characters e.g. German Umlauts anymore.
>
> 2. The highlighting section is messed up as words appear in random order
> instead of readable sentences.
>
> Please see both versions (3.1.0 & 1.4.1) below:
>
> ###
> query =>
>
> <http://localhost:8983/solr/select/?q=attr_content:der*&rows=2&start=0&sort=id+asc&hl.fl=attr_content&fl=id&hl=true>
>
>
> ### solr 3.1.0 (not working) =>
>
> <lst name="highlighting">
>   <lst name="/master/Netzqualit?tsmessungen gem?? Klasse A +.pdf">
>     <arr name="attr_content">
>       <str>
>       Netzqualitätsmessungen gemäß Klasse ASeit Einführung <em>der</em> Norm IEC
>       </str>
>     </arr>
>   </lst>
>   <lst name="/master/schleifenimpedanz.pdf">
>     <arr name="attr_content">
>       <str>
>       <em>derAnwendungsberichtSchleifenimpedanzDie</em> Messung <em>der</em>
>       Erdschleifenimpedanz und die Bestimmung desunbeeinflussten Kurzschlussstroms
>       (PFC
>       </str>
>     </arr>
>   </lst>
> </lst>
>
>
> ### solr 1.4.1 (works well) =>
>
> <lst name="highlighting">
>   <lst name="/master/Netzqualitätsmessungen gemäß Klasse A +.pdf">
>     <arr name="attr_content">
>       <str>
>       die elektrische Anlage eines Unternehmens den zuverlässigen Betrieb
>       <em>der</em> Ver- braucher gewährleistet
>       </str>
>     </arr>
>   </lst>
>   <lst name="/master/schleifenimpedanz.pdf">
>     <arr name="attr_content">
>       <str>
>       wurde die Messung oft gar nicht erst durchgeführt aus Angst <em>der</em> FI
>       könnte auslösen. Diese Befürch- tung
>       </str>
>     </arr>
>   </lst>
> </lst>
>
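For anyone trying to follow the escaping problem described at the top of this reply, here is a minimal, hypothetical sketch of how the two symptoms relate. It is not Solr's highlighter or response-writer code, and the names DoubleEscapeDemo, toNumericRefs and xmlEscape are made up for illustration. It assumes (a) the 3.1 highlighter replaces each non-ASCII character with a numeric character reference such as &#234;, and (b) the XML writer then escapes "&" in that string like any other text, which yields the double-escaped &amp;#234; in the raw XML.

```java
// Hypothetical illustration only: NOT Solr's actual highlighter/writer code.
public class DoubleEscapeDemo {

    // Hypothetical helper: turn every non-ASCII code point into a numeric
    // character reference (e.g. U+00EA -> "&#234;"), mimicking the 3.1 symptom.
    static String toNumericRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);              // plain ASCII passes through
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    // Minimal XML text escaping, as any XML writer applies to string content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;")               // must run first
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding.
        String original = "eaiou with circumflexes: \u00ea\u00e2\u00ee\u00f4\u00fb";
        String highlighted = toNumericRefs(original);

        // Roughly what a JSON client sees:
        // eaiou with circumflexes: &#234;&#226;&#238;&#244;&#251;
        System.out.println(highlighted);

        // Roughly what ends up in the raw XML (entities escaped again):
        // eaiou with circumflexes: &amp;#234;&amp;#226;&amp;#238;&amp;#244;&amp;#251;
        System.out.println(xmlEscape(highlighted));
    }
}
```

The point of the sketch is only that once characters are turned into entities before the response writer runs, the JSON writer passes them through verbatim and the XML writer escapes them a second time; where exactly that happens inside 3.1 is what the JIRA issue would need to pin down.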