Re: solr admin result page error

2011-02-25 Thread Bernd Fehling
Hi Markus, the result of my investigation is that Lucene can currently only handle UTF-8 characters within the BMP [Basic Multilingual Plane] (plane 0) = 0x. Any character above the BMP may end in unpredictable results, which is bad. If you get invalid UTF-8 from the index and use wt=xml it gives the error
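To check whether a document contains characters outside the BMP at all, a scan by code point is enough. The sketch below assumes the field value is available as a java.lang.String; the class and method names (SupplementaryScan, findSupplementary) are hypothetical. In Java, a character above U+FFFF is stored as two chars (a UTF-16 surrogate pair), which is why String.length() counts it twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: report every code point outside the BMP (above U+FFFF).
class SupplementaryScan {

    // Returns "index:U+XXXXX" entries, where index is the char offset of the pair.
    static List<String> findSupplementary(String s) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);  // joins a valid surrogate pair into one code point
            if (Character.isSupplementaryCodePoint(cp)) {
                hits.add(String.format("%d:U+%04X", i, cp));
            }
            i += Character.charCount(cp);  // 2 for a supplementary code point, else 1
        }
        return hits;
    }

    public static void main(String[] args) {
        // "A", then U+1D400 (MATHEMATICAL BOLD CAPITAL A), then "B"
        String field = "A" + new String(Character.toChars(0x1D400)) + "B";
        System.out.println(field.length());            // 4 chars for 3 characters
        System.out.println(findSupplementary(field));  // [1:U+1D400]
    }
}
```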

solr admin result page error

2011-02-11 Thread Bernd Fehling
Dear list, after loading some documents via DIH which also include URLs, I get this yellow XML error page as the search result in the Solr admin GUI. It says "XML processing error: not well-formed". The code it complains about is: <arr name="dcurls"><str>http://eprints.soton.ac.uk/43350/</str>
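Since "not well-formed" means the XML parser rejected some character or byte sequence, one hypothetical way to narrow it down is to test each stored field value against the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The sketch assumes the value is available as a Java String; XmlCharCheck and firstInvalid are illustrative names, not part of Solr. A plain URL like the one above passes, while an unpaired surrogate does not.

```java
// Hypothetical checker based on the XML 1.0 "Char" production.
class XmlCharCheck {

    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first disallowed code point, or -1 if none.
    // codePointAt returns a value in 0xD800-0xDFFF only for an unpaired surrogate,
    // which the production excludes, so broken pairs are reported as well.
    static int firstInvalid(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalid("http://eprints.soton.ac.uk/43350/"));  // -1
        System.out.println(firstInvalid("abc\uD835def"));  // 3 (lone high surrogate)
    }
}
```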

Re: solr admin result page error

2011-02-11 Thread Bernd Fehling
Results so far: I could locate and isolate the document causing the trouble. I've checked the document with xmllint again; it is valid, well-formed UTF-8. When I load that single document on its own, I get the XML error when displaying the search result. This happens through the Solr admin search and also through the JSON interface,
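If the input is valid UTF-8 but the response is not, the corruption has to happen while the response is written. The thread does not establish where; one mechanism that produces exactly this symptom is an encoder that converts each UTF-16 char on its own, so the two halves of a surrogate pair become two 3-byte sequences (the CESU-8 style) instead of one 4-byte UTF-8 sequence. The sketch contrasts the two for U+1D400; perCharEncode is a deliberately faulty, hypothetical encoder written only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SurrogateBytes {

    // Correct UTF-8: one 4-byte sequence per supplementary code point.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical faulty encoder: treats every UTF-16 char, including each half
    // of a surrogate pair, as a separate code point (CESU-8 style output).
    static byte[] perCharEncode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // True if a strict UTF-8 decoder accepts the bytes.
    static boolean isStrictUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Space-separated uppercase hex dump, e.g. "F0 9D 90 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1D400));
        System.out.println(hex(utf8(s)));                    // F0 9D 90 80
        System.out.println(hex(perCharEncode(s)));           // ED A0 B5 ED B0 80
        System.out.println(isStrictUtf8(perCharEncode(s)));  // false
    }
}
```

Java's built-in UTF-8 decoder rejects the six-byte form, which would be consistent with a browser's XML parser reporting such a page as not well-formed.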

Re: solr admin result page error

2011-02-11 Thread Markus Jelsma
It looks like you hit the same issue as I did a while ago: http://www.mail-archive.com/solr-user@lucene.apache.org/msg46510.html On Friday 11 February 2011 08:59:27 Bernd Fehling wrote: Dear list, after loading some documents via DIH which also include urls I get this yellow XML error page

Re: solr admin result page error

2011-02-11 Thread Bernd Fehling
Hi Markus, yes, it looks like the same issue. There is also a \u UTF-8 code in your dump. So far I have followed it into XMLResponseWriter. Some steps before that, the result in the buffer looks good and the UTF-8 code is correct. This freaky problem is really hard to debug. Have you looked deeper into this
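While the writer is being debugged, one possible defensive workaround (an assumption, not something proposed in the thread and not what Solr's XMLWriter does) is to escape supplementary characters as numeric character references, so the byte encoder only ever sees non-surrogate BMP characters. XmlEscape is a hypothetical helper for element content only; it does not handle attribute quoting or filter out control characters.

```java
// Hypothetical escaper for XML element content that writes code points above
// U+FFFF as &#x...; references instead of raw surrogate pairs.
class XmlEscape {

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                // codePointAt yields a surrogate value only for an unpaired surrogate
                throw new IllegalArgumentException("unpaired surrogate at index " + i);
            }
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (Character.isSupplementaryCodePoint(cp)) {
                        out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, escape("x" + new String(Character.toChars(0x1D400))) returns x&#x1D400;. An XML parser resolves the reference back to U+1D400, so the document content is unchanged; the cost is a slightly larger response, and it works around a faulty encoder rather than fixing it.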