Have you verified that the data was indexed properly (UTF-8 encoding)? Try a
raw HTTP request using the browser or curl and see how that field looks in
the resulting XML.
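
One way to run that check from Java rather than a browser is sketched below. It fetches the raw response bytes and decodes them explicitly as UTF-8, so neither SolrJ nor the platform default charset is involved. The class name RawSolrCheck, the base URL http://localhost:8983/solr and the query id:12345 are assumptions for illustration only; substitute your own values.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SolrJ: fetches a raw Solr XML response.
class RawSolrCheck {

    // URL-encodes a single parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds a select URL that returns one row with only the given field, as XML.
    static String selectUrl(String baseUrl, String query, String field) {
        return baseUrl + "/select?q=" + enc(query)
                + "&fl=" + enc(field) + "&rows=1&wt=xml";
    }

    // Decodes the response bytes explicitly as UTF-8, ignoring the default charset.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Reads the complete response body as bytes.
    static byte[] fetch(String url) throws Exception {
        try (InputStream in = new URL(url).openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed base URL and query; adjust to your installation.
        String url = selectUrl("http://localhost:8983/solr", "id:12345", "marcxml");
        // Note: the console's own encoding can still distort what is printed.
        System.out.println(decodeUtf8(fetch(url)));
    }
}
```

If the field already looks wrong in this raw output, the problem is probably on the indexing side rather than in SolrJ.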
-- Jack Krupansky
-----Original Message-----
From: Andreas Kahl
Sent: Thursday, October 18, 2012 1:10 PM
To: j...@basetechnology.com ; solr-user@lucene.apache.org
Subject: Antw: Re: How to retrieve field contents as UTF-8 from Solr-Index
with SolrJ
Jack,
Thanks for the hint, but we have already set URIEncoding="UTF-8" on all
our Tomcat instances, too.
Regards
Andreas
"Jack Krupansky" 18.10.12 17:11 >>>
It may be that your container does not have UTF-8 enabled. For example,
with Tomcat, make sure the "Connector" element in server.xml has
URIEncoding="UTF-8".
-- Jack Krupansky
-----Original Message-----
From: Andreas Kahl
Sent: Thursday, October 18, 2012 10:53 AM
To: solr-user@lucene.apache.org
Subject: How to retrieve field contents as UTF-8 from Solr-Index with
SolrJ
Hello everyone,
We are trying to implement a simple servlet that queries a Solr 3.5 index
with SolrJ. The query we send is an identifier used to retrieve a
single record. From the result we extract one field to return. This
field contains an XML document with characters from several European and
Asian alphabets, so we need UTF-8.
Now we have the problem that the string returned by
    marcXml = results.get(0).getFirstValue("marcxml").toString();
is not valid UTF-8, so the resulting XML document is not well-formed.
Here is what we do in Java:
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", query.toString());
    params.set("fl", "marcxml");
    params.set("rows", "1");
    try {
        QueryResponse result = server.query(params, SolrRequest.METHOD.POST);
        SolrDocumentList results = result.getResults();
        if (!results.isEmpty()) {
            marcXml = results.get(0).getFirstValue("marcxml").toString();
        }
    } catch (Exception ex) {
        Logger.getLogger(MarcServer.class.getName()).log(Level.SEVERE, null, ex);
    }
Charset.defaultCharset() is "UTF-8" on both the querying machine and
the Solr server. We also tried BinaryResponseParser as well as
XMLResponseParser when instantiating CommonsHttpSolrServer.
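
To narrow down where the characters get damaged, it can help to look at the code points of the returned string rather than at its printed form. The sketch below is only a diagnostic aid, under the assumption that the typical failure is UTF-8 bytes having been decoded as ISO-8859-1 somewhere along the way. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of SolrJ.
class EncodingDiagnostics {

    // Lists each code point as U+XXXX, independent of any console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Reverses one specific mistake: UTF-8 bytes that were decoded as ISO-8859-1.
    // Characters outside ISO-8859-1 are lost, so use this for diagnosis only.
    static String repairLatin1Misdecoding(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // LATIN SMALL LETTER A WITH DIAERESIS
        // Simulate the mistake: UTF-8 bytes read back as ISO-8859-1.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(codePoints(garbled));                          // U+00C3 U+00A4
        System.out.println(codePoints(repairLatin1Misdecoding(garbled))); // U+00E4
    }
}
```

If codePoints(marcXml) shows pairs such as U+00C3 U+00A4 where a single accented letter is expected, the string was most likely decoded with the wrong charset before it reached this code. If the code points look correct, the damage probably happens later, for example when the servlet writes the response.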
Does anyone have a solution to this? Is this related to
https://issues.apache.org/jira/browse/SOLR-2034 ? Is there
possibly a workaround?
Regards
Andreas