: The conclusion is that setting URIEncoding="UTF-8" in the <Connector>
: section in server.xml is not enough
:
: I also needed to add -Dfile.encoding=UTF-8 to the tomcat's java
: startup options (in catalina.bat)
Seeing how you resolved this problem has got me thinking... how did you index the CSV file originally? Did you post it over the wire to Solr (either as a raw post, or as a CGI file upload), or was it a file on the server indexed using the "stream.file" option?

If it was a remote file streamed as part of the request, then I don't think it should have mattered what your file.encoding sysprop was set to. Solr should have either used the file encoding specified in the Content-Type you sent, or, if the content type didn't have a charset, it would have assumed UTF-8. (You didn't by any chance put an incorrect charset in the Content-Type, did you?)

If it was a file on the server that was accessed using stream.file, then Solr uses a FileReader to get at it, which does use the default file encoding for your JVM. But you can tell Solr to ignore that default using the stream.contentType param.

Can you let us know what you did originally, and try some of the alternatives I just listed (without the explicit file.encoding property)?

-Hoss
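To make the two cases concrete, here is a minimal sketch in plain Java of the decoding behaviour described above. It is not Solr's actual code. The class ContentTypeCharset and its methods charsetFromContentType and decode are hypothetical names for illustration. The sketch takes the charset from a Content-Type value when one is present and otherwise assumes UTF-8. It decodes with an explicit charset through InputStreamReader rather than new FileReader(file), which would silently use the JVM default (file.encoding). The main method also shows what an incorrect charset in the Content-Type does to UTF-8 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Hypothetical helper: pull the charset parameter out of a Content-Type
    // value, falling back to UTF-8 when no charset is declared.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decode bytes with an explicit charset instead of the JVM default,
    // which is what new FileReader(file) would use.
    static String decode(InputStream in, String contentType) throws IOException {
        Charset cs = charsetFromContentType(contentType);
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // CSV content encoded as UTF-8 ("\u00fc" is u-umlaut).
        byte[] utf8 = "id,name\n1,M\u00fcller\n".getBytes(StandardCharsets.UTF_8);

        // No charset declared: UTF-8 is assumed and the text round-trips.
        System.out.print(decode(new ByteArrayInputStream(utf8), "text/csv"));

        // Wrong charset declared: the two UTF-8 bytes of u-umlaut are
        // decoded as two separate Latin-1 characters.
        System.out.print(decode(new ByteArrayInputStream(utf8),
                "text/csv; charset=ISO-8859-1"));
    }
}
```

The same reasoning applies to the stream.file case: the fix is to state the encoding explicitly (via stream.contentType, per the above) rather than relying on a JVM-wide file.encoding setting.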