: The conclusion is that setting URIEncoding="UTF-8" in the <Connector>
: section in server.xml is not enough
:
: I also needed to add -Dfile.encoding=UTF-8 to the tomcat’s java
: startup options (in catalina.bat)

seeing how you resolved this problem has got me thinking ... how did you
index the CSV file originally? did you post it over the wire to Solr
(either as a raw post, or as a CGI file upload), or was it a file on the
server indexed using the "stream.file" option?

if it was a remote file streamed as part of the request, then i don't
think it should have mattered what your file.encoding sysprop was set to
-- Solr should have either used the file encoding specified in the
Content-Type you sent, or if the content type didn't have a charset, it
would have assumed UTF-8 (you didn't by any chance put an incorrect
charset in the content-type, did you?)
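
to make that fallback rule concrete, here is a minimal sketch of "use the charset from the Content-Type, otherwise assume UTF-8". the class and method names are hypothetical and this is not Solr's actual code, just an illustration of the behavior described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper (not Solr's implementation): pick the charset named
    // in a Content-Type header value, falling back to UTF-8 if none is given.
    static Charset charsetFor(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by ';'
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .trim()
                                   .replace("\"", "");
                    // Throws an unchecked exception if the name is not a
                    // charset this JVM knows about.
                    return Charset.forName(name);
                }
            }
        }
        // No charset parameter present: assume UTF-8.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("text/plain; charset=ISO-8859-1"));
        System.out.println(charsetFor("text/plain"));
    }
}
```

the point is that a wrong charset in the header wins over the fallback, so a UTF-8 file posted with a mislabeled content type would still be decoded incorrectly.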

if it was a file on the server that was accessed using stream.file, then
Solr uses a FileReader to get at it -- which does use the default file
encoding for your JVM, but you can tell Solr to ignore that default using
the stream.contentType param.
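
as a rough illustration of why the JVM default matters for a reader that doesn't get an explicit charset, the sketch below decodes the same UTF-8 bytes twice: once as UTF-8, and once as ISO-8859-1, which stands in here for a non-UTF-8 platform default (that choice is an assumption, your actual default may differ). the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Decode raw bytes through a Reader using the given charset.
    static String decode(byte[] bytes, Charset cs) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, written as an escape so the source file's
        // own encoding does not matter.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: the accented character survives.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length());

        // Wrong charset: the two UTF-8 bytes of the e-acute become two
        // separate characters, so the string grows by one.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length());

        // What a charset-less reader would use on this JVM.
        System.out.println(Charset.defaultCharset());
    }
}
```

setting -Dfile.encoding=UTF-8 changes that last value for the whole JVM, whereas naming the charset per request only affects the one stream being read.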

can you let us know what you did originally, and try some of the
alternatives i just listed (without the explicit file.encoding property)?



-Hoss
