Hi Peter,

I have the same set of issues and will look for a response here.

Sometimes those odd characters are created at input time (for example, when text is extracted from a Microsoft Office document by a third-party tool). The MySQL data looking OK in the browser might just mean that the encoding MySQL declares for the table is not the encoding of the original text. Say, for example, that the MySQL table's character set/collation is Latin-1 but the documents were UTF-8. When a browser renders the raw data, it may assume the bytes are UTF-8 and display them correctly, while SOLR's DataImportHandler (DIH) takes the table's declared encoding (latin1_swedish_ci, for example) literally. It could also be the way PHP handles (or fails to handle) UTF-8, which depends on your client.
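To make the mismatch concrete, here is a minimal Python sketch (not SOLR or MySQL code) that reproduces this kind of garbling and undoes it. It assumes the corruption happens when UTF-8 bytes are decoded as Windows-1252, which is close to what MySQL calls latin1; `garble` and `repair` are hypothetical helper names used only for illustration.

```python
# Minimal sketch: reproduce and reverse the mojibake described above,
# ASSUMING UTF-8 bytes were decoded as Windows-1252 (cp1252) somewhere
# in the pipeline. garble() and repair() are hypothetical helpers.

def garble(text: str) -> str:
    """Encode correctly as UTF-8, then wrongly decode the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the mistake: turn the text back into bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    # "\u2013" is the dash character from the original message; "\u00e9" is e-acute.
    for original in ["\u2013", "\u00e9", "caf\u00e9 \u2013 menu"]:
        bad = garble(original)
        print(repr(original), "->", repr(bad), "->", repr(repair(bad)))
```

Each character comes out as two or three "strange" characters, which is the pattern Peter describes. This only covers the common case: characters whose UTF-8 bytes include 0x81, 0x8D, 0x8F, 0x90 or 0x9D cannot round-trip through Python's cp1252 codec, and the real fix is to make the declared encoding match the data rather than to repair it afterwards.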

I don't think it has anything to do with Jetty; I use Resin and see the same problem.

Hope that helps,

- Jonathan


On Nov 4, 2009, at 8:48 AM, Peter Hedlund wrote:

I'm having a problem with character encoding. The data that I'm indexing with SOLR is being pulled from a MySQL database, and the index is then integrated into a PHP application. When I display the text from the SOLR index, it's full of strange characters where em-dashes and accented characters (–, é, etc.) should be. However, when I bypass SOLR, access the data from the MySQL table directly, and write it to the browser, I don't see any problems with em-dashes and accented characters.

Is this a JETTY issue or a SOLR issue or something else? (It's not simply an issue of including <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> either)

Thanks for any help.

Peter Hedlund


