Hi Peter,
I have the same set of issues and will look for a response here.
Sometimes those stray characters are created at input time (for
example, when a third-party tool extracts text from a Microsoft
Office document). But the MySQL data looking OK in the browser might
be because the encoding declared in MySQL is not the same as the
encoding of the original text. Say, for example, that the MySQL
table's character set is Latin1 but the document was UTF-8. When a
browser renders the page, it might assume the bytes are UTF-8, but
SOLR might be taking the table type literally in the DIH (Latin1
with Swedish collation, for example). It could also be that PHP
doesn't handle UTF-8 well, so the result depends on your client.
I don't think it has anything to do with Jetty: I use Resin and see
the same issues.
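
To make the mismatch described above concrete, here is a minimal Python sketch of what happens when UTF-8 bytes are read as if they were Latin1. It assumes Windows-1252 (cp1252) as a stand-in for the Latin1 table type; the function names misdecode and repair are made up for illustration and are not part of SOLR, the DIH, or MySQL.

```python
# Sketch only: cp1252 is an assumed stand-in for a "Latin1" table type.
# misdecode() and repair() are hypothetical helper names.

def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse misdecode(): recover the bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    original = "caf\u00e9"             # "café": the accented e is two bytes in UTF-8
    garbled = misdecode(original)
    print(garbled)                     # cafÃ©
    print(repair(garbled) == original) # True
```

If the strange characters in the index round-trip cleanly through something like repair(), that would suggest the bytes in MySQL are really UTF-8 and only the declared encoding is wrong, rather than the data having been damaged at input time.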
Hope that helps,
- Jonathan
On Nov 4, 2009, at 8:48 AM, Peter Hedlund wrote:
I'm having a problem with character encoding. The data that I'm
indexing with SOLR is being pulled from a MySQL database and then
the index is being integrated into a PHP application. When I
display the text from the SOLR index it's full of strange characters
(–, é, etc...). However, when I bypass SOLR and access the data
from the MySQL table directly and write to the browser I don't see
any problems with em-dashes and accented characters.
Is this a JETTY issue or a SOLR issue or something else? (It's not
simply an issue of including <meta http-equiv="Content-Type"
content="text/html;charset=UTF-8"> either)
Thanks for any help.
Peter Hedlund