1 - Verify that your MySQL is set up to use UTF-8
2 - Does your JDBC connect string contain: useUnicode=true&characterEncoding=UTF-8

See: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html
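
To make point 2 concrete, here is a minimal sketch in plain Java that needs no database. It builds a connect string with the two parameters above and shows how this kind of garbling can arise when UTF-8 bytes are decoded as windows-1252. The class name CharsetCheck, its helper methods, and the host and database names are hypothetical and only for illustration. That windows-1252 is the wrong charset involved is an assumption on my part; the driver or server may be using a different default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Solr or Connector/J.
class CharsetCheck {
    // Assumed "wrong" charset; your environment may use a different default.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Builds a MySQL JDBC URL that carries the two encoding parameters.
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Simulates the bug: UTF-8 bytes interpreted as windows-1252 text.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Reverses the simulation; this only works while no bytes were lost.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four Chinese characters, written as escapes so the source file
        // encoding cannot interfere with the demonstration.
        String original = "\u8FD9\u662F\u6D4B\u8BD5";
        String garbled = misdecode(original);

        System.out.println("URL: " + jdbcUrl("localhost", "mydb"));
        System.out.println("Original length: " + original.length());
        System.out.println("Garbled length: " + garbled.length());
        System.out.println("Round trip ok: " + repair(garbled).equals(original));
    }
}
```

If a simple program reading your table through the driver prints text that looks like the output of misdecode, the bytes are probably being decoded with the wrong charset somewhere between MySQL and the driver, which is what the connect string parameters are meant to address.
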
Glen
http://zzzoot.blogspot.com/

On Mon, Dec 27, 2010 at 5:15 PM, Mark <static.void....@gmail.com> wrote:
> Solr: 1.4.1
> JDBC driver: Connector/J 5.1.14
>
> Looks like it's the JDBC driver, because it doesn't even work with a simple
> Java program. I know this is a little off subject now, but do you have any
> clues? Thanks again
>
>
> On 12/27/10 1:58 PM, Erick Erickson wrote:
>>
>> More data please.
>>
>> Which JDBC driver? Have you tried just printing out the results of using
>> that driver in a simple Java program?
>>
>> Solr should handle UTF-8 just fine, but the servlet container may have to
>> have some settings tweaked. Which one of those are you using?
>>
>> What version of Solr?
>>
>> Best
>> Erick
>>
>> On Mon, Dec 27, 2010 at 3:05 PM, Mark<static.void....@gmail.com> wrote:
>>
>>> Seems like I am missing some configuration when trying to use DIH to
>>> import documents with Chinese characters. All the documents save crazy
>>> nonsense like "这是测试" instead of actual Chinese characters.
>>>
>>> I think it's at the JDBC level, because if I hardcode one of the fields
>>> within data-config.xml (using a template transformer) the characters show
>>> up correctly.
>>>
>>> Any ideas? Thanks
>>>