I haven't tried it yet, but I _think_ that in Rails, if you are using the 'mysql2' adapter (now standard with Rails 3) instead of 'mysql', it might handle UTF-8 better, with fewer gotchas. I think if the underlying MySQL database is set to use UTF-8, then, at least with the mysql2 adapter, you shouldn't need to set the 'encoding' attribute on the database connection definition. But I could be wrong, and this isn't really about Solr anymore, of course.

On 12/29/2010 9:48 AM, Mark wrote:
Sure thing.

In my database.yml I was missing the "encoding: utf8" option.

If one were to add Unicode characters within Rails (console, web form,
etc.), the characters would appear to be saved correctly, i.e. when
retrieving them back through Rails, everything looked perfect. The
characters also appeared correctly at the mysql prompt. However, when
indexing or retrieving those characters using JDBC/Solr, they were mangled.
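
One possible reading of the symptom above, offered as a guess rather than a confirmed diagnosis: without the encoding option the Rails connection was treated as latin1, so the UTF-8 bytes sent by the app were converted a second time on the way into the utf8 columns. The Python sketch below only simulates that byte handling. It does not talk to MySQL, the variable names are made up for illustration, and Python's latin-1 codec stands in for MySQL's latin1.

```python
# Simulation only: assumes the connection charset defaulted to latin1.
original = "中文"

# The app sends UTF-8 bytes over a connection the server believes is latin1.
wire_bytes = original.encode("utf-8")

# The server reads each byte as one latin1 character and converts it to
# utf8 for storage, so the stored value is double-encoded.
stored = wire_bytes.decode("latin-1").encode("utf-8")

# Reading back over the same latin1 connection reverses the conversion,
# which is why everything looks fine from the writing client.
read_via_latin1_connection = stored.decode("utf-8").encode("latin-1").decode("utf-8")

# A client that correctly asks for utf8 (e.g. JDBC) sees the mangled text.
read_via_utf8_connection = stored.decode("utf-8")

print(read_via_latin1_connection == original)  # round trip hides the problem
print(read_via_utf8_connection == original)    # the utf8 reader exposes it
```

If this is what happened, rows written before the fix would stay double-encoded until they are rewritten or repaired.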

After adding the above utf8 encoding option, I was able to correctly save
UTF-8 characters into the database and retrieve them using JDBC/Solr.
However, when using the mysql client, the characters would show up
mangled or as '????'. This was resolved by running the following
query: "set names utf8;".
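
The '????' output is consistent with correctly stored utf8 text being converted for a client whose connection charset cannot represent it; "set names utf8" switches the client, connection, and results character sets for the session to utf8. As a rough sketch in Python (a simulation only, assuming the mysql client session had defaulted to latin1; the variable names are illustrative):

```python
# Simulation only: correctly stored text, viewed by two different clients.
stored_text = "中文 ok"

# A latin1 client: characters with no latin1 equivalent are replaced by '?'.
as_latin1_client = stored_text.encode("latin-1", errors="replace").decode("latin-1")

# A utf8 client (the situation after "set names utf8"): nothing is lost.
as_utf8_client = stored_text.encode("utf-8").decode("utf-8")

print(as_latin1_client)
print(as_utf8_client)
```

The data itself is unchanged in both cases; only the conversion applied for the client differs.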

On 12/28/10 10:17 PM, Glen Newton wrote:
Hi Mark,

Could you offer a more technical explanation of the Rails problem, so
that if others encounter a similar problem your efforts in finding the
issue will be available to them?  :-)

Thanks,
Glen

PS. This has wandered somewhat off-topic for this list: apologies &
thanks for the patience of this list...

On Tue, Dec 28, 2010 at 4:15 PM, Mark <static.void....@gmail.com> wrote:
It was due to the way I was writing to the DB using our Rails application.
Everything looked correct, but when retrieving it using the JDBC driver it was
all mangled.

On 12/27/10 4:38 PM, Glen Newton wrote:
Is it possible your browser is not set up to properly display the
Chinese characters? (I am assuming you are looking at things through
your browser.)
Do you have any problems viewing other Chinese documents properly in
your browser?
Using mysql, can you see these characters properly?

What happens when you use curl or wget to get a document from Solr and
look at it using something besides your browser?
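
One way to take the browser out of the picture is to inspect the raw response bytes directly. The helper below is a hypothetical sketch in Python (the function name is made up, and the double-encoding check is only a heuristic); it expects bytes that were already fetched, for example a response saved with curl or wget.

```python
def describe_bytes(raw: bytes) -> str:
    """Classify raw bytes as invalid UTF-8, plain UTF-8, or likely double-encoded."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    try:
        # If every character fits in latin-1 and those bytes again form
        # valid UTF-8, the text was probably encoded twice.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "valid UTF-8"
    if repaired != text:
        return "valid UTF-8, but looks double-encoded"
    return "valid UTF-8"


good = "中文".encode("utf-8")
doubled = good.decode("latin-1").encode("utf-8")
print(describe_bytes(good))
print(describe_bytes(doubled))
```

A "looks double-encoded" result would point at how the data was written to the database rather than at Solr or the browser.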

Yes, I am running out of ideas!  :-)

-Glen

On Mon, Dec 27, 2010 at 7:22 PM, Mark <static.void....@gmail.com> wrote:
Just like the user in that thread... I have my database, table, columns,
and system variables all set, but it still doesn't work as expected.

Server version: 5.0.67 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | utf8                                   |
| character_set_connection | utf8                                   |
| character_set_database   | utf8                                   |
| character_set_filesystem | binary                                 |
| character_set_results    | utf8                                   |
| character_set_server     | utf8                                   |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.00 sec)


Any other ideas? Thanks


On 12/27/10 3:23 PM, Glen Newton wrote:
[client]
default-character-set = utf8
[mysql]
default-character-set = utf8
[mysqld]
character_set_server = utf8
character_set_client = utf8
