Wolfram Arnold wrote: > I'm running into a surprising twist in MySql string compare. > > Try this: > > select "América" = "America"; > +------------------------+ > | "América" = "America" | > +------------------------+ > | 1 | > +------------------------+
I looked into this a bit this morning. It seems that string comparison is affected by "collation" not charset. The default utf8 collation yields the unexpected result: mysql> SHOW VARIABLES LIKE '%collat%'; +----------------------+-----------------+ | Variable_name | Value | +----------------------+-----------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +----------------------+-----------------+ 3 rows in set (0.00 sec) mysql> select "América" = "America"; +------------------------+ | "América" = "America" | +------------------------+ | 1 | +------------------------+ 1 row in set (0.00 sec) But if I change the collation to utf8_bin, the strings compare as unequal: mysql> set collation='utf8_bin'; Query OK, 0 rows affected (0.04 sec) mysql> set collation_server='utf8_bin'; Query OK, 0 rows affected (0.00 sec) mysql> set collation_database='utf8_bin'; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE '%collat%'; +----------------------+----------+ | Variable_name | Value | +----------------------+----------+ | collation_connection | utf8_bin | | collation_database | utf8_bin | | collation_server | utf8_bin | +----------------------+----------+ 3 rows in set (0.00 sec) mysql> select "América" = "America";; +------------------------+ | "América" = "America" | +------------------------+ | 0 | +------------------------+ 1 row in set (0.00 sec) I'm a bit confused myself about how utf8_general_ci could sort effectively if 'é' equals 'e' since users would expect all the like characters to group together and why this collation would be useful; nor do I understand if there are other implications to using utf8_bin collation. There is a nicely written report on Unicode collation here: http://www.unicode.org/unicode/reports/tr10/ Sarah -- Posted via http://www.ruby-forum.com/. _______________________________________________ Railsi18n-discussion mailing list Railsi18n-discussion@rubyforge.org http://rubyforge.org/mailman/listinfo/railsi18n-discussion