Wolfram Arnold wrote:
> I'm running into a surprising twist in MySql string compare.
> 
> Try this:
> 
> select "América" = "America";
> +------------------------+
> | "América" = "America"  |
> +------------------------+
> |                      1 | 
> +------------------------+

I looked into this a bit this morning.  It seems that string comparison 
is affected by "collation" not charset.  The default utf8 collation 
yields the unexpected result:

mysql> SHOW VARIABLES LIKE '%collat%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci | 
| collation_database   | utf8_general_ci | 
| collation_server     | utf8_general_ci | 
+----------------------+-----------------+
3 rows in set (0.00 sec)

mysql> select "América" = "America";
+------------------------+
| "América" = "America"  |
+------------------------+
|                      1 | 
+------------------------+
1 row in set (0.00 sec)

But if I change the collation to utf8_bin, the strings compare as 
unequal:

mysql> set collation='utf8_bin';
Query OK, 0 rows affected (0.04 sec)

mysql> set collation_server='utf8_bin';
Query OK, 0 rows affected (0.00 sec)

mysql> set collation_database='utf8_bin';
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE '%collat%';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| collation_connection | utf8_bin | 
| collation_database   | utf8_bin | 
| collation_server     | utf8_bin | 
+----------------------+----------+
3 rows in set (0.00 sec)

mysql> select "América" = "America";;
+------------------------+
| "América" = "America"  |
+------------------------+
|                      0 | 
+------------------------+
1 row in set (0.00 sec)

I'm a bit confused myself about how utf8_general_ci could sort 
effectively if 'é' equals 'e' since users would expect all the like 
characters to group together and why this collation would be useful; nor 
do I understand if there are other implications to using utf8_bin 
collation.

There is a nicely written report on Unicode collation here: 
http://www.unicode.org/unicode/reports/tr10/

Sarah
-- 
Posted via http://www.ruby-forum.com/.
_______________________________________________
Railsi18n-discussion mailing list
Railsi18n-discussion@rubyforge.org
http://rubyforge.org/mailman/listinfo/railsi18n-discussion

Reply via email to