On Tue, May 2, 2017 at 9:24 PM, Brian Wolff <bawo...@gmail.com> wrote:

> .
> >
> > On the latest discussions, there are proposals to increase the minimum
> > mediawiki requirements to MySQL/MariaDB 5.5 and allow binary or utf8mb4
> > (not utf8, 3 byte utf8), https://phabricator.wikimedia.org/T161232.
> Utf8mb4
> > should be enough for most uses (utf8 will not allow for emojis, for
> > example), although I am not up to date with the latest unicode standard
> > changes and MySQL features supporting them.
> >
>
> I dont know about mysql, but in unicode emojis are like any other astral
> character, and utf-8 can encode them in 4 bytes*.
>

I am sorry I wasn't clear before, MySQL's utf8 IS NOT international
standard generally known as UTF-8, it is a bastardization of 3-byte max
UTF-8. MySQL's utf8mb4 is UTF-8:

Proof:

```
mysql> use test
Database changed
mysql> CREATE TABLE test (a char(1) CHARSET utf8, b char(1) CHARSET
utf8mb4, c binary(4));
Query OK, 0 rows affected (0.02 sec)

mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test VALUES ('\U+1F4A9', '\U+1F4A9', '\U+1F4A9');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------------------------------+
| Level   | Code | Message
           |
+---------+------+--------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' for column
'a' at row 1 |
+---------+------+--------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM test;
+------+------+------+
| a    | b    | c    |
+------+------+------+
| ?    | 💩     | 💩     | -- you will need an emoji-compatible font here
+------+------+------+
1 row in set (0.00 sec)

mysql> SELECT hex(a), hex(b), hex(c) FROM test;
+--------+----------+----------+
| hex(a) | hex(b)   | hex(c)   |
+--------+----------+----------+
| 3F     | F09F92A9 | F09F92A9 |
+--------+----------+----------+
1 row in set (0.00 sec)
```

To avoid truncations:

```
mysql> set sql_mode='TRADITIONAL'; --
https://phabricator.wikimedia.org/T108255
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test VALUES ('\U+1F4A9', '\U+1F4A9', '\U+1F4A9');
ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x92\xA9' for column
'a' at row 1
```

More info at:
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html vs.
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html


-- 
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Reply via email to