Hi Adam,

Yeah... Here's the situation with MySQL/MariaDB and "utf8".

When MySQL introduced its utf8 charset, they went with a cut-down version
of UTF-8 that stores at most 3 bytes per character (I am super simplifying
this). Emojis and some other character ranges didn't exist at the time;
they need 4 bytes in real UTF-8, so they cannot be represented by MySQL's
"utf8".
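
To make that concrete, here's a rough Python sketch (not anything Review
Board ships) that counts the UTF-8 bytes per character. The helper names
utf8_len and fits_mysql_utf8 are made up for illustration, and the check
assumes the only restriction is the 3-byte-per-character cap.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes one character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """Assumption: MySQL's "utf8" accepts only characters of 3 bytes or fewer."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    for label, ch in [
        ("ASCII letter", "A"),
        ("e with acute accent", "\u00e9"),
        ("euro sign", "\u20ac"),
        ("grinning face emoji", "\U0001F600"),
    ]:
        print(label, utf8_len(ch), "byte(s)")
```

Anything that needs 4 bytes (like the emoji) is what the 3-byte "utf8"
can't hold.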

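utf8mb4 is the "real" UTF-8 charset type. However, it's not a drop-in
replacement. It allows up to 4 bytes per character, which affects index key
lengths (hence the "max key length is 767 bytes" error you hit), amongst
other things, and is incompatible with, well, many things.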
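
The key length part is just arithmetic. Here's a small Python sketch of it,
using the 767-byte limit from the error you quoted. The 255-character
column is an assumed example (I'm not claiming that's the exact column
that failed), and the function names are hypothetical.

```python
MAX_KEY_BYTES = 767  # limit taken from the quoted error 1071 message


def index_bytes(chars: int, bytes_per_char: int) -> int:
    """Worst-case bytes needed to index a column of `chars` characters."""
    return chars * bytes_per_char


def max_indexable_chars(bytes_per_char: int, limit: int = MAX_KEY_BYTES) -> int:
    """Longest column (in characters) that still fits under the key limit."""
    return limit // bytes_per_char


if __name__ == "__main__":
    # Assumed example: an indexed 255-character column.
    for name, width in [("utf8", 3), ("utf8mb4", 4)]:
        needed = index_bytes(255, width)
        verdict = "fits" if needed <= MAX_KEY_BYTES else "too long"
        print(name, needed, "bytes:", verdict)
```

So a column that squeaks by under "utf8" can blow past the limit once each
character may take 4 bytes.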

There *is* a way to get true UTF-8 support. It requires utf8mb4, and a
handful of global settings applied to the server to enable large keys and a
different InnoDB file format. It then requires a special command to be set
at the beginning of each MySQL/MariaDB session to opt into some better
support.

Basically, it's invasive and not something that we can currently tell
people to enable, or it'll cause new problems. It also requires full table
rebuilds. The instructions also depend on the version of MySQL/MariaDB.

We plan to bake in some level of support for it in Review Board in the
future, but Django doesn't natively support it, and it'll require a bunch
of special logic to rebuild data.

I can't currently provide the settings you may need, because many of them
are dependent on the version of MySQL/MariaDB you're using, and I haven't
verified them lately (just working off internal notes). It boils down to:

1) Using utf8mb4 charsets for all databases, tables, and
connections/sessions
2) Using utf8mb4_bin collation for all the above
3) Enabling innodb_large_prefix and innodb_file_per_table (might depend on
the versions of MySQL/MariaDB)
4) Enabling innodb_file_format=barracuda (not needed on modern versions)

This is not an exhaustive step-by-step.
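
To give a rough idea of what those steps could translate to, here's an
unverified Python sketch that just builds the SQL statements as strings.
The function name, the "reviewboard" database name, and the legacy_innodb
flag are all hypothetical. The InnoDB variables may not exist on newer
MySQL/MariaDB versions, and using SET NAMES as the per-session command is
my assumption here. Treat it as notes, not as instructions to run.

```python
def utf8mb4_statements(database, tables, legacy_innodb=True):
    """Build (but do not run) SQL mirroring the four steps listed above.

    Assumptions: identifiers are trusted, and the InnoDB globals below
    exist on the server version in use (older MySQL/MariaDB only).
    """
    stmts = []
    if legacy_innodb:
        # Steps 3 and 4: large keys and the Barracuda file format.
        stmts.append("SET GLOBAL innodb_large_prefix = ON;")
        stmts.append("SET GLOBAL innodb_file_per_table = ON;")
        stmts.append("SET GLOBAL innodb_file_format = Barracuda;")
    # Steps 1 and 2: charset and collation for the database and its tables.
    stmts.append(
        f"ALTER DATABASE `{database}` "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;"
    )
    for table in tables:
        stmts.append(
            f"ALTER TABLE `{table}` "
            "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;"
        )
    # Assumed per-session opt-in, issued at the start of each connection.
    stmts.append("SET NAMES utf8mb4 COLLATE utf8mb4_bin;")
    return stmts


if __name__ == "__main__":
    for stmt in utf8mb4_statements("reviewboard", ["scmtools_repository"]):
        print(stmt)
```

Converting a table this way is a full rebuild, which is part of why this
isn't something to try on a production database without a backup.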

PostgreSQL will do UTF-8 by default, fwiw.

Hoping to revisit this support in MySQL/MariaDB after RB4 wraps up. Should
be easier now that MySQL/MariaDB have made progress in this area, and I
need to update my knowledge of what that progress looks like.

Christian


On Fri, May 15, 2020 at 5:31 AM Adam Weremczuk <[email protected]> wrote:

> I don't think utf8mb4 was a good idea and I believe it's now leading to:
>
> sudo rb-site install /var/www/mysite
> (...)
> * Installing the site...
> (...)
> Creating table scmtools_repository
>
> [!] There was an error synchronizing the database. Make sure the
>     database is created and has the appropriate permissions, and then
>     continue.
> [!] Details: (1071, 'Specified key was too long; max key length is 767
>     bytes')
>
> Press Enter to continue
>
>
>
> On Thursday, 14 May 2020 16:01:35 UTC+1, Adam Weremczuk wrote:
>>
>> Hi all,
>>
>> Following the installation guide for MySQL, I've added this to
>> /etc/mysql/my.cnf:
>>
>> [client]
>> default-character-set=utf8
>>
>> [mysqld]
>> character-set-server=utf8
>>
>> MariaDB fails to start:
>>
>> May 14 14:01:41 gittest systemd[1]: Starting MariaDB 10.1.44 database
>> server...
>> May 14 14:01:41 gittest mysqld[10318]: 2020-05-14 14:01:41
>> 139687784537472 [Note] /usr/sbin/mysqld (mysqld 10.1.44-MariaDB-0+deb9u1)
>> starting as process 10318 ...
>> May 14 14:01:41 gittest mysqld[10318]: 2020-05-14 14:01:41
>> 139687784537472 [ERROR] COLLATION 'utf8mb4_general_ci' is not valid for
>> CHARACTER SET 'utf8'
>> May 14 14:01:41 gittest mysqld[10318]: 2020-05-14 14:01:41
>> 139687784537472 [ERROR] Aborting
>> May 14 14:01:41 gittest systemd[1]: mariadb.service: Main process exited,
>> code=exited, status=1/FAILURE
>> May 14 14:01:41 gittest systemd[1]: Failed to start MariaDB 10.1.44
>> database server.
>>
>> When I comment out these 2 additions it starts fine and I can retrieve
>> the following:
>>
>> MariaDB [(none)]> SHOW COLLATION LIKE 'utf8%';
>> +------------------------------+---------+-----+---------+----------+---------+
>> | Collation                    | Charset | Id  | Default | Compiled | Sortlen |
>> +------------------------------+---------+-----+---------+----------+---------+
>> | utf8_general_ci              | utf8    |  33 | Yes     | Yes      |       1 |
>> | utf8_bin                     | utf8    |  83 |         | Yes      |       1 |
>> | utf8_unicode_ci              | utf8    | 192 |         | Yes      |       8 |
>> | utf8_icelandic_ci            | utf8    | 193 |         | Yes      |       8 |
>> | utf8_latvian_ci              | utf8    | 194 |         | Yes      |       8 |
>> | utf8_romanian_ci             | utf8    | 195 |         | Yes      |       8 |
>> | utf8_slovenian_ci            | utf8    | 196 |         | Yes      |       8 |
>> | utf8_polish_ci               | utf8    | 197 |         | Yes      |       8 |
>> | utf8_estonian_ci             | utf8    | 198 |         | Yes      |       8 |
>> | utf8_spanish_ci              | utf8    | 199 |         | Yes      |       8 |
>> | utf8_swedish_ci              | utf8    | 200 |         | Yes      |       8 |
>> | utf8_turkish_ci              | utf8    | 201 |         | Yes      |       8 |
>> | utf8_czech_ci                | utf8    | 202 |         | Yes      |       8 |
>> | utf8_danish_ci               | utf8    | 203 |         | Yes      |       8 |
>> | utf8_lithuanian_ci           | utf8    | 204 |         | Yes      |       8 |
>> | utf8_slovak_ci               | utf8    | 205 |         | Yes      |       8 |
>> | utf8_spanish2_ci             | utf8    | 206 |         | Yes      |       8 |
>> | utf8_roman_ci                | utf8    | 207 |         | Yes      |       8 |
>> | utf8_persian_ci              | utf8    | 208 |         | Yes      |       8 |
>> | utf8_esperanto_ci            | utf8    | 209 |         | Yes      |       8 |
>> | utf8_hungarian_ci            | utf8    | 210 |         | Yes      |       8 |
>> | utf8_sinhala_ci              | utf8    | 211 |         | Yes      |       8 |
>> | utf8_german2_ci              | utf8    | 212 |         | Yes      |       8 |
>> | utf8_croatian_mysql561_ci    | utf8    | 213 |         | Yes      |       8 |
>> | utf8_unicode_520_ci          | utf8    | 214 |         | Yes      |       8 |
>> | utf8_vietnamese_ci           | utf8    | 215 |         | Yes      |       8 |
>> | utf8_general_mysql500_ci     | utf8    | 223 |         | Yes      |       1 |
>> | utf8_croatian_ci             | utf8    | 576 |         | Yes      |       8 |
>> | utf8_myanmar_ci              | utf8    | 577 |         | Yes      |       8 |
>> | utf8_thai_520_w2             | utf8    | 578 |         | Yes      |       4 |
>> | utf8mb4_general_ci           | utf8mb4 |  45 | Yes     | Yes      |       1 |
>> | utf8mb4_bin                  | utf8mb4 |  46 |         | Yes      |       1 |
>> | utf8mb4_unicode_ci           | utf8mb4 | 224 |         | Yes      |       8 |
>> | utf8mb4_icelandic_ci         | utf8mb4 | 225 |         | Yes      |       8 |
>> | utf8mb4_latvian_ci           | utf8mb4 | 226 |         | Yes      |       8 |
>> | utf8mb4_romanian_ci          | utf8mb4 | 227 |         | Yes      |       8 |
>> | utf8mb4_slovenian_ci         | utf8mb4 | 228 |         | Yes      |       8 |
>> | utf8mb4_polish_ci            | utf8mb4 | 229 |         | Yes      |       8 |
>> | utf8mb4_estonian_ci          | utf8mb4 | 230 |         | Yes      |       8 |
>> | utf8mb4_spanish_ci           | utf8mb4 | 231 |         | Yes      |       8 |
>> | utf8mb4_swedish_ci           | utf8mb4 | 232 |         | Yes      |       8 |
>> | utf8mb4_turkish_ci           | utf8mb4 | 233 |         | Yes      |       8 |
>> | utf8mb4_czech_ci             | utf8mb4 | 234 |         | Yes      |       8 |
>> | utf8mb4_danish_ci            | utf8mb4 | 235 |         | Yes      |       8 |
>> | utf8mb4_lithuanian_ci        | utf8mb4 | 236 |         | Yes      |       8 |
>> | utf8mb4_slovak_ci            | utf8mb4 | 237 |         | Yes      |       8 |
>> | utf8mb4_spanish2_ci          | utf8mb4 | 238 |         | Yes      |       8 |
>> | utf8mb4_roman_ci             | utf8mb4 | 239 |         | Yes      |       8 |
>> | utf8mb4_persian_ci           | utf8mb4 | 240 |         | Yes      |       8 |
>> | utf8mb4_esperanto_ci         | utf8mb4 | 241 |         | Yes      |       8 |
>> | utf8mb4_hungarian_ci         | utf8mb4 | 242 |         | Yes      |       8 |
>> | utf8mb4_sinhala_ci           | utf8mb4 | 243 |         | Yes      |       8 |
>> | utf8mb4_german2_ci           | utf8mb4 | 244 |         | Yes      |       8 |
>> | utf8mb4_croatian_mysql561_ci | utf8mb4 | 245 |         | Yes      |       8 |
>> | utf8mb4_unicode_520_ci       | utf8mb4 | 246 |         | Yes      |       8 |
>> | utf8mb4_vietnamese_ci        | utf8mb4 | 247 |         | Yes      |       8 |
>> | utf8mb4_croatian_ci          | utf8mb4 | 608 |         | Yes      |       8 |
>> | utf8mb4_myanmar_ci           | utf8mb4 | 609 |         | Yes      |       8 |
>> | utf8mb4_thai_520_w2          | utf8mb4 | 610 |         | Yes      |       4 |
>> +------------------------------+---------+-----+---------+----------+---------+
>> 59 rows in set (0.00 sec)
>>
>> I've replaced utf8 with utf8mb4 in my.cnf and MariaDB is now starting
>> fine.
>>
>> Have I done the right thing?
>>
>> Shall the installation documentation be updated?
>>
>> Thanks,
>> Adam
>>
>> --
> Supercharge your Review Board with Power Pack:
> https://www.reviewboard.org/powerpack/
> Want us to host Review Board for you? Check out RBCommons:
> https://rbcommons.com/
> Happy user? Let us know! https://www.reviewboard.org/users/
> ---
> You received this message because you are subscribed to the Google Groups
> "Review Board Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/reviewboard/a02fb57b-6547-4d43-a028-4e8706a42860%40googlegroups.com
> <https://groups.google.com/d/msgid/reviewboard/a02fb57b-6547-4d43-a028-4e8706a42860%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
Christian Hammond
President/CEO of Beanbag <https://www.beanbaginc.com/>
Makers of Review Board <https://www.reviewboard.org/>
