s/they/emojis/

On Fri, Oct 15, 2021 at 2:12 PM Jaime Crespo <[email protected]> wrote:
> I don't want to defend MySQL development decisions- in fact PHP made some
> similarly bad ones, and it would be unfair to judge them too harshly with
> the "power of hindsight" [0]- but... /pedantic on
>
> On Thu, Oct 14, 2021 at 7:37 PM Roy Smith <[email protected]> wrote:
>
>> What part of "universal" did they not understand?
>>
>
> ... several years ago, during the end of the century/start of a new one,
> no one used UTF-8 [1] and PHP didn't even support multi-byte strings. The
> original spec for UTF-8 called for up to 6 bytes [2]. The BMP (3 bytes in
> UTF-8), however, contained characters for most modern languages [3], and
> reserving up to 6 bytes per character would have been a waste of space
> (double!) and performance, because at the time MySQL worked much faster
> with fixed-width columns.
> My guess is that someone said "this is probably good enough", and would it
> be too outrageous to think that we may not need as many extra characters as
> stars in our Galaxy, when fewer than 65K were practically needed?
>
> 3 things changed after that:
> * Unicode limited UTF-8 to encoding 21 bits in 2003 [4], requiring
> only 4 bytes- only one more than MySQL's utf8
> * Apple wanted to sell iPhones in Japan, so they were added to Unicode in
> 2010 and subsequently became popular
> * MySQL/InnoDB has been highly optimized for the fast handling of
> variable-length strings
>
> However, you cannot just arbitrarily break backwards compatibility and
> change the meaning of existing configuration- especially with storage
> software that has been continuously supporting incremental upgrades for as
> long as I can remember. You can just support the new standard and encourage
> its usage, make it the default, etc.
>
> This is a bit offtopic here (feel free to PM to continue the
> conversation), and just to be clear, I am _not fully justifying the
> decisions_, just giving historical context, but I want to end with some
> lessons relevant to the list:
>
> * It is very difficult to build future-proof applications- PHP, MySQL,
> MediaWiki, they have a long history and we should be gentle when we judge
> them from the future. In my work, which involves backups, supporting
> storage of stuff for over 5 years (unchanged) is sometimes challenging,
> because encryption algorithms are found to be weak, or end up being
> unsupported/unavailable in just 2 releases of the operating system!
> * Standards also change, they are not as "universal" as we may want to
> believe (there have been 32 extra Unicode versions since 1991). I expect
> new collations to be needed in the future that are currently not
> implemented, too.
> * It is ok to make "mistakes", as long as we learn from them and improve
> upon them :-)
>
> Sorry for the text block.
>
> [0] <url:https://powerlisting.fandom.com/wiki/Hindsight>
> [1] <url:https://commons.wikimedia.org/wiki/File:Utf8webgrowth.svg>
> [2] <url:https://www.rfc-editor.org/rfc/rfc2279>
> [3] <url:https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane>
> [4] <url:https://www.rfc-editor.org/rfc/rfc3629>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
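
For anyone who wants to check the byte counts mentioned in the quoted message, here is a minimal Python sketch. It only measures how many bytes UTF-8 needs per character and does not talk to MySQL at all. The function names utf8_len and fits_in_three_bytes are made up for illustration, and the "3 bytes" limit is the figure given for MySQL's utf8 in the quoted message.

```python
# Illustrative sketch only: measures UTF-8 byte lengths in plain Python.
# Nothing here queries MySQL; the 3-byte limit is taken from the thread.

def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text: str) -> bool:
    """True if every character in text needs at most 3 bytes in UTF-8,
    which is the case for characters in the Basic Multilingual Plane."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # ASCII letter, Latin letter with tilde, euro sign, CJK character, emoji.
    for ch in ["A", "\u00f1", "\u20ac", "\u65e5", "\U0001F600"]:
        print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")
```

The last sample, U+1F600, sits outside the BMP, so it is the only one of the five that needs a fourth byte.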
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
