On Fri, Mar 23, 2018 at 11:39 AM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> On Fri, 23 Mar 2018 11:08:56 +1100, Chris Angelico wrote:
>> Okay. Give me a good reason for the database itself to be locked to
>> Latin-1. Make sure you explain how potentially saving the occasional
>> byte of storage (compared to UTF-8) justifies limiting the available
>> character set to the ones that happen to be in Latin-1, yet it's
>> essential to NOT limit the character set to ASCII.
>
> I'll do better than that, I'll give multiple good reasons to use Latin-1.
>
> It's company policy to only use Latin-1, because the CEO was once
> employed by the Unicode Consortium, and fired in disgrace after
> embezzling funds, and ever since then he has refused to use Unicode.

You clearly can't afford to quit your job, so I won't mention that
possibility. (Oops, too late.) But a CEO is not God, and you *can*
either dispute or subvert stupid orders. I don't consider this a
*good* reason. Maybe a reason, but not a good one.

> Compatibility with other databases, systems or tools that require Latin-1.

Change them one at a time. When you have to pass data to something
that has to receive Latin-1, you encode it to Latin-1. The database
can still store UTF-8. Leaving it at Latin-1 is not "a good reason for
using Latin-1", so much as "we haven't gotten around to changing it
yet".

> The database has to send information to embedded devices that don't
> include a full Unicode implementation, but do support Latin-1.

Okay, that's a valid reason, if an incredibly rare one. You have to
specifically WANT an encoding error at the moment you try to store
something that would cause problems later on. It's like asking for a
32-bit signed integer type in Python, because the value is eventually
going to be sent to something that can't handle larger numbers. That's
not usually something that warrants a core feature.

> The data doesn't actually represent text, but Python 2 style byte-
> strings, and Latin-1 is just a convenient, easy way to get that while
> ensuring ASCII bytes look like ASCII characters.

The OP is talking about JSON. This reason makes no sense in that
context. And if it really is a byte string, why store it as a Latin-1
string? Store it as the type BLOB instead. Latin-1 is not "arbitrary
bytes". It is a very specific encoding that cannot decode every
possible byte value. Using Latin-1 to store arbitrary bytes is just as
wrong as using ASCII to store eight-bit data.

So, you've given me one possible reason that is EXTREMELY situational
and, even there, could be handled differently. And it's only valid
when you're working with something that supports more than ASCII and
no more than Latin-1, and moreover, you have the need for non-ASCII
characters.
(Otherwise, just use ASCII, which you can declare as UTF-8 if you wish.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
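
A minimal sketch of the "encode at the boundary" approach described in the reply above, assuming Python 3. The function names are hypothetical and only for illustration: text is kept as Unicode (stored as UTF-8), and it is converted to Latin-1 only at the point where a Latin-1-only consumer needs it.

```python
def encode_for_latin1_consumer(text: str) -> bytes:
    """Encode text for a (hypothetical) consumer that only accepts Latin-1.

    Raises UnicodeEncodeError if the text contains a character that
    Latin-1 cannot represent, so the problem surfaces at the boundary.
    """
    return text.encode("latin-1")


def can_send_as_latin1(text: str) -> bool:
    """Return True if every character in text is representable in Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    stored = "café"  # kept as Unicode text; UTF-8 in the database

    # UTF-8 spends one extra byte on the accented character.
    print(len(stored.encode("utf-8")), len(encode_for_latin1_consumer(stored)))

    # Pure ASCII text has identical bytes in ASCII, Latin-1 and UTF-8.
    print("plain".encode("ascii") == "plain".encode("utf-8"))

    # The euro sign is not part of Latin-1, so it is rejected at the boundary.
    print(can_send_as_latin1("10 €"))
```

The storage layer never has to know about Latin-1 here; only the code that talks to the restricted consumer does, and that is also where an unrepresentable character gets reported.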