On Wed, Oct 04, 2023 at 05:32:50PM -0400, Chapman Flack wrote:
> Well, for what reason does anybody run PG now with the encoding set
> to anything besides UTF-8? I don't really have my finger on that
> pulse.
Because they still have databases that didn't use UTF-8 10 or 20 years
ago and that they haven't migrated to UTF-8?

It's harder to think of why one might _want_ to store text in an
encoding other than UTF-8 in _new_ databases.  Though there's no reason
it should be impossible, other than lack of developer interest: as long
as text is tagged with its encoding, it should be possible to store
text in any number of encodings.

> Could it be that it bloats common strings in their local script, and
> with enough of those to store, it could matter to use the local
> encoding that stores them more economically?

UTF-8 bloat is not likely worth the trouble.  UTF-8 is only clearly
bloaty when compared to encodings with 1-byte code units, like
ISO-8859-*.  For CJK text UTF-8 is not much bloatier than native
non-Unicode encodings like SHIFT_JIS, and it is not much bloatier than
UTF-16 in general either.  (A rough byte-count sketch appears at the
end of this message.)  Bloat is not really a good reason to avoid
Unicode or any specific transformation format.

> Also, while any Unicode transfer format can encode any Unicode code
> point, I'm unsure whether it's yet the case that {any Unicode code
> point} is a superset of every character repertoire associated with
> every non-Unicode encoding.

It has not always been the case that Unicode is a strict superset of
all currently-in-use human scripts, but making it one does seem to be
the Unicode Consortium's aim.

I think you're asking why not just use UTF-8 for everything, all the
time.  It's a fair question, and I don't have a reason to answer it in
the negative (maybe someone else does).  But that doesn't mean one
couldn't want to store text in many encodings (e.g., for historical
reasons).

Nico
--
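P.S. To put rough numbers on the size comparison above (just a quick
sketch; the sample strings and Python codec names below are mine, for
illustration only):

    # Latin-script text: ISO-8859-1 needs 1 byte per character,
    # UTF-8 needs 2 bytes for each accented character.
    latin = "Grüße aus Köln"
    print(len(latin.encode("iso-8859-1")))   # 14 bytes
    print(len(latin.encode("utf-8")))        # 17 bytes

    # Japanese text: Shift_JIS and UTF-16 use 2 bytes per kana/kanji,
    # UTF-8 uses 3.
    cjk = "これは日本語のテキストです"
    print(len(cjk.encode("shift_jis")))      # 26 bytes
    print(len(cjk.encode("utf-8")))          # 39 bytes
    print(len(cjk.encode("utf-16-le")))      # 26 bytes

Real-world text mixes in ASCII (spaces, digits, markup), which narrows
the UTF-8 vs. Shift_JIS gap further.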