On 12/3/21 14:12, Tom Lane wrote:
> [ breaking off a different new thread ]
>
> Chapman Flack <c...@anastigmatix.net> writes:
>> Then there's "char". It's category S, but does not apply the server
>> encoding. You could call it an 8-bit int type, but it's typically used
>> as a character, making it well-defined for ASCII values and not so
>> for others, just like SQL_ASCII encoding. You could as well say that
>> the "char" type has a defined encoding of SQL_ASCII at all times,
>> regardless of the database encoding.
> This reminds me of something I've been intending to bring up, which
> is that the "char" type is not very encoding-safe. charout() for
> example just regurgitates the single byte as-is. I think we deemed
> that okay the last time anyone thought about it, but that was when
> single-byte encodings were the mainstream usage for non-ASCII data.
> If you're using UTF8 or another multi-byte server encoding, it's
> quite easy to get an invalidly-encoded string this way, which at
> minimum is going to break dump/restore scenarios.
>
> I can think of at least three ways we might address this:
>
> * Forbid all non-ASCII values for type "char". This results in
>   simple and portable semantics, but it might break usages that
>   work okay today.
>
> * Allow such values only in single-byte server encodings. This
>   is a bit messy, but it wouldn't break any cases that are not
>   problematic already.
>
> * Continue to allow non-ASCII values, but change charin/charout,
>   char_text, etc so that the external representation is encoding-safe
>   (perhaps make it an octal or decimal number).
>
> Either of the first two ways would have to contemplate what to do
> with disallowed values that snuck into the DB via pg_upgrade.
> That leads me to think that the third way might be the most
> preferable, even though it's not terribly backward-compatible.
>
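
For concreteness, here is a rough sketch of what the third option could look like if only non-ASCII values change their external form. It is an illustration in Python, not PostgreSQL's C code: the helper names char_out, char_in and is_valid_utf8 are hypothetical, and the backslash-plus-three-octal-digits format is an assumption (the proposal only says "perhaps make it an octal or decimal number").

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def char_out(value: int) -> str:
    """Hypothetical encoding-safe output for a one-byte "char" value.

    ASCII bytes are emitted unchanged; bytes 0x80..0xFF become a
    backslash followed by three octal digits (assumed format).
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    if value < 0x80:
        return chr(value)
    return "\\%03o" % value


def char_in(text: str) -> int:
    """Hypothetical inverse of char_out."""
    # Escaped form: backslash plus exactly three octal digits.
    if len(text) == 4 and text[0] == "\\" and all(c in "01234567" for c in text[1:]):
        value = int(text[1:], 8)
        if value > 0xFF:
            raise ValueError("octal escape out of range for one byte")
        return value
    # Plain form: a single ASCII character.
    if len(text) == 1 and ord(text) < 0x80:
        return ord(text)
    raise ValueError("not a valid external representation")
```

Under these assumptions, the raw byte 0xE9 on its own is not valid UTF-8, while char_out(0xE9) yields the four ASCII characters \351, which are valid in any server encoding and which char_in maps back to 0xE9. ASCII values such as 'A' keep their current representation.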

I don't like #2.

Is #3 going to change the external representation only for non-ASCII
values? If so, that seems OK. Changing it for ASCII values seems ugly.

#1 is the simplest to implement and to understand, and I suspect it
would break very little in practice, but others might disagree with that
assessment.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com