On 12/02/21 16:22, Tom Lane wrote: > ... types belonging to the STRING category, which are predicated > on the assumption that such types are reasonably general-purpose > string types.
This prods me to submit a question I've been incubating for a while. Is there any way to find out, from the catalogs or in any automatable way, which types are implemented with a dependence on the database encoding (or on some encoding)? You might think S category types, for a start: name, text, character, varchar, all dependent on the server encoding, as you'd expect. The ones Tom moves here to category Z were most of the ones I wondered about. Then there's "char". It's category S, but does not apply the server encoding. You could call it an 8-bit int type, but it's typically used as a character, making it well-defined for ASCII values and not so for others, just like SQL_ASCII encoding. You could as well say that the "char" type has a defined encoding of SQL_ASCII at all times, regardless of the database encoding. U types are a mixed bag. Category U includes bytea (no character encoding) and xml/json/jsonb (server encoding). Also tied to the server encoding are cstring and unknown. As an aside, I think it's unfortunate that the xml type has this implicit dependency on the server encoding, when XML is by definition Unicode. It means there are valid XML documents that PostgreSQL may not be able to store, and which documents those are depends on what the database encoding is. I think json and jsonb suffer in the same way. Changing that would be disruptive at this point and I'm not suggesting it, but there might be value in the thought experiment to see what the alternate universe would look like. In the alternate world, you would know that certain datatypes were inherently encoding-oblivious (numbers, polygons, times, ...), certain others are bound to the server encoding (text, varchar, name, ...), and still others are bound to a known encoding other than the server encoding: the ISO SQL NCHAR type (bound to an alternate configurable database encoding), "char" (always SQL_ASCII), xml/json/jsonb (always with the full Unicode repertoire, however they choose to represent it internally). That last parenthetical reminded me that I'm really talking about 'repertoire' here, which ISO SQL treats as a separate topic from 'encoding'. Exactly how an xml or jsonb type is represented internally might be none of my business (unless I am developing a binary-capable driver), but it's fair to ask what its repertoire is, and whether that's full Unicode or not, and if not, whether the repertoire changes when some server setting does. I also think in that ideal world, or even this one, you could want some way to query the catalogs to answer that basic question about some given type. Am I right that we simply don't have that? I currently answer such questions by querying the catalog for the type's _send or _recv function name, then going off to read the code, but that's hard to automate. Regards, -Chap