Peter Eisentraut wrote:

> > That seems to suggest the standard answer should be 'Á' regardless of
> > any COLLATE clause (though I could be misreading). I'm a bit confused
> > by that... what's the standard-compatible way to specify the locale for
> > UPPER()/LOWER()? If there is none, then it makes sense that Postgres
> > overloads the COLLATE clause for that purpose so that users can use a
> > different locale if they want.
>
> The standard doesn't have the notion of locale-dependent case conversion.

Neither does Unicode, which is why ICU functions like u_isupper() or
u_toupper() don't take a locale argument.

With libc, isupper_l() and the other ctype functions need a locale
argument, but given a locale's value of
"language[_territory][.codeset]", in theory only the codeset part is
actually useful.

To me, the question of what we should put in pg_collation.collctype
for the "ucs_basic" collation leads to another question: why do we
even consider collctype in the first place?

Within a database, there's only one "codeset", which corresponds to
pg_database.encoding, and there's a value in pg_database.lc_ctype
that is normally compatible with that encoding.
ISTM that UPPER(string COLLATE "whatever") should always give
the same result as UPPER(string COLLATE pg_catalog.default).
And likewise, all functions that depend on character categories could
basically ignore the COLLATE specification, given that our
database-wide properties are sufficient to characterize the strings
within.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite