On 20.09.25 02:21, Jeff Davis wrote:
New builtin case-insensitive collation PG_UNICODE_CI, where the ordering semantics are just: strcmp(CASEFOLD(arg1), CASEFOLD(arg2)) and the character semantics are the same as PG_UNICODE_FAST.
If it's a variant of PG_UNICODE_FAST, then it ought to be called PG_UNICODE_FAST_CI or similar. Otherwise, one would expect it to be a variant of PG_UNICODE (if that existed, but there is also UNICODE).
But that name is also dubious since you later write that it's not actually fast.
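To make the quoted ordering rule concrete, here is a minimal sketch of strcmp(CASEFOLD(arg1), CASEFOLD(arg2)) in Python. It assumes that Python's str.casefold() (Unicode full case folding) is a close enough stand-in for the proposed CASEFOLD(), and that strcmp() on UTF-8 strings amounts to a bytewise comparison; the function name casefold_cmp is made up for illustration and is not part of the patch.

```python
def casefold_cmp(a: str, b: str) -> int:
    """Return <0, 0, or >0, like strcmp() applied to the case-folded inputs.

    Assumption: str.casefold() approximates the proposed CASEFOLD();
    the two may differ in Unicode version or edge cases.
    """
    # Fold case first, then compare the UTF-8 bytes. Bytewise order of
    # UTF-8 matches code point order, which is what strcmp() would give
    # for strings without embedded NUL bytes.
    ka = a.casefold().encode("utf-8")
    kb = b.casefold().encode("utf-8")
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    print(casefold_cmp("Straße", "STRASSE"))  # equal after full case folding
    print(casefold_cmp("apple", "Banana"))    # negative: "apple" < "banana"
```

Because the comparison runs on the folded strings only, two inputs that differ in case compare as equal, which is the property the proposal relies on instead of the nondeterministic-collation mechanism.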
Non-deterministic collations cannot be used by SIMILAR TO, and may cause problems for ILIKE and regexes. The reason is that pattern matching often depends on the character-by-character semantics, but ICU collations aren't constrained enough for these semantics to work.
This reasoning is a bit narrow. SIMILAR TO is kind of deprecated, and ILIKE is kind of stupid, and regexes have their own way to control case-sensitivity.
Nevertheless, I think there would be some value in providing CI (and maybe accent-insensitive?) collations that operate separately from the "nondeterministic" mechanism. But then I would like to see a comprehensive approach that covers a variety of providers and locales. For example, I would expect there to be something like a "sv_SE_CI" locale, either available by default or easily created.
