Re: Built-in case-insensitive collation pg_unicode_ci

Peter Eisentraut Sat, 18 Oct 2025 11:07:05 -0700

On 20.09.25 02:21, Jeff Davis wrote:

New builtin case-insensitive collation PG_UNICODE_CI, where the
ordering semantics are just:


    strcmp(CASEFOLD(arg1), CASEFOLD(arg2))

and the character semantics are the same as PG_UNICODE_FAST.
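
(For readers following the thread: the comparison quoted above can be sketched roughly as below. This is only an illustration, not the patch's implementation. It assumes Python's str.casefold(), which applies Unicode full case folding, is a fair stand-in for CASEFOLD, and that strcmp amounts to a bytewise comparison of the UTF-8 encodings. The function name casefold_cmp is made up for this sketch.)

```python
def casefold_cmp(a: str, b: str) -> int:
    """Return <0, 0, or >0, like strcmp() applied to the case-folded inputs.

    Assumption: str.casefold() approximates CASEFOLD, and comparing the
    UTF-8 bytes approximates strcmp() on the folded strings.
    """
    ka = a.casefold().encode("utf-8")
    kb = b.casefold().encode("utf-8")
    # Python compares bytes objects lexicographically by unsigned byte value.
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    # Full case folding maps "ß" to "ss", so these two compare equal.
    print(casefold_cmp("Straße", "STRASSE"))  # 0
    # Ordering is decided on the folded form, so "abc" sorts before "ABD".
    print(casefold_cmp("abc", "ABD"))         # -1
```

Because the result depends only on the folded strings, two inputs compare equal exactly when their folded forms are identical, which is what keeps this kind of comparison simpler than a general nondeterministic collation.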

If it's a variant of PG_UNICODE_FAST, then it ought to be called PG_UNICODE_FAST_CI or similar. Otherwise, one would expect it to be a variant of PG_UNICODE (if that existed, but there is also UNICODE).

But that name is also dubious since you later write that it's not actually fast.

Non-deterministic collations cannot be used by SIMILAR TO, and may
cause problems for ILIKE and regexes. The reason is that pattern
matching often depends on the character-by-character semantics, but ICU
collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow. SIMILAR TO is kind of deprecated, and ILIKE is kind of stupid, and regexes have their own way to control case-sensitivity.

Nevertheless, I think there would be some value to provide CI (and maybe accent-insensitive?) collations that operate separately from the "nondeterministic" mechanism. But then I would like to see a comprehensive approach that covers a variety of providers and locales. For example, I would expect there to be something like a "sv_SE_CI" locale, either available by default or easily created.
