Hi,

get_collation_actual_version() in pg_locale.c currently
excludes C.UTF-8 (and more generally C.*) from versioning,
which leaves pg_collation.collversion empty for these
collations.

char *
get_collation_actual_version(char collprovider, const char *collcollate)
{
....
        if (collprovider == COLLPROVIDER_LIBC &&
                pg_strcasecmp("C", collcollate) != 0 &&
                pg_strncasecmp("C.", collcollate, 2) != 0 &&
                pg_strcasecmp("POSIX", collcollate) != 0)

This seems to be based on the idea that C.* collations provide an
immutable sort order like "C", but that does not appear to be the case.

For instance, consider how these C.UTF-8 comparisons differ between
recent Linux systems:

U+1D400 = Mathematical Bold Capital A

Debian 9.13 (glibc 2.24)
=> select  'A' < E'\U0001D400' collate "C.UTF-8";
 ?column? 
----------
 t

Debian 10.13 (glibc 2.28)
=> select  'A' < E'\U0001D400' collate "C.UTF-8";
 ?column? 
----------
 f

Debian 11.6 (glibc 2.31)
=> select  'A' < E'\U0001D400' collate "C.UTF-8";
 ?column? 
----------
 f

Ubuntu 22.04 (glibc 2.35)
=> select  'A' < E'\U0001D400' collate "C.UTF-8";
 ?column? 
----------
 t
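
For anyone who wants to check this outside of PostgreSQL, here is a
minimal sketch of the same comparison done directly with strcoll().
The helper name compare_in_locale() is made up for illustration, and
the sketch assumes the requested locale is installed on the system
(the helper returns false when setlocale() rejects it).

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: compare a and b with
 * strcoll() under the given LC_COLLATE locale.  On success, stores the
 * strcoll() result in *result and returns true.  Returns false if the
 * locale cannot be selected (for instance, it is not installed).
 */
bool
compare_in_locale(const char *locale, const char *a, const char *b,
                  int *result)
{
    char        saved[256];
    const char *current = setlocale(LC_COLLATE, NULL);

    /* Remember the current setting so it can be restored afterwards. */
    if (current == NULL || strlen(current) >= sizeof(saved))
        return false;
    strcpy(saved, current);

    if (setlocale(LC_COLLATE, locale) == NULL)
        return false;

    *result = strcoll(a, b);

    setlocale(LC_COLLATE, saved);
    return true;
}
```

Calling compare_in_locale("C.UTF-8", "A", "\xF0\x9D\x90\x80", &r)
compares 'A' with the UTF-8 encoding of U+1D400; a negative r
corresponds to the "t" results above. On glibc systems, printing
gnu_get_libc_version() next to the result makes it easy to compare
machines.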

So I suggest the attached patch, which no longer excludes these
collations from the generic versioning.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 092b620673..dd9c3a0c10 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1727,7 +1727,6 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #endif
                if (collprovider == COLLPROVIDER_LIBC &&
                        pg_strcasecmp("C", collcollate) != 0 &&
-                       pg_strncasecmp("C.", collcollate, 2) != 0 &&
                        pg_strcasecmp("POSIX", collcollate) != 0)
        {
 #if defined(__GLIBC__)
