Hi,

get_collation_actual_version() in pg_locale.c currently excludes C.UTF-8 (and more generally C.*) from versioning, which leaves pg_collation.collversion empty for these collations.

char *
get_collation_actual_version(char collprovider, const char *collcollate)
{
....
	if (collprovider == COLLPROVIDER_LIBC &&
		pg_strcasecmp("C", collcollate) != 0 &&
		pg_strncasecmp("C.", collcollate, 2) != 0 &&
		pg_strcasecmp("POSIX", collcollate) != 0)

This seems to be based on the idea that C.* collations provide an immutable sort order like "C", but that does not appear to be the case. For instance, consider how these C.UTF-8 comparisons differ between recent Linux systems (U+1D400 = Mathematical Bold Capital A):

Debian 9.13 (glibc 2.24)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 t

Debian 10.13 (glibc 2.28)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 f

Debian 11.6 (glibc 2.31)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 f

Ubuntu 22.04 (glibc 2.35)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 t

So I suggest the attached patch, which stops excluding these collations from the generic versioning.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
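To reproduce the comparison outside the server, a small C helper can call strcoll() under a given LC_COLLATE locale. This is only a sketch under stated assumptions: the helper name compare_in_locale() is hypothetical, it uses the process-global locale via setlocale(), whereas the server may compare through locale_t objects, and the outcome for "C.UTF-8" depends on how the local glibc was built (and whether that locale is installed at all).

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: compare a and b with strcoll() under the given
 * LC_COLLATE locale. Returns false if the locale cannot be set (for
 * example, not installed); otherwise stores the strcoll() result in
 * *result. The previous LC_COLLATE setting is restored before returning.
 */
bool compare_in_locale(const char *locale, const char *a, const char *b,
                       int *result)
{
    /* setlocale(..., NULL) returns a static buffer, so copy it first. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    char *saved = cur ? strdup(cur) : NULL;
    bool ok = false;

    if (setlocale(LC_COLLATE, locale) != NULL)
    {
        *result = strcoll(a, b);
        ok = true;
    }

    if (saved != NULL)
    {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return ok;
}
```

For the case above, U+1D400 is "\xF0\x9D\x90\x80" in UTF-8, so calling compare_in_locale("C.UTF-8", "A", "\xF0\x9D\x90\x80", &r) on machines with different glibc versions should show whether the sign of r changes.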
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 092b620673..dd9c3a0c10 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1727,7 +1727,6 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #endif
 	if (collprovider == COLLPROVIDER_LIBC &&
 		pg_strcasecmp("C", collcollate) != 0 &&
-		pg_strncasecmp("C.", collcollate, 2) != 0 &&
 		pg_strcasecmp("POSIX", collcollate) != 0)
 	{
 #if defined(__GLIBC__)
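The effect of the one-line change can be checked in isolation with a standalone copy of the libc branch of the condition, before and after the patch. Both function names below are hypothetical, and POSIX strcasecmp()/strncasecmp() stand in for PostgreSQL's pg_strcasecmp()/pg_strncasecmp(), an assumption that holds for ASCII locale names under the default "C" locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <strings.h>

/* Hypothetical: the libc condition before the patch ("C.*" exempted). */
bool versioning_applied_before_patch(const char *collcollate)
{
    return strcasecmp("C", collcollate) != 0 &&
           strncasecmp("C.", collcollate, 2) != 0 &&
           strcasecmp("POSIX", collcollate) != 0;
}

/* Hypothetical: the same condition after the patch; only "C" and "POSIX"
 * (case-insensitively) remain exempt from versioning. */
bool versioning_applies_after_patch(const char *collcollate)
{
    return strcasecmp("C", collcollate) != 0 &&
           strcasecmp("POSIX", collcollate) != 0;
}
```

With the patch, "C.UTF-8" falls into the same branch as, say, "en_US.UTF-8", so on glibc it would get the glibc version recorded in collversion, while "C" and "POSIX" stay unversioned.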