On Wed, 21 Jan 2026 at 20:36, David Geier <[email protected]> wrote: > > Hi hackers, > > In thread [1] we found that pg_trgm always uses DEFAULT_COLLATION_OID > for converting trigrams to lower-case. Here are some examples where > today the collation is ignored: > > CREATE EXSTENSION pg_trgm; > CREATE COLLATION turkish (provider = libc, locale = 'tr_TR.utf8'); > > postgres=# SELECT show_trgm('ISTANBUL' COLLATE "turkish"); > show_trgm > --------------------------------------------- > {" i"," is",anb,bul,ist,nbu,sta,tan,"ul "} > > CREATE TABLE test(col TEXT COLLATE "turkish"); > INSERT INTO test VALUES ('ISTANBUL'); > > postgres=# select show_trgm(col) FROM test; > show_trgm > --------------------------------------------- > {" i"," is",anb,bul,ist,nbu,sta,tan,"ul "} > > postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL' > COLLATE "turkish"); > similarity > ------------ > 0.5 > > If the database is initialized via initdb --locale="tr_TR.utf8", the > output changes: > > postgres=# SELECT show_trgm('ISTANBUL'); > show_trgm > -------------------------------------------------------- > {0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "} > > and > > postgres=# select show_trgm(col) FROM test; > show_trgm > -------------------------------------------------------- > {0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "} > > postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL' > COLLATE "turkish"); > similarity > ------------ > 1 > > tr_TR.utf8 converts capital I to ı which is a multibyte character, while > my default collation converts I to i. > > The attached patch attempts to fix that. I grepped for all occurrences > of DEFAULT_COLLATION_OID in contrib/pg_trgm and use the function's > collation OID instead DEFAULT_COLLATION_OID. > > The corresponding regression tests pass. > > [1] > https://www.postgresql.org/message-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d%40gmail.com > > -- > David Geier
Hi! LGTM -- Best regards, Kirill Reshke
