Re: Use correct collation in pg_trgm

Kirill Reshke Thu, 22 Jan 2026 00:52:27 -0800

On Wed, 21 Jan 2026 at 20:36, David Geier <[email protected]> wrote:
>
> Hi hackers,
>
> In thread [1] we found that pg_trgm always uses DEFAULT_COLLATION_OID
> for converting trigrams to lower-case. Here are some examples where
> today the collation is ignored:
>
> CREATE EXSTENSION pg_trgm;
> CREATE COLLATION turkish (provider = libc, locale = 'tr_TR.utf8');
>
> postgres=# SELECT show_trgm('ISTANBUL' COLLATE "turkish");
>                   show_trgm
> ---------------------------------------------
>  {"  i"," is",anb,bul,ist,nbu,sta,tan,"ul "}
>
> CREATE TABLE test(col TEXT COLLATE "turkish");
> INSERT INTO test VALUES ('ISTANBUL');
>
> postgres=# select show_trgm(col) FROM test;
>                   show_trgm
> ---------------------------------------------
>  {"  i"," is",anb,bul,ist,nbu,sta,tan,"ul "}
>
> postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
> COLLATE "turkish");
>  similarity
> ------------
>         0.5
>
> If the database is initialized via initdb --locale="tr_TR.utf8", the
> output changes:
>
> postgres=# SELECT show_trgm('ISTANBUL');
>                        show_trgm
> --------------------------------------------------------
>  {0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}
>
> and
>
> postgres=# select show_trgm(col) FROM test;
>                        show_trgm
> --------------------------------------------------------
>  {0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}
>
> postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
> COLLATE "turkish");
>  similarity
> ------------
>           1
>
> tr_TR.utf8 converts capital I to ı which is a multibyte character, while
> my default collation converts I to i.
>
> The attached patch attempts to fix that. I grepped for all occurrences
> of DEFAULT_COLLATION_OID in contrib/pg_trgm and use the function's
> collation OID instead DEFAULT_COLLATION_OID.
>
> The corresponding regression tests pass.
>
> [1]
> https://www.postgresql.org/message-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d%40gmail.com
>
> --
> David Geier


Hi!
LGTM

-- 
Best regards,
Kirill Reshke

Re: Use correct collation in pg_trgm

Reply via email to