On Tue, Mar 24, 2026 at 4:07 PM Jeff Davis <[email protected]> wrote:
> On Sat, 2026-03-21 at 20:14 -0700, Mark Dilger wrote:
> > After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
> > uses str_tolower() for trigram extraction (trgm_op.c:352 and :948).
> > With builtin collations, these produce different results.
>
> Interesting, thank you. As stated in the original message, I was unsure
> about changing pg_trgm without adjusting the regex logic, also:
>
> https://www.postgresql.org/message-id/[email protected]
>
> do you have a suggestion about an easy way to do that, or should we
> revisit in the next cycle?
>

pg_trgm appears to be lossy, with recheck logic. I would think you just
need to make it give answers which at least include everything that a
regex would match, and then allow recheck to prune that down. My concern
is having pg_trgm give less than all the answers, so that after recheck
you get fewer results than a seqscan would have returned.

Would switching to casefold be strictly broader than regex? If so, you
would just need to convert pg_trgm to use casefold and then rely on the
recheck machinery. Sorry if this misses something discussed upthread.

I'm clearly assuming here that you don't mind that such a change
necessitates a REINDEX.

-- 
*Mark Dilger*
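
To illustrate the superset concern, here is a rough sketch in Python rather than C. It is not pg_trgm's actual code: trigram extraction is simplified (no word splitting or padding), the function names are hypothetical, and Python's str.lower() and str.casefold() only stand in for str_tolower() and str_casefold(), which may fold differently under PostgreSQL's builtin collations. It shows an index that normalizes with lower() dropping a row that a casefold-based match would return, so recheck has nothing to prune back in.

```python
def trigrams(text, normalize):
    """Return the set of 3-character substrings of the normalized text.

    Simplified on purpose: real pg_trgm splits into words and pads them.
    """
    s = normalize(text)
    return {s[i:i + 3] for i in range(len(s) - 2)}


def index_candidates(rows, pattern, normalize):
    """Lossy index scan: keep rows containing every trigram of the pattern."""
    needed = trigrams(pattern, normalize)
    return [row for row in rows if needed <= trigrams(row, normalize)]


def recheck(rows, pattern):
    """Exact predicate: case-insensitive substring match via casefolding."""
    return [row for row in rows if pattern.casefold() in row.casefold()]


rows = ["Hauptstraße 5", "HAUPTSTRASSE 7", "Hauptplatz 1"]
pattern = "strasse"

# What a sequential scan would return: the predicate applied to every row.
seqscan = recheck(rows, pattern)

# Index built with lower(): "ß" stays "ß", so the trigrams "ras", "ass",
# "sse" are missing for the first row and the index never offers it.
via_lower = recheck(index_candidates(rows, pattern, str.lower), pattern)

# Index built with casefold(): "ß" folds to "ss", so both rows are offered.
via_casefold = recheck(index_candidates(rows, pattern, str.casefold), pattern)

print(seqscan)       # ['Hauptstraße 5', 'HAUPTSTRASSE 7']
print(via_lower)     # ['HAUPTSTRASSE 7']
print(via_casefold)  # ['Hauptstraße 5', 'HAUPTSTRASSE 7']
```

In this toy case the lower()-based index returns fewer rows than the seqscan, while the casefold()-based one agrees with it. It says nothing about whether the same holds for the regex path.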
