Remove 'charlen' argument from make_trigrams()

The function assumed that if charlen == bytelen, there are no
multibyte characters in the string. That's sensible, but the callers
were a little careless in how they calculated the lengths. The callers
converted the string to lowercase before calling make_trigram(), and
the 'charlen' value was calculated *before* the conversion to
lowercase while 'bytelen' was calculated after the conversion. If the
lowercased string had a different number of characters than the
original, make_trigram() might incorrectly apply the fastpath and
treat all the bytes as single-byte characters, or fail to apply the
fastpath (which is harmless), or it might hit the "Assert(bytelen ==
charlen)" assertion. I'm not aware of any locale / character
combinations where you could hit that assertion in practice,
i.e. where a string converted to lowercase would have fewer characters
than the original, but it seems best to avoid making that assumption.

To fix, remove the 'charlen' argument. To keep the performance when
there are no multibyte characters, always try the fast path first, but
check the input for multibyte characters as we go. The check on each
byte adds some overhead, but it's close enough. And to compensate, the
find_word() function no longer needs to count the characters.

This fixes one small bug in make_trigrams(): in the multibyte
codepath, it peeked at the byte just after the end of the input
string. When compiled with IGNORECASE, that was harmless because there
is always a NUL byte or blank after the input string. But with
!IGNORECASE, the call from generate_wildcard_trgm() doesn't guarantee
that.

Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.

Security: CVE-2026-2007
Reviewed-by: Noah Misch <[email protected]>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/54598670fe0a191f49848d1a1a8ab09d99616e71
Author: Heikki Linnakangas <[email protected]>

Modified Files
--------------
contrib/pg_trgm/trgm_op.c | 116 ++++++++++++++++++++++++++--------------------
1 file changed, 67 insertions(+), 49 deletions(-)

Reply via email to