Thanks Zsolt for the review.

>
> The code isn't easy to reason about like this, it relies on specific
> details of the outer loop, which was only mentioned in the email
> itself.
>
> This should be also explained in a comment and the commit message, or
> maybe instead of the current way, the loop could work similarly like
> how another loop uses an afterescape flag in do_like_escape (in the
> same file), that form seems less fragile.
>

I have changed the code to use an 'afterescape' flag, as
'do_like_escape' does. I also noticed that 'do_like_escape' uses
NextChar to handle multibyte encodings, so I replaced the byte-by-byte
copy with NextChar followed by a copy of the whole character.
Byte-by-byte copying is probably enough for most cases, but it could
break if an encoding allows '\' as the second or third byte of a
character.
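
To make the intended behaviour easier to check outside the backend,
here is a minimal standalone sketch of the same loop. It is not the
patch code: sketch_unescape and sketch_utf8_charlen are hypothetical
names, the sketch assumes UTF-8 input, and the hand-rolled length
lookup only stands in for what NextChar does in like_match.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the backend's character-length lookup.
 * Returns the byte length of the UTF-8 character starting at s,
 * clamped so it never runs past the remaining input.
 */
static size_t
sketch_utf8_charlen(const char *s, size_t remaining)
{
	unsigned char lead = (unsigned char) *s;
	size_t		len;

	if (lead < 0x80)
		len = 1;
	else if ((lead & 0xE0) == 0xC0)
		len = 2;
	else if ((lead & 0xF0) == 0xE0)
		len = 3;
	else if ((lead & 0xF8) == 0xF0)
		len = 4;
	else
		len = 1;				/* invalid lead byte: copy it as one byte */

	return (len <= remaining) ? len : remaining;
}

/*
 * Copy src[0..srclen) into dst, dropping each unescaped '\' and keeping
 * the character that follows it verbatim. dst must have room for srclen
 * bytes. Returns the number of bytes written.
 */
static size_t
sketch_unescape(const char *src, size_t srclen, char *dst)
{
	size_t		in = 0;
	size_t		out = 0;
	int			afterescape = 0;

	assert(src != NULL && dst != NULL);

	while (in < srclen)
	{
		size_t		clen;

		if (src[in] == '\\' && !afterescape)
		{
			/* Escape marker: skip it, remember to keep the next char. */
			afterescape = 1;
			in++;
			continue;
		}

		/* Copy one whole character so we stay on character boundaries. */
		clen = sketch_utf8_charlen(src + in, srclen - in);
		memcpy(dst + out, src + in, clen);
		in += clen;
		out += clen;
		afterescape = 0;
	}

	return out;
}
```

With this sketch, the four input bytes a, \, \, b come out as the three
bytes a, \, b, which is the case the old loop got wrong by dropping
both backslashes.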

The copy could also be done with CopyAdvChar, as in 'do_like_escape',
but that macro is not defined for all cases. So for the time being, I
just used NextChar and copied the character myself. We can also define
CopyAdvChar for those cases and use it here, so that the code is
consistent across functions.

Let me know your thoughts on the above approaches.

Regards,
Nitin Motiani
Google
From 2357d7fd2cb5528c707b816ae466eb08a3654f05 Mon Sep 17 00:00:00 2001
From: Nitin Motiani <[email protected]>
Date: Thu, 14 May 2026 10:49:54 +0000
Subject: [PATCH v2] Fix LIKE matching with nondeterministic collations and
 backslashes

Commit 85b7efa1cd added support for LIKE with nondeterministic
collations, but it included a bug in the unescaping logic for pattern
partitions. When the pattern contained a literal backslash (which is
represented as '\\' in the internal pattern), the code would skip both
backslashes, resulting in an incorrect match failure against the
original text.

This fix ensures that an escape backslash correctly causes the following
character to be copied literally into the subpattern before comparison.

A few regression tests are added to verify the fix and prevent future
regressions.

Reported-by: b/19474 on pgsql-bugs
---
 src/backend/utils/adt/like_match.c            | 37 ++++++++++++++++---
 .../regress/expected/collate.icu.utf8.out     | 25 +++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     |  6 +++
 3 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index f5f72b82e21..0779bdfafbf 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -252,14 +252,39 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 			if (found_escape)
 			{
 				char	   *b;
+				const char *c = p;
+				const char *start;	/* used in the loop whenever we are copying a
+									 * multibyte character */
+				int			clen = p1 - p;
+				bool		afterescape = false;
 
-				b = buf = palloc(p1 - p);
-				for (const char *c = p; c < p1; c++)
+				b = buf = palloc(clen);
+
+				/*
+				 * Remove occurrences of a single '\'. And if we have a '\\',
+				 * keep one '\'.
+				 */
+				while (clen > 0)
 				{
-					if (*c == '\\')
-						;
-					else
-						*(b++) = *c;
+					if (*c == '\\' && !afterescape)
+					{
+						afterescape = true;
+						NextByte(c, clen);
+						continue;
+					}
+
+					/*
+					 * Copy the entire character (1-4 bytes) and advance. This
+					 * ensures we stay aligned on character boundaries for
+					 * multibyte encodings.
+					 */
+					start = c;
+
+					NextChar(c, clen);
+					while (start < c)
+						*(b++) = *(start++);
+
+					afterescape = false;
 				}
 
 				subpat = buf;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 04e2f6df037..a55d7237500 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2741,6 +2741,31 @@ SELECT U&'\0061\0308bc' LIKE U&'_\00e4bc' COLLATE ignore_accents;
 -- escape character at end of pattern
 SELECT 'foox' LIKE 'foo\' COLLATE ignore_accents;
 ERROR:  LIKE pattern must not end with escape character
+-- literal backslash with nondeterministic collation (bug #19474)
+SELECT 'back\slash' COLLATE ignore_accents LIKE 'back\slash%' ESCAPE '#';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT 'aäb' COLLATE ignore_accents LIKE 'a#äb' ESCAPE '#' AS multibyte_escape;
+ multibyte_escape 
+------------------
+ t
+(1 row)
+
+SELECT 'a\äb' COLLATE ignore_accents LIKE 'a\äb%' ESCAPE '#' AS backslash_multibyte;
+ backslash_multibyte 
+---------------------
+ t
+(1 row)
+
+SELECT 'a\b%c' COLLATE ignore_accents LIKE 'a#\b#%%c' ESCAPE '#' AS mixed_escapes;
+ mixed_escapes 
+---------------
+ t
+(1 row)
+
 -- foreign keys (mixing different nondeterministic collations not allowed)
 CREATE TABLE test10pk (x text COLLATE case_sensitive PRIMARY KEY);
 CREATE TABLE test10fk (x text COLLATE case_insensitive REFERENCES test10pk (x) ON UPDATE CASCADE ON DELETE CASCADE);  -- error
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 18c47e6e05a..ed07a702df8 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -960,6 +960,12 @@ SELECT U&'\0061\0308bc' LIKE U&'_\00e4bc' COLLATE ignore_accents;
 -- escape character at end of pattern
 SELECT 'foox' LIKE 'foo\' COLLATE ignore_accents;
 
+-- literal backslash with nondeterministic collation (bug #19474)
+SELECT 'back\slash' COLLATE ignore_accents LIKE 'back\slash%' ESCAPE '#';
+SELECT 'aäb' COLLATE ignore_accents LIKE 'a#äb' ESCAPE '#' AS multibyte_escape;
+SELECT 'a\äb' COLLATE ignore_accents LIKE 'a\äb%' ESCAPE '#' AS backslash_multibyte;
+SELECT 'a\b%c' COLLATE ignore_accents LIKE 'a#\b#%%c' ESCAPE '#' AS mixed_escapes;
+
 -- foreign keys (mixing different nondeterministic collations not allowed)
 CREATE TABLE test10pk (x text COLLATE case_sensitive PRIMARY KEY);
 CREATE TABLE test10fk (x text COLLATE case_insensitive REFERENCES test10pk (x) ON UPDATE CASCADE ON DELETE CASCADE);  -- error
-- 
2.54.0.563.g4f69b47b94-goog
