On Fri, Nov 15, 2024 at 11:42 PM Peter Eisentraut <pe...@eisentraut.org> wrote:
>
> On 15.11.24 05:26, jian he wrote:
> > /*
> > * Now build a substring of the text and try to match it against
> > * the subpattern.  t is the start of the text, t1 is one past the
> > * last byte.  We start with a zero-length string.
> > */
> > t1 = t;
> > t1len = tlen;
> > for (;;)
> > {
> > int cmp;
> > CHECK_FOR_INTERRUPTS();
> > cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale);
> >
> > select '.foo.' LIKE '_oo' COLLATE ign_punct;
> > pg_strncoll's iteration of the first 4 argument values.
> > oo      2       foo. 0
> > oo      2       foo. 1
> > oo      2       foo. 2
> > oo      2       foo. 3
> > oo      2       foo. 4
> >
> > It seems there is a possible shortcut/optimization:
> > if subpat doesn't contain a wildcard (percent sign, underscore),
> > can we make fewer pg_strncoll calls?
>
> How would you do that?  You need to try all combinations to find one
> that matches.
>

We can optimize the case where the trailing (last) character of the
pattern is not a wildcard.

SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents;
Within the for loop

for (;;)
{
    int            cmp;
    CHECK_FOR_INTERRUPTS();
    ....
}

the pg_strncoll comparisons (text, subpattern) will become:

Ha12foo    foo
a12foo     foo
12foo      foo
2foo       foo
foo        foo
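
As a rough illustration of the suffix-only comparison listed above, here is
a minimal standalone sketch. It is not the patch: toy_strncoll is a
hypothetical stand-in for pg_strncoll that merely ignores ASCII case,
match_percent_suffix and its ncalls counter are invented for this example,
and the text is advanced byte by byte instead of with NextChar.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_strncoll(): compares two length-bounded
 * strings while ignoring ASCII case.  Returns 0 when they compare equal.
 */
static int
toy_strncoll(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      n = alen < blen ? alen : blen;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        int         ca = tolower((unsigned char) a[i]);
        int         cb = tolower((unsigned char) b[i]);

        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}

/*
 * Match text t against '%' followed by a wildcard-free subpattern.
 * For each start position, the whole remaining text is compared with
 * the subpattern once, instead of trying every prefix length.
 * The number of comparisons made is stored in *ncalls.
 * Returns 1 on match, 0 otherwise.
 */
static int
match_percent_suffix(const char *t, size_t tlen,
                     const char *subpat, size_t subpatlen,
                     int *ncalls)
{
    size_t      start;

    *ncalls = 0;
    for (start = 0; start <= tlen; start++)
    {
        (*ncalls)++;
        if (toy_strncoll(subpat, subpatlen, t + start, tlen - start) == 0)
            return 1;
    }
    return 0;
}
```

Calling match_percent_suffix("Ha12foo", 7, "foo", 3, &n) walks through the
same five text/subpattern pairs as in the list above, one comparison per
start position.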

This is safe because, on '%', MatchText already retries the match at each
successive character position of the text:

else if (*p == '%')
{
    while (tlen > 0)
    {
        if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
        {
            int            matched = MatchText(t, tlen, p, plen, locale);
            if (matched != LIKE_FALSE)
                return matched; /* TRUE or ABORT */
        }
        NextChar(t, tlen);
    }
}

Please check the attached patch.

> > minimum case to trigger error within GenericMatchText
> > since no related tests.
> > create table t1(a text collate case_insensitive, b text collate "C");
> > insert into t1 values ('a','a');
> > select a like b from t1;
>
> This results in
>
> ERROR:  42P22: could not determine which collation to use for LIKE
> HINT:  Use the COLLATE clause to set the collation explicitly.
>
> which is the expected behavior.
>
Sorry, I didn't state that clearly. I meant that we can add this case to
the regression tests.

Attachment: v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot
Description: Binary data
