Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Except that the entire point of this patch is to dumb down NextChar to
>>> be the same as NextByte for UTF8 strings.
>>
>> That's not what I see in (what I think is) the latest submission, which includes this snippet:
>
> [ scratches head... ]  OK, then I think I totally missed what this patch
> is trying to accomplish; because this code looks just the same as the
> existing multibyte-character operations.  Where does the performance
> improvement come from?

                        

That's what bothered me. The trouble is that we have so much code that looks *almost* identical.

From my WIP patch, here's where the difference appears to be. Note that the UTF8 branch ends with two NextByte calls, whereas the other branch ends with two NextChar calls:


#ifdef UTF8_OPT
       /*
        * UTF8 is optimised to do byte at a time matching in most cases,
        * thus saving expensive calls to NextChar.
        *
        * UTF8 has disjoint representations for first-bytes and
        * not-first-bytes of MB characters, and thus it is
        * impossible to make a false match in which an MB pattern
        * character is matched to the end of one data character
        * plus the start of another.
        * In character sets without that property, we have to use the
        * slow way to ensure we don't make out-of-sync matches.
        */
       else if (*p == '_')
       {
           NextChar(t, tlen);
           NextByte(p, plen);
           continue;
       }
       else if (!BYTEEQ(t, p))
       {
           /*
            * Not the single-character wildcard and no explicit match? Then
            * time to quit...
            */
           return LIKE_FALSE;
       }

       NextByte(t, tlen);
       NextByte(p, plen);
#else
       /*
        * Branch for non-utf8 multi-byte charsets and also for single-byte
        * charsets which don't gain any benefit from the above optimisation.
        */
       else if ((*p != '_') && !CHAREQ(t, p))
       {
           /*
            * Not the single-character wildcard and no explicit match? Then
            * time to quit...
            */
           return LIKE_FALSE;
       }

       NextChar(t, tlen);
       NextChar(p, plen);

#endif /* UTF8_OPT */
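
To make the property in the comment above concrete, here is a minimal standalone sketch, not taken from the patch. The helper names (is_utf8_continuation, utf8_char_len, match_underscore_only) are hypothetical, and the matcher handles only literals and '_' (no '%', no escapes), assuming well-formed UTF8 input. It shows why literals can be compared a byte at a time while '_' still has to step over a whole character in the text:

```c
#include <stddef.h>

/* Hypothetical helper: UTF8 continuation bytes all look like 10xxxxxx,
 * a range that no first byte of a character ever uses. */
static int is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: number of bytes in the character starting at s,
 * found by skipping continuation bytes (assumes well-formed input). */
static size_t utf8_char_len(const char *s, size_t len)
{
    size_t n = 1;

    if (len == 0)
        return 0;
    while (n < len && is_utf8_continuation((unsigned char) s[n]))
        n++;
    return n;
}

/* Simplified matcher: literals and '_' only. Returns 1 on match, 0 otherwise. */
static int match_underscore_only(const char *t, size_t tlen,
                                 const char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '_')
        {
            /* '_' must consume one whole character of text, but the
             * pattern itself advances by a single byte. */
            size_t n = utf8_char_len(t, tlen);

            t += n;
            tlen -= n;
            p++;
            plen--;
            continue;
        }
        if (*t != *p)
            return 0;

        /* Literal bytes: compare and advance one byte on each side.
         * A multibyte pattern character cannot straddle two text
         * characters, so staying byte-aligned is enough. */
        t++;
        tlen--;
        p++;
        plen--;
    }
    return tlen == 0 && plen == 0;
}
```

With this sketch, the six-byte text "h\xC3\xA9llo" matches the pattern "h_llo" because '_' swallows both bytes of the accented character, while "h__llo" fails because the second '_' consumes the first 'l'.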


cheers

andrew


