Re: [HACKERS] like/ilike improvements

Andrew Dunstan Thu, 24 May 2007 20:25:13 -0700


Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

You have to be on a first byte before you can meaningfully apply
NextChar, and you have to use NextChar or else you don't count
characters correctly (eg "__" must match 2 chars not 2 bytes).

Yes, I agree completely. However, it looks to me like IsFirstByte will in fact always be true when we get to call NextChar for matching "_" for UTF8.


If that's true, the patch is failing to achieve its goal of treating %
bytewise ...

Let's back up. % processing works by looking for a place in the text that might match what follows % in the pattern, and then calling itself recursively. For UTF8, if what follows % is _, it does that search by repeatedly calling NextChar - otherwise it calls NextByte. But if we're not processing a wildcard we have to match an actual complete UTF8 char, so the fact that we proceed byte-wise won't get us out of sync whenever we happen to encounter an _. We can't rely on that process for other multi-byte charsets because the suffix of one char might be the prefix of another, so we could get false matches. That can't happen with UTF8.
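
To make that concrete, here is a rough standalone sketch of the idea. It is not the patch itself: like_utf8, is_first_byte and next_char are made-up names for illustration, escape handling is left out, and both strings are assumed to be valid NUL-terminated UTF8.

```c
#include <assert.h>
#include <stdbool.h>

/* UTF8 continuation bytes look like 10xxxxxx; anything else starts a char. */
static bool is_first_byte(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Step over one whole character. Only meaningful when p is on a first byte. */
static const char *next_char(const char *p)
{
    assert(is_first_byte((unsigned char) *p));
    p++;
    while (*p != '\0' && !is_first_byte((unsigned char) *p))
        p++;
    return p;
}

/* Simplified LIKE: handles only '%' and '_' (hypothetical, no escapes). */
static bool like_utf8(const char *t, const char *p)
{
    while (*p != '\0')
    {
        if (*p == '%')
        {
            while (*p == '%')           /* collapse runs of % */
                p++;
            if (*p == '\0')
                return true;            /* trailing % matches anything */
            if (*p == '_')
            {
                /* _ follows %: search char by char so _ counts chars */
                for (; *t != '\0'; t = next_char(t))
                    if (like_utf8(t, p))
                        return true;
                return false;
            }
            /*
             * A literal follows %: search byte by byte.  The literal's
             * first byte can never equal a continuation byte, so a hit
             * always lands on a character boundary in the text.
             */
            for (; *t != '\0'; t++)
                if (*t == *p && like_utf8(t, p))
                    return true;
            return false;
        }
        if (*t == '\0')
            return false;
        if (*p == '_')
        {
            t = next_char(t);           /* consume one whole char */
            p++;
        }
        else
        {
            if (*t != *p)               /* literals compare bytewise */
                return false;
            t++;
            p++;
        }
    }
    return *t == '\0';
}
```

With this, "h_llo" matches a text whose second character is a two-byte e-acute, while "__" does not match that single two-byte character on its own. The assert in next_char never fires on valid input, which is the point about IsFirstByte above.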


cheers

andrew
