Andrew Dunstan <[EMAIL PROTECTED]> writes:
> OK, here is a patch that I think incorporates all the ideas discussed
> (including part of Mark Mielke's suggestion about optimising %_). There
> is now no special treatment of UTF8 other than its use of a faster
> NextChar macro.
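For illustration only: one way a UTF-8-specific NextChar can avoid a per-character length lookup is to step over the lead byte and then over any continuation bytes, which in UTF-8 always have the bit pattern 10xxxxxx. The C sketch below assumes valid UTF-8 input; `utf8_next_char` and `utf8_char_count` are hypothetical names, and this is not the macro from the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not the patch's macro): advance *p past one UTF-8
 * character by stepping over the lead byte, then over any continuation
 * bytes (10xxxxxx). *plen is decremented by the number of bytes consumed.
 */
static void utf8_next_char(const unsigned char **p, int *plen)
{
    assert(*plen > 0);          /* caller must not step past end of data */
    do {
        (*p)++;
        (*plen)--;
    } while (*plen > 0 && (**p & 0xC0) == 0x80);
}

/* Count characters (not bytes) in a UTF-8 buffer using the helper above. */
static int utf8_char_count(const unsigned char *s, int len)
{
    int n = 0;

    while (len > 0) {
        utf8_next_char(&s, &len);
        n++;
    }
    return n;
}
```

For example, the bytes `a`, `0xC3 0xA9` (é), `%` count as three characters in four bytes.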
Looks mostly pretty good. I would suggest replacing tests "tlen == 0" and
"plen == 0" with "<= 0", just so the code doesn't go completely insane if
presented with invalidly-encoded data that causes it to step beyond the
end of data.

Also, this comment is not really good enough:

> !             /*
> !              * It is safe to use NextByte instead of NextChar here, even for
> !              * multi-byte character sets, because we are not following
> !              * immediately after a wildcard character.
> !              */
> !             NextByte(t, tlen);
> !             NextByte(p, plen);
> }

I'd suggest adding something like "If we are in the middle of a multibyte
character, we must already have matched at least one byte of the character
from both text and pattern; so we cannot get out-of-sync on character
boundaries. And we know that no backend-legal encoding allows ASCII
characters such as '%' to appear as non-first bytes of characters, so we
won't mistakenly detect a new wildcard."

regards, tom lane
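Both suggestions can be seen together in a simplified, hypothetical matcher. This C sketch supports only `%` and `_` (no escapes), assumes UTF-8, and uses byte offsets rather than the pointer macros from the patch; `lead_byte_len`, `like_match` and `like` are invented names. Character steps are sized from the lead byte alone, so truncated input can drive `tlen` negative, which is why every end-of-data test is written as `> 0` / `<= 0` instead of `== 0`. Literal characters are compared one byte at a time, relying on the argument in the suggested comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: UTF-8 length judged from the lead byte only. On truncated
 * input it can claim more bytes than remain, driving the count negative. */
static int lead_byte_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;                   /* stray continuation or invalid byte */
}

/* Offsets (ti, pi) are used so overshooting never forms a bad pointer;
 * tlen and plen are bytes remaining and may go negative on bad input. */
static bool like_match(const unsigned char *t, int ti, int tlen,
                       const unsigned char *p, int pi, int plen)
{
    while (tlen > 0 && plen > 0) {
        if (p[pi] == '%') {
            while (plen > 0 && p[pi] == '%') { pi++; plen--; }  /* collapse %% */
            if (plen <= 0)
                return true;    /* trailing '%' matches the rest */
            while (tlen > 0) {  /* retry at each character boundary */
                if (like_match(t, ti, tlen, p, pi, plen))
                    return true;
                int l = lead_byte_len(t[ti]);   /* whole character step */
                ti += l;
                tlen -= l;
            }
            return false;
        } else if (p[pi] == '_') {
            int l = lead_byte_len(t[ti]);       /* '_' eats one character */
            ti += l;
            tlen -= l;
            pi++;
            plen--;
        } else {
            /* Literal: a byte step suffices. Text and pattern were in sync
             * when this character began and every byte so far matched; no
             * UTF-8 continuation byte equals '%' or '_'. */
            if (t[ti] != p[pi])
                return false;
            ti++; tlen--;
            pi++; plen--;
        }
    }
    if (tlen > 0)
        return false;           /* pattern used up, text left over */
    while (plen > 0 && p[pi] == '%') { pi++; plen--; }
    return plen <= 0;
}

/* Convenience wrapper: match a whole text against a whole pattern. */
static bool like(const unsigned char *t, int tlen,
                 const unsigned char *p, int plen)
{
    assert(tlen >= 0 && plen >= 0);
    return like_match(t, 0, tlen, p, 0, plen);
}
```

With the truncated text `{'a', 0xE2}` and pattern `a%x`, the retry loop steps `tlen` from 1 to -2 and stops; a loop written as `while (tlen != 0)` would instead read past the end of the buffer.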