Re: [PATCHES] UTF8MatchText

Andrew Dunstan Thu, 17 May 2007 09:58:47 -0700


I wrote:

ISTM we should generate all these match functions from one body of code plus some #define magic.
As I understand it, we have three possible encoding switches: Single Byte, UTF8 and other Multi Byte Charsets, and two possible case settings: Case Sensitive and Case Insensitive. That would make for a total of six functions, but in the case of both UTF8 and other MBCS we don't need a special Case Insensitive function - instead we downcase both the string and the pattern and then use the Case Sensitive function. That leaves a total of four functions.
What is not clear to me is why the UTF8 optimisation works, and why it doesn't apply to other MBCS. At the very least we need a comment on that.
I also find the existing function naming convention somewhat annoying - having foo() and MB_foo() is less than clear. I'd rather have SB_foo() and MB_foo(). That's not your fault, of course.
If you supply me with some explanation on the UTF8 optimisation issue, I'll prepare a revised patch along these lines.
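
To make the "one body of code plus some #define magic" idea concrete, here is a minimal sketch, not the actual patch. The names DEFINE_LIKE_MATCH, sb_charlen, utf8_charlen, sb_like and utf8_like are all hypothetical, strings are assumed to be NUL-terminated, and the input is assumed to be valid in its encoding. The only thing that differs between the generated functions is how far to advance for one character.

```c
#include <stddef.h>

/* Hypothetical helper: a single-byte character is always one byte long. */
static size_t sb_charlen(const char *s)
{
    (void) s;
    return 1;
}

/* Hypothetical helper: byte length of the UTF-8 character at s, found by
 * skipping continuation bytes (10xxxxxx). Assumes valid UTF-8 and *s != 0. */
static size_t utf8_charlen(const char *s)
{
    size_t n = 1;

    while (((unsigned char) s[n] & 0xC0) == 0x80)
        n++;
    return n;
}

/* One matcher body; CHARLEN decides how '_' and '%' step through the text.
 * Returns 1 if text t matches LIKE pattern p, 0 otherwise. */
#define DEFINE_LIKE_MATCH(NAME, CHARLEN)                          \
static int NAME(const char *t, const char *p)                     \
{                                                                 \
    while (*p != '\0')                                            \
    {                                                             \
        if (*p == '%')                                            \
        {                                                         \
            p++;                                                  \
            for (;;)                                              \
            {                                                     \
                if (NAME(t, p))                                   \
                    return 1;                                     \
                if (*t == '\0')                                   \
                    return 0;                                     \
                t += CHARLEN(t);                                  \
            }                                                     \
        }                                                         \
        if (*t == '\0')                                           \
            return 0;                                             \
        if (*p == '_')                                            \
        {                                                         \
            t += CHARLEN(t);                                      \
            p++;                                                  \
            continue;                                             \
        }                                                         \
        if (*p == '\\' && p[1] != '\0')                           \
            p++;                                                  \
        if (*t != *p)                                             \
            return 0;                                             \
        t++;                                                      \
        p++;                                                      \
    }                                                             \
    return *t == '\0';                                            \
}

/* Two of the variants, generated from the same body. */
DEFINE_LIKE_MATCH(sb_like, sb_charlen)
DEFINE_LIKE_MATCH(utf8_like, utf8_charlen)
```

With this sketch, utf8_like("h\xC3\xA9llo", "h_llo") matches because '_' consumes the whole two-byte character, while sb_like() on the same arguments does not, since it consumes only one byte. The case-insensitive variants are left out here.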

Ok, I have studied some more and I think I understand what's going on. AIUI, we are switching from some expensive char-wise comparisons to cheap byte-wise comparisons in the UTF8 case because we know that in UTF8 the magic characters ('_', '%' and '\') aren't a part of any other character sequence. Is that putting it too mildly? Do we need stronger conditions than that? If it's correct, are there other MBCS for which this is true?
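
A small sketch of the property in question, under stated assumptions: in UTF8 every byte of a multi-byte character has its high bit set, so a byte-wise scan can never mistake part of such a character for '_', '%' or '\'. The helper names below are hypothetical, and the Shift_JIS byte pair used for contrast (0x83 0x5C, the katakana character that is E3 82 BD in UTF8) is an example added here, not something taken from this thread.

```c
#include <stddef.h>

/* Returns 1 if byte b has the value of a LIKE magic character. */
static int is_like_magic(unsigned char b)
{
    return b == '_' || b == '%' || b == '\\';
}

/* Counts the bytes in s[0..len) that a purely byte-wise scan would treat
 * as magic characters, whether or not they are whole characters. */
static size_t count_magic_bytes(const char *s, size_t len)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (is_like_magic((unsigned char) s[i]))
            count++;
    }
    return count;
}
```

For the UTF8 bytes "\xE3\x82\xBD" the count is 0, because all three bytes are at or above 0x80. For the Shift_JIS bytes "\x83\x5C" the count is 1, because the trail byte has the same value as '\', so a byte-wise scan would see a backslash that is not really there.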


cheers

andrew



