Re: [PATCHES] UTF8MatchText

Andrew Dunstan Sun, 20 May 2007 06:31:28 -0700


Dennis Bjorklund wrote:

Tom Lane wrote:
You could imagine trying to do
% a byte at a time (and indeed that's what I'd been thinking it did)
but that gets you out of sync which breaks the _ case.

It is only when you have a pattern like '%_' that this is a problem, and we could detect this and do byte by byte when it's not. Now we check (*p == '\\') || (*p == '_') in each iteration when we scan over characters for '%', and we could do it once and have different loops for the two cases.

Other than this part, which I think can be optimized, I don't see anything wrong with the idea behind the patch. Making the '%' case fast might be an important optimization for a lot of use cases. It's not uncommon that '%' matches a bigger part of the string than the rest of the pattern.
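
The loop-splitting idea in the quoted text can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: like_match is a hypothetical name, the matcher assumes single-byte characters only, and a dangling backslash at the end of the pattern is simply treated as a non-match. The point is only that the test for '_' or '\\' after a '%' is made once, and then one of two scanning loops is chosen.

```c
#include <stddef.h>

/*
 * Hypothetical single-byte LIKE matcher (illustration only).
 * Returns 1 if t[0..tlen) matches pattern p[0..plen), else 0.
 */
int like_match(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '\\')
        {
            /* Escaped byte must match literally; dangling escape fails. */
            if (plen < 2 || *t != p[1])
                return 0;
            p += 2; plen -= 2;
            t++; tlen--;
        }
        else if (*p == '%')
        {
            /* Collapse a run of '%'; a trailing '%' matches the rest. */
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen == 0)
                return 1;

            /* The check is done once here, not in every iteration. */
            if (*p == '_' || *p == '\\')
            {
                /* General loop: every position is a candidate start. */
                for (; tlen > 0; t++, tlen--)
                    if (like_match(t, tlen, p, plen))
                        return 1;
            }
            else
            {
                /* Fast loop: only try positions whose first byte
                 * equals the literal that follows the '%'. */
                char first = *p;

                for (; tlen > 0; t++, tlen--)
                    if (*t == first && like_match(t, tlen, p, plen))
                        return 1;
            }
            return 0;
        }
        else
        {
            /* '_' matches any one byte; anything else must be equal. */
            if (*p != '_' && *t != *p)
                return 0;
            p++; plen--;
            t++; tlen--;
        }
    }

    /* Text exhausted: whatever pattern remains must be all '%'. */
    while (plen > 0 && *p == '%') { p++; plen--; }
    return tlen == 0 && plen == 0;
}
```

Because this sketch advances one byte at a time in both loops, it says nothing about whether the fast loop is safe for a given multibyte encoding.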

Are you sure? The big remaining char-matching bottleneck will surely be in the code that scans for a place to start matching a %. But that's exactly where we can't use byte matching for cases where the charset might include AB and BA as characters - the pattern might contain %BA and the string AB. However, this isn't a danger for UTF8, which leads me to think that we do indeed need a special case for UTF8, but for a different improvement from that proposed in the original patch. I'll post an updated patch shortly.
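
To make the AB/BA hazard concrete, here is a small sketch under stated assumptions. The "toy" encoding, in which every character is exactly two bytes, is invented for illustration and is not any real server encoding, and the function names are hypothetical. It contrasts a scan that tries every byte offset with one that only tries character boundaries, and shows the UTF-8 property that makes the byte scan safe there: continuation bytes have the form 10xxxxxx and can never be mistaken for the first byte of a character.

```c
#include <stddef.h>
#include <string.h>

/* Try every byte offset; return the first matching offset or -1. */
long find_bytewise(const unsigned char *s, size_t slen,
                   const unsigned char *pat, size_t plen)
{
    for (size_t i = 0; i + plen <= slen; i++)
        if (memcmp(s + i, pat, plen) == 0)
            return (long) i;
    return -1;
}

/*
 * Toy encoding (an assumption for this example): every character is
 * exactly two bytes, so only even offsets are character boundaries.
 */
long find_charwise_2byte(const unsigned char *s, size_t slen,
                         const unsigned char *pat, size_t plen)
{
    for (size_t i = 0; i + plen <= slen; i += 2)
        if (memcmp(s + i, pat, plen) == 0)
            return (long) i;
    return -1;
}

/* UTF-8 continuation bytes are exactly those of the form 10xxxxxx. */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With the four bytes ABAB read as two AB characters and the pattern character BA, find_bytewise reports a hit at offset 1, in the middle of the first character, while find_charwise_2byte correctly reports no match. In UTF-8 a pattern that starts with a complete character starts with a non-continuation byte, so a byte-wise hit cannot begin inside another character.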


cheers

andrew

