Re: [PATCHES] UTF8MatchText

Andrew Dunstan Sun, 20 May 2007 07:13:07 -0700


I wrote:

It is only when you have a pattern like '%_' that this is a problem, and we could detect this and match byte by byte when it's not. Now we check (*p == '\\') || (*p == '_') on each iteration while scanning over characters for '%'. We could do that check once and have separate loops for the two cases.
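A minimal sketch of the "check once, then pick a loop" idea follows. It assumes valid UTF-8 text and pattern, with backslash as the fixed escape character. `utf8_len`, `like_match` and `like` are hypothetical names for illustration, not PostgreSQL's actual `MatchText` code. After a run of '%', the test for '_' or '\\' is done once. If it succeeds, the text is advanced a whole character at a time. Otherwise the next pattern item is a literal, and the text is scanned byte by byte for its first byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character whose lead byte
   is c. Assumes valid UTF-8 (this is not PostgreSQL's pg_mblen). */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Simplified LIKE for UTF-8: '%' matches any sequence, '_' one character,
   '\\' escapes the next pattern character. */
static bool like_match(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            while (plen > 0 && *p == '%') { p++; plen--; }   /* collapse %% */
            if (plen == 0)
                return true;                /* trailing % matches the rest */

            if (*p == '_' || *p == '\\')
            {
                /* Tested once: the next item is not a plain literal, so try
                   every character boundary, advancing a whole char at a time. */
                while (tlen > 0)
                {
                    if (like_match(t, tlen, p, plen))
                        return true;
                    size_t n = utf8_len((unsigned char) *t);
                    if (n > tlen) n = tlen;          /* guard against truncation */
                    t += n; tlen -= n;
                }
                return false;
            }

            /* The next item is a literal whose first byte is *p: scan byte by
               byte and recurse only where that byte occurs. In valid UTF-8 an
               ASCII or lead byte can only sit on a character boundary. */
            while (tlen > 0)
            {
                if (*t == *p && like_match(t, tlen, p, plen))
                    return true;
                t++; tlen--;
            }
            return false;
        }
        else if (*p == '_')
        {
            size_t n = utf8_len((unsigned char) *t);  /* skip one character */
            if (n > tlen) n = tlen;
            t += n; tlen -= n;
            p++; plen--;
        }
        else
        {
            if (*p == '\\')
            {
                p++; plen--;
                if (plen == 0)
                    return false;            /* dangling escape */
            }
            /* Literals compare byte by byte; continuation bytes are never
               '%', '_' or '\\', so this stays in step with characters. */
            if (*t != *p)
                return false;
            t++; tlen--;
            p++; plen--;
        }
    }
    /* One side is exhausted: only trailing '%'s may remain in the pattern. */
    while (plen > 0 && *p == '%') { p++; plen--; }
    return tlen == 0 && plen == 0;
}

/* Convenience wrapper for NUL-terminated strings. */
static bool like(const char *text, const char *pattern)
{
    return like_match(text, strlen(text), pattern, strlen(pattern));
}
```

For example, `like("caf\xC3\xA9", "caf_")` is true because '_' consumes the two-byte character, while `like("caf\xC3\xA9", "caf__")` is false. The byte-wise branch is only correct because of a UTF-8 property; for encodings where a trailing byte can look like a leading byte it would not be.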
Other than this part, which I think can be optimized, I don't see anything wrong with the idea behind the patch. Making the '%' case fast might be an important optimization for a lot of use cases. It's not uncommon for '%' to match a bigger part of the string than the rest of the pattern does.
Are you sure? The big remaining char-matching bottleneck will surely be in the code that scans for a place to start matching after a %. But that's exactly where we can't use byte matching when the charset might include AB and BA as characters: the pattern might contain %BA and the string AB. However, this isn't a danger for UTF8, which leads me to think that we do indeed need a special case for UTF8, but for a different improvement from the one proposed in the original patch. I'll post an updated patch shortly.
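To make the AB/BA hazard concrete, here is a small sketch. It assumes a hypothetical encoding in which every character is exactly two bytes and both "AB" and "BA" are valid characters; `byte_find`, `char_find_2byte` and `utf8_is_boundary` are illustrative names, not PostgreSQL functions. A byte-level search for "BA" in the two-character string "AB","AB" (bytes ABAB) reports a hit at offset 1, straddling two characters, while a boundary-aware search correctly finds nothing. UTF-8 avoids this because continuation bytes (10xxxxxx) never begin a character, so a match of a valid UTF-8 sequence can only start on a character boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Naive byte-level search: first offset where needle occurs in text, or -1. */
static long byte_find(const char *text, size_t tlen,
                      const char *needle, size_t nlen)
{
    if (nlen > tlen) return -1;
    for (size_t i = 0; i + nlen <= tlen; i++)
        if (memcmp(text + i, needle, nlen) == 0)
            return (long) i;
    return -1;
}

/* Same search, but only at character boundaries of a hypothetical
   fixed-width encoding in which every character is exactly 2 bytes. */
static long char_find_2byte(const char *text, size_t tlen,
                            const char *needle, size_t nlen)
{
    if (nlen > tlen) return -1;
    for (size_t i = 0; i + nlen <= tlen; i += 2)
        if (memcmp(text + i, needle, nlen) == 0)
            return (long) i;
    return -1;
}

/* In UTF-8, continuation bytes have the form 10xxxxxx and never start a
   character, so offset i is a boundary iff byte i is not a continuation. */
static bool utf8_is_boundary(const char *s, size_t i)
{
    return ((unsigned char) s[i] & 0xC0) != 0x80;
}
```

In a LIKE matcher, `byte_find` plays the role of scanning for where the text after a '%' could start. The sketch suggests why that shortcut is only safe for a self-synchronizing encoding such as UTF-8, and not for charsets where a trailing byte can also serve as a leading byte.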


Here is a patch that implements this. Please analyse for possible breakage.

cheers

andrew


