Re: [PATCHES] UTF8MatchText

Andrew Dunstan Sun, 20 May 2007 12:47:40 -0700


Tom Lane wrote:

On the strength of this analysis, shouldn't we drop the separate
UTF8 match function and just use SB_MatchText for UTF8?

Possibly - IIRC I looked at that and there was some reason I didn't, but I'll look again.

It strikes me that we may be overcomplicating matters in another way
too.  If you believe that the %-scan code is now the bottleneck, that
is, the key loop is where we have pattern '%foo' and we are trying to
match 'f' to each successive data position, then you should be bothered
that SB_MatchTextIC is applying tolower() to 'f' again for each data
character.  Worst-case we could have O(N^2) applications of tolower()
during a match.  I think there's a fair case to be made that we should
get rid of SB_MatchTextIC and implement *all* the case-insensitive
variants by means of an initial lower() call.  This would leave us with
just two match functions and allow considerable unification of the setup
logic.
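
A minimal sketch of the "initial lower() call" idea described above, assuming a single-byte encoding and a deliberately simplified matcher that handles only '%' and '_' with no escape character. The names fold_lower_copy, match_text and match_text_ci are hypothetical and are not the functions in the PostgreSQL source. The point is only that tolower() runs once per input byte up front, instead of once per comparison inside the %-scan loop.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes of src, folded to lower case.
 * Single-byte encodings only. Returns NULL if malloc fails. */
char *fold_lower_copy(const char *src, size_t len)
{
    char *dst = malloc(len + 1);

    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        dst[i] = (char) tolower((unsigned char) src[i]);
    dst[len] = '\0';
    return dst;
}

/* Simplified case-sensitive LIKE matcher: '%' matches any run of bytes,
 * '_' matches exactly one byte. No escape handling. Returns 1 or 0. */
int match_text(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            /* Collapse a run of '%' into one. */
            while (plen > 0 && *p == '%')
            {
                p++;
                plen--;
            }
            if (plen == 0)
                return 1;       /* trailing '%' matches the rest */

            /* The %-scan loop: plain byte compares, no tolower() here. */
            while (tlen > 0)
            {
                if ((*p == '_' || *p == *t) && match_text(t, tlen, p, plen))
                    return 1;
                t++;
                tlen--;
            }
            return 0;
        }
        else if (*p != '_' && *p != *t)
            return 0;

        t++;
        tlen--;
        p++;
        plen--;
    }

    if (tlen > 0)
        return 0;               /* pattern exhausted, text left over */

    /* Text exhausted: only trailing '%' may remain in the pattern. */
    while (plen > 0 && *p == '%')
    {
        p++;
        plen--;
    }
    return plen == 0;
}

/* Case-insensitive match: fold both inputs once, then reuse match_text.
 * Returns 1 on match, 0 on no match, -1 if allocation failed. */
int match_text_ci(const char *text, const char *pattern)
{
    size_t tlen = strlen(text);
    size_t plen = strlen(pattern);
    char *t = fold_lower_copy(text, tlen);
    char *p = fold_lower_copy(pattern, plen);
    int result = -1;

    if (t != NULL && p != NULL)
        result = match_text(t, tlen, p, plen);
    free(t);
    free(p);
    return result;
}
```

With this shape, a call such as match_text_ci("Hello World", "%WORLD") applies tolower() to each byte of both strings exactly once. The trade-off is the extra allocation and copy for the folded strings.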

Yeah, quite possibly. I'm also wondering if we are wasting effort downcasing what will in most cases be the same pattern over and over again. Maybe we need to look at memoizing that somehow, or at least test to see if that would be a gain.
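
One way the memoizing idea might look, as a rough sketch only: a single-entry cache that remembers the last raw pattern and its downcased form. PatternCache and cached_fold are hypothetical names, the folding assumes a single-byte encoding, and nothing here reflects how the backend would manage memory for this. Note that a cache hit still costs a memcmp() over the pattern, so whether it is a gain would need to be measured.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry cache for the most recently folded pattern. */
typedef struct PatternCache
{
    char          *raw;      /* last pattern as supplied */
    char          *folded;   /* lower-cased copy of raw */
    size_t         len;      /* length of both strings */
    unsigned long  hits;     /* calls answered from the cache */
    unsigned long  misses;   /* calls that had to fold again */
} PatternCache;

/* Return the lower-cased form of pat, reusing the cached copy when the
 * same pattern is seen again. Returns NULL if allocation fails. The
 * returned pointer is owned by the cache and is valid until the next miss. */
const char *cached_fold(PatternCache *c, const char *pat, size_t len)
{
    char *raw;
    char *folded;

    if (c->raw != NULL && c->len == len && memcmp(c->raw, pat, len) == 0)
    {
        c->hits++;
        return c->folded;
    }

    raw = malloc(len + 1);
    folded = malloc(len + 1);
    if (raw == NULL || folded == NULL)
    {
        free(raw);
        free(folded);
        return NULL;
    }

    memcpy(raw, pat, len);
    raw[len] = '\0';
    for (size_t i = 0; i < len; i++)
        folded[i] = (char) tolower((unsigned char) pat[i]);
    folded[len] = '\0';

    /* Replace the previous entry. */
    free(c->raw);
    free(c->folded);
    c->raw = raw;
    c->folded = folded;
    c->len = len;
    c->misses++;
    return c->folded;
}
```

A caller would zero-initialize the cache (PatternCache c = {0};) and call cached_fold() before each match. The hits and misses counters are there only to make it easy to test whether repeated patterns are common enough for this to pay off.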


We're getting quite a long way from the original patch :-)

cheers

andrew

