Re: [PATCHES] ilike multi-byte pattern cache

2007-09-24 Thread ITAGAKI Takahiro

Andrew Dunstan wrote:

> The attached patch implements a one-value pattern cache for the
> multi-byte encoding case for ILIKE. This reduces calls to lower() by
> (50% -1) in the common case where the pattern is a constant.  My own
> testing and Guillaume Smet's show that this cuts roughly in half the
> performance penalty we inflicted by using lower() in that case.

It might be a better solution to create a new text type 'lower_text'
and replace 'text ILIKE text' with 'text ILIKE lower_text'. Because the
conversion function 'lower' is marked immutable, the planner can evaluate
it just once, at planning time, if the right-hand side is a constant
expression.

Here is pseudo-code for the above.

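-- Pseudo-code: a text type whose values are always lower-cased.
CREATE TYPE lower_text (INPUT = lower, ... );
-- Implicit cast, so a text pattern is converted via lower().
CREATE CAST (text AS lower_text)
    WITH FUNCTION lower(text) AS IMPLICIT;
-- ILIKE variant whose pattern argument is already lower-cased.
CREATE OPERATOR ILIKE (
    LEFTARG = text,
    RIGHTARG = lower_text, ...
);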

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [PATCHES] ilike multi-byte pattern cache

2007-09-22 Thread Tom Lane
Andrew Dunstan writes:
> The attached patch implements a one-value pattern cache for the
> multi-byte encoding case for ILIKE. This reduces calls to lower() by
> (50% -1) in the common case where the pattern is a constant.  My own
> testing and Guillaume Smet's show that this cuts roughly in half the
> performance penalty we inflicted by using lower() in that case.
>
> Is this sufficiently low risk to sneak into 8.3?

This seems awfully ugly ... and considering that you don't get to
avoid lower() on the data side, it seems pretty dubious that it
buys very much percentagewise.  It would also be a net loss for
non-constant patterns, which are by no means unheard of --- or even
two constant patterns used in the same query.

We've lived with this in 8.2 without much complaint.  I think we can
let it go until we think of a better solution.  To my mind this is
all tied up in the problem of handling locales in a better fashion
--- a lot of the inefficiency of lower() is due to having a poor
impedance match to the libc locale-related functions, and might be
eliminated if we had locale support with better APIs.

regards, tom lane



Re: [PATCHES] ilike multi-byte pattern cache

2007-09-22 Thread Andrew Dunstan



Tom Lane wrote:
> Andrew Dunstan writes:
>> The attached patch implements a one-value pattern cache for the
>> multi-byte encoding case for ILIKE. This reduces calls to lower() by
>> (50% -1) in the common case where the pattern is a constant.  My own
>> testing and Guillaume Smet's show that this cuts roughly in half the
>> performance penalty we inflicted by using lower() in that case.
>>
>> Is this sufficiently low risk to sneak into 8.3?
>
> This seems awfully ugly ... and considering that you don't get to
> avoid lower() on the data side, it seems pretty dubious that it
> buys very much percentagewise.  It would also be a net loss for
> non-constant patterns, which are by no means unheard of --- or even
> two constant patterns used in the same query.


The cost of using lower() is demonstrably high. Even on a very small 
pattern the speedup is easily measurable.


The cost of the patch is effectively 1 call to strcmp() per call and 2 
calls to memcpy() per cache miss, which should be quite cheap.
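
To make the cost argument concrete, here is a minimal C sketch of a
one-value cache of the kind described. It is not the attached patch, which
is not reproduced in this thread. All names (lowered_pattern,
expensive_lower, lower_calls) are hypothetical, and the ASCII-only
tolower() loop merely stands in for the real multi-byte lower().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Illustration only: counts how often the "expensive" lowering runs. */
static int lower_calls = 0;

/* Hypothetical stand-in for the costly multi-byte lower(); ASCII only. */
void
expensive_lower(const char *src, char *dst, size_t len)
{
    lower_calls++;
    for (size_t i = 0; i < len; i++)
        dst[i] = (char) tolower((unsigned char) src[i]);
    dst[len] = '\0';
}

/* One-value cache: the last raw pattern and its lower-cased form. */
static char *cached_raw = NULL;
static char *cached_lowered = NULL;

/*
 * Return the lower-cased form of pattern, reusing the cached result when
 * the same pattern is seen again. Returns NULL on allocation failure.
 * The returned pointer is only valid until the next cache miss.
 */
const char *
lowered_pattern(const char *pattern)
{
    assert(pattern != NULL);

    /* One strcmp() per call decides hit or miss. */
    if (cached_raw != NULL && strcmp(cached_raw, pattern) == 0)
        return cached_lowered;

    /* Miss: build a new entry, then replace the single cached one. */
    size_t len = strlen(pattern);
    char *raw = malloc(len + 1);
    char *low = malloc(len + 1);

    if (raw == NULL || low == NULL)
    {
        free(raw);
        free(low);
        return NULL;
    }

    memcpy(raw, pattern, len + 1);      /* remember the raw pattern */
    expensive_lower(pattern, low, len); /* the work the cache avoids */

    free(cached_raw);
    free(cached_lowered);
    cached_raw = raw;
    cached_lowered = low;
    return cached_lowered;
}
```

With this sketch, repeated calls with the same pattern lower it only once.
Calls that alternate between two patterns miss every time and pay for the
strcmp() and the copy on top of the lowering, which is the non-constant or
two-pattern case raised above.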



> We've lived with this in 8.2 without much complaint.  I think we can
> let it go until we think of a better solution.  To my mind this is
> all tied up in the problem of handling locales in a better fashion
> --- a lot of the inefficiency of lower() is due to having a poor
> impedance match to the libc locale-related functions, and might be
> eliminated if we had locale support with better APIs.


  


Well, we have a complaint now :-( This was aimed at being some temporary
relief rather than a long-term fix. I guess Guillaume can use this as a
patch if he wants.


I agree that we need better locale APIs.

cheers

andrew
