Re: [PERFORM] Large CASE-statement is pretty slow?

2004-03-15 Thread Tom Lane
Arjen van der Meijden <[EMAIL PROTECTED]> writes:
> [ huge CASE is pretty slow ]

I did some profiling of the test case that Arjen was kind enough to send
me.  It seems there are two distinct problems.  One is that the parser
uses repeated lappend()'s to construct the list of CASE arms; this
makes building the structure O(N^2) in the number of arms.  (If you
simply try to EXPLAIN the query, you find out that the parse time is
about a third of the run time :-( ... and 90% of that is spent inside
nconc() which is the guts of lappend.)  This problem is slated to be
fixed by Neil Conway's upcoming rewrite of the list support, which will
convert lappend into a constant-time operation.

The other difficulty is that the evaluation machinery for arithmetic
expressions has a lot of overhead.  The profile run shows:

  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call   s/call  name
 38.15     41.92    41.92    229646    0.00     0.00  nconc
 21.76     65.84    23.92 199054454    0.00     0.00  ExecEvalExpr
 11.38     78.34    12.50      1000    0.00     0.00  ExecEvalCase
  8.43     87.61     9.27  66348151    0.00     0.00  ExecEvalFuncArgs
  8.12     96.54     8.93  66348151    0.00     0.00  ExecMakeFunctionResult
  2.96     99.78     3.25  66348151    0.00     0.00  ExecEvalVar
  1.23    101.14     1.36     10058    0.00     0.00  AllocSetCheck
  1.23    102.49     1.35  66348151    0.00     0.00  ExecEvalOper
  1.12    103.72     1.24     76537    0.00     0.00  OpernameGetCandidates
  0.85    104.66     0.94  66424693    0.00     0.00  int4eq

(Note: I added LIMIT 1000 to the query so that the CASE is only carried
out 1000 times, rather than nearly 90000 times as in Arjen's original
test case.  Without this, the call counter overflows for ExecEvalExpr,
and the time percentages seem to get confused.  One must recognize,
though, that this overstates the parser overhead compared to the original
test case.)
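For reference, the reduced run looks roughly like this (only a sketch of
the query shape, not the exact command; the table and column names are
illustrative, and the real CASE spelled out on the order of 88000 arms):

    -- evaluate the huge CASE for only the first 1000 rows
    SELECT CASE pkey WHEN 1 THEN 1 WHEN 34 THEN 2  -- ... ~88000 more arms
           END
    FROM oldtable
    LIMIT 1000;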

Clearly the useful work (int4eq) is getting a bit swamped by the ExecEval
mechanism.  I have some ideas about reducing the overhead, which I'll
post to the pghackers list in a bit.

regards, tom lane



Re: [PERFORM] Large CASE-statement is pretty slow?

2004-03-15 Thread Arjen van der Meijden
Greg Stark wrote:
> Arjen van der Meijden <[EMAIL PROTECTED]> writes:
>
> Was this the select with the CASE, or the update?

It was just the select, to see how long it would take. I had already
anticipated that it might be a "slow query", so I only did the select
first.

Best regards,

Arjen van der Meijden





Re: [PERFORM] Large CASE-statement is pretty slow?

2004-03-15 Thread Greg Stark

Arjen van der Meijden <[EMAIL PROTECTED]> writes:

> Well, I have discarded this type of query as "too inefficient" and found a
> better way

Loading the mapping into a table with an index and doing an update that
joins against it via a "from" clause seems likely to end up being the most
efficient method.  Postgres would probably not even bother with the index
and would do a hash join instead.
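A minimal sketch of that approach (the table and column names here are
invented for illustration, not from Arjen's schema):

    CREATE TEMP TABLE pkey_map (
        old_id integer PRIMARY KEY,   -- current primary key value
        new_id integer NOT NULL       -- value it should be renumbered to
    );
    -- ... populate pkey_map with the old -> new mapping ...

    UPDATE oldtable
    SET pkey = pkey_map.new_id
    FROM pkey_map
    WHERE oldtable.pkey = pkey_map.old_id;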


-- 
greg




Re: [PERFORM] Large CASE-statement is pretty slow?

2004-03-15 Thread Greg Stark
Arjen van der Meijden <[EMAIL PROTECTED]> writes:

> 
> Of course I wanted to know how long it would take on PostgreSQL. Selecting
> the pkey field only (without the CASE) also took some 0.7 seconds (the
> entire table may have been more).
> But the CASE version took 9026139.201 ms, i.e. over 9000 seconds, about 8
> times slower than MySQL.

Was this the select with the CASE, or the update?

If you did the update and have lots of foreign key references to the table,
then every record that's updated forces a check to make sure there are no
references to that record (or an update, if it's ON UPDATE CASCADE). If
there are no indexes on the referencing table columns, that will be very
slow.
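For example (hypothetical names, not from the original schema), an index on
the referencing column is what keeps those per-row checks cheap:

    -- assuming childtable.parent_id REFERENCES oldtable (pkey); without this
    -- index, every updated row triggers a scan of childtable
    CREATE INDEX childtable_parent_id_idx ON childtable (parent_id);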

-- 
greg




Re: [PERFORM] Large CASE-statement is pretty slow?

2004-03-14 Thread Arjen van der Meijden
Tom Lane wrote:
> Arjen van der Meijden <[EMAIL PROTECTED]> writes:
>
>> Anyway, I was looking into the usefulness of an INSERT INTO newtable
>> SELECT field, field, CASE pkey WHEN x1 THEN y1 WHEN x2 THEN y2 etc END
>> FROM oldtable
>
>> The resulting select was about 1.7MB of query text, mostly composed of
>> the CASE statement.
>
> Hm, you mean one single SELECT, one single CASE?  How many WHEN clauses
> exactly?  Exactly what did a typical clause of the CASE look like?

Yes, one SELECT query with one single CASE statement.
The CASE statement had the simple-case structure, like:
SELECT CASE UserID WHEN 1 THEN 1 WHEN 34 THEN 2 ... etc
I noticed, by the way, that the ordering is on the THEN y parameter; the
x parameter (WHEN x THEN y) is only "more or less increasing".

But some numbers:
The table I did my tests on has 88291 rows. I did the select on the
integer primary key, so the CASE was the only column in the select.
I'm running the query again on a table that has only the primary key of
my original table, and it seems to be just as slow.
I'm not really sure how many WHENs there are in that CASE, but it is
supposed to be a remapping of all primary key values to some other
value, so it will contain a number of arms close to that 88291.

> I wouldn't be too surprised to find some bit of code that's O(N^2) in
> the number of arms of the CASE, or something like that; it's not an area
> that we've ever felt the need to optimize.  But I'd like a fairly
> specific test case before trying to look into it.

Well, I have discarded this type of query as "too inefficient" and found
a better way, so don't feel the need to optimize it just because I
noticed it is slow with very large CASEs. Although CASEs with a few
hundred WHENs won't be that uncommon and might improve a bit as well?

I can send you the "primary key only" table and the query off-list if
you want to. That won't make me violate any privacy rule, and it is
probably a good test case?

Best regards,

Arjen van der Meijden





Re: [PERFORM] Large CASE-statement is pretty slow?

2004-03-14 Thread Tom Lane
Arjen van der Meijden <[EMAIL PROTECTED]> writes:
> Anyway, I was looking into the usefulness of an INSERT INTO newtable
> SELECT field, field, CASE pkey WHEN x1 THEN y1 WHEN x2 THEN y2 etc END 
> FROM oldtable

> The resulting select was about 1.7MB of query text, mostly composed of
> the CASE statement.

Hm, you mean one single SELECT, one single CASE?  How many WHEN clauses
exactly?  Exactly what did a typical clause of the CASE look like?

I wouldn't be too surprised to find some bit of code that's O(N^2) in
the number of arms of the CASE, or something like that; it's not an area
that we've ever felt the need to optimize.  But I'd like a fairly
specific test case before trying to look into it.

regards, tom lane



[PERFORM] Large CASE-statement is pretty slow?

2004-03-14 Thread Arjen van der Meijden
Hi list,

I was more or less toying with an idea for a project I have, which 
includes renumbering a primary key (don't ask, it's necessary :/ )

Anyway, I was looking into the usefulness of an INSERT INTO newtable
SELECT field, field, CASE pkey WHEN x1 THEN y1 WHEN x2 THEN y2 etc END
FROM oldtable
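Written out, that looks roughly like this (just a sketch; the real
statement spelled out one WHEN arm per primary key value, and the field
names are placeholders):

    INSERT INTO newtable
    SELECT field1, field2,
           CASE pkey WHEN 1 THEN 1
                     WHEN 34 THEN 2
                     -- ... tens of thousands more arms ...
           END
    FROM oldtable;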

The resulting select was about 1.7MB of query text, mostly composed of
the CASE statement. So I discarded that idea, but I still wanted to know
how much time it would take on my database (MySQL) and found it to take
about 1100 seconds, in contrast to simply selecting the data, which takes
about 0.7 seconds or so... The table I tested this on is about 30MB.

Of course I wanted to know how long it would take on PostgreSQL. Selecting
the pkey field only (without the CASE) also took some 0.7 seconds (the
entire table may have been more).
But the CASE version took 9026139.201 ms, i.e. over 9000 seconds, about 8
times slower than MySQL.

What I'm wondering about:
Although I was not expecting PostgreSQL to heavily beat MySQL, I was
surprised to see it so much slower. Is the CASE statement in PostgreSQL
that inefficient? Or is it simply not very scalable (i.e. don't try to
have tens of thousands of WHEN arms like I did)?

The database is a lightly optimised Gentoo compile of PostgreSQL 7.4.2;
the MySQL version was 4.0.18, in case anyone wants to know.

Best regards,

Arjen van der Meijden

PS: don't try to "help improve the query"; I discarded the idea as too
inefficient and went along with a simple left join to get the "new pkey"
out of a temporary table ;)
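Roughly, that replacement looks like this (a sketch with made-up table and
column names, assuming the old -> new mapping sits in a temporary table):

    CREATE TEMP TABLE newpkeys (
        old_pkey integer PRIMARY KEY,
        new_pkey integer NOT NULL
    );
    -- ... fill newpkeys with the renumbering ...

    INSERT INTO newtable
    SELECT o.field1, o.field2, n.new_pkey
    FROM oldtable o
    LEFT JOIN newpkeys n ON n.old_pkey = o.pkey;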


