[HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Andres Freund
Hi all,

After having received several reports of worse plans on 8.4 compared to 8.3 
and recently once more one from 'vaxerdec' on IRC I tried to investigate the 
difference.
Reducing the (large and ugly, automatically generated queries) to a 
reproducible testcase I ended up with the following pattern:

explain SELECT 1
   
FROM
c
WHERE
EXISTS (
SELECT *   
FROM a
JOIN b USING (b_id)
WHERE b.c_id = c.c_id)
AND c.value = 1;

8.3 planned this to:

  Index Scan using c_value_key on c  (cost=0.00..24.83 rows=1 width=0)
Index Cond: (value = 1)
Filter: (subplan)
SubPlan
  -  Nested Loop  (cost=0.00..16.56 rows=1 width=12)
-  Index Scan using b__c_id on b  (cost=0.00..8.27 rows=1 
width=8)
  Index Cond: (c_id = $0)
-  Index Scan using a__b_id on a  (cost=0.00..8.27 rows=1 
width=8)
  Index Cond: (a.b_id = b.b_id)

Which is quite good for such a kind of query.

From 8.4 onwards this gets planned to 

  Nested Loop Semi Join  (cost=1543.00..7708.29 rows=1 width=0)
Join Filter: (c.c_id = b.c_id)
-  Index Scan using c_value_key on c  (cost=0.00..8.27 rows=1 width=4)
  Index Cond: (value = 1)
-  Hash Join  (cost=1543.00..7075.02 rows=5 width=4)
  Hash Cond: (b.b_id = a.b_id)
  -  Seq Scan on b  (cost=0.00..2164.01 rows=150001 width=8)
  -  Hash  (cost=722.00..722.00 rows=5 width=4)
-  Seq Scan on a  (cost=0.00..722.00 rows=5 width=4)

which is the near-equivalent (with s/Semi/IN/) what 8.3 produces for the above 
query with IN instead of EXISTS.

This kind of plan obviously is horrid.

The behavioral change was introduced in Tom's initial commit to support Semi 
Joins Implement SEMI and ANTI joins in the planner and executor. from 
2008-08-14 (I tried the commits before and after). 
Seeing that 8.3 didn't inline EXISTS but IN and showed the bad plan with IN 
its pretty evident that the inlining is the problem and not the patch itself.

Unsurprisingly 8.4 produces a similar plan to 8.3 if one uses a volatile 
function or a OFFSET 0 as that stops inlining.

Two questions:
1. Is there a reason this cannot be optimized? In the semi join case it 
doesn't seem to be *that* complex to push down qualifiers resulting in a plan 
like:


Nested Loop Semi Join
   - Index Scan using c_value_key on c
Index Cond: (value = 1)
   - Nested Loop
   - Index Scan using b__c_id on b
Index Cond: (b.c_id = c.c_id)
   - Index Scan using a__b_id on a
Index Cond: (a.b_id = b.b_id)

or, a bit more complex:

Nested Loop Semi Join
  -  Nested Loop Semi Join
-  Index Scan using c_value_key on c
  Index Cond: (value = 1)
-  Index Scan using b__c_id on b
  Index Cond: (b.c_id = c.c_id)
  -  Index Scan using a__b_id on a
Index Cond: (a.b_id = b.b_id)


2. I can construct several cases off the top of my head where avoiding 
inlining might yield significantly better plans unless variables are pushed 
down more aggressively into EXISTS/IN. While it would solve this issue I doubt 
that generating a separate path without inlining is viable (code complexity, 
plantime)?


I thinks its pretty annoying to have so much worse plans in 8.4 than earlier 
in relatively obvious queries, but I don't see any good, low-impact fix?

Greetings,

Andres

PS: Tom: Do you like to get directly addressed on bugs/mails like this one 
or not? I am not really sure whats the policy regarding that is on the pg 
mailinglists. Is there one?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Andres Freund
Testschema:

ROLLBACK;
BEGIN;

CREATE TABLE a (
a_id serial PRIMARY KEY NOT NULL,
b_id integer
);
CREATE INDEX a__b_id ON a USING btree (b_id);


CREATE TABLE b (
b_id serial NOT NULL,
c_id integer
);
CREATE INDEX b__c_id ON b USING btree (c_id);


CREATE TABLE c (
c_id serial PRIMARY KEY NOT NULL,
value integer UNIQUE
);

INSERT INTO b (b_id, c_id)
SELECT g.i, g.i FROM generate_series(1, 5) g(i);

INSERT INTO a(b_id)
SELECT g.i FROM generate_series(1, 5) g(i);

COMMIT;
ANALYZE;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Robert Haas
On Sun, May 16, 2010 at 7:07 PM, Andres Freund and...@anarazel.de wrote:
 Reducing the (large and ugly, automatically generated queries) to a
 reproducible testcase I ended up with the following pattern:

 explain SELECT 1
 FROM
    c
 WHERE
    EXISTS (
        SELECT *
        FROM a
            JOIN b USING (b_id)
        WHERE b.c_id = c.c_id)
    AND c.value = 1;

 8.3 planned this to:

  Index Scan using c_value_key on c  (cost=0.00..24.83 rows=1 width=0)
    Index Cond: (value = 1)
    Filter: (subplan)
    SubPlan
      -  Nested Loop  (cost=0.00..16.56 rows=1 width=12)
            -  Index Scan using b__c_id on b  (cost=0.00..8.27 rows=1
 width=8)
                  Index Cond: (c_id = $0)
            -  Index Scan using a__b_id on a  (cost=0.00..8.27 rows=1
 width=8)
                  Index Cond: (a.b_id = b.b_id)

 Which is quite good for such a kind of query.

 From 8.4 onwards this gets planned to
 [something bad]

I believe this is  a result of a limitation we've discussed
previously, namely, that the planner presently uses a limited,
special-case kludge to consider partial index scans, and the executor
uses another kludge to execute them.

http://archives.postgresql.org/pgsql-hackers/2009-09/msg00525.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01755.php

I believe that Tom is planning to fix this for 9.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I believe this is  a result of a limitation we've discussed
 previously, namely, that the planner presently uses a limited,
 special-case kludge to consider partial index scans, and the executor
 uses another kludge to execute them.

Yeah.  To restore this case to something like what previous versions
did, we need to be able to push an inner-indexscan parameter down
through multiple join levels, which neither the planner nor executor
can deal with at the moment.  I am planning to work on this for 9.1.

It may be worth pointing out that while the current code sucks for the
case where a nestloop-with-inner-indexscan would be the best plan, the
previous code sucked for every other case; because the previous code was
only capable of generating the equivalent of a nestloop join.  We have
to continue down this path in order to get to the place we need to be.
It's too bad that all the work didn't get done in one development cycle,
but sometimes life's like that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Andres Freund
On Monday 17 May 2010 04:10:46 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I believe this is  a result of a limitation we've discussed
  previously, namely, that the planner presently uses a limited,
  special-case kludge to consider partial index scans, and the executor
  uses another kludge to execute them.
 It may be worth pointing out that while the current code sucks for the
 case where a nestloop-with-inner-indexscan would be the best plan, the
 previous code sucked for every other case; because the previous code was
 only capable of generating the equivalent of a nestloop join.  We have
 to continue down this path in order to get to the place we need to be.
 It's too bad that all the work didn't get done in one development cycle,
 but sometimes life's like that.
Yes, I realize that. Thats why I didnt report it as an actual bug... And its 
way easier to deal with 8.4s deficiency than with the former behaviour.

Thanks,

Andres

PS: I think it lead me to an actual bug, expect a report tomorrow...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers