Re: [SQL] Wrong query plan when using a left outer join

Feike Steenbergen Tue, 17 Jan 2012 10:54:17 -0800

> BTW, add a foreign key and index on handhistory_plain.hand_id (unless> you 
> have it already).
It's there already:


feiketracker=# \d+ handhistory_plain;
            Table "stage.handhistory_plain"
 Column  |  Type   | Modifiers | Storage  | Description
---------+---------+-----------+----------+-------------
 hand_id | integer | not null  | plain    |
 history | text    | not null  | extended |
Indexes:
    "handhistory_plain_pkey" PRIMARY KEY, btree (hand_id) CLUSTER
Foreign-key constraints:
    "handhistory_plain_hand_id_fkey" FOREIGN KEY (hand_id) REFERENCES
hand_meta(hand_id)

> BTW2, if you really don't care on handhistory you can just use
> original query with no join.

Well, sometimes I do, sometimes I don't. For easier application access
I wanted to create a view that joins both these tables together:
easier application design and better performance, as the analyzer
should know best when not to use the handhistory_plain table.


The design is as follows:

hand_meta - holds all metadata for a pokerhand
handhistory_plain holds the history for a pokerhand

hand_meta is going to be used the most, it is around 165 bytes per tuple
handhistory_plain is not going to be used often (it is there as a
reference); it is around 5000 bytes per tuple.

They both hold the same column as primary key, handhistory_plain holds
a fraction of the tuples of hand_meta, the split was only made to make
sure the processed data (hand_meta) is smaller in size and should
therefore require less I/O and thus increase performance.

I'm not sure what to make of:
> imagine that the view on the right side of join has some side effects.
I can see some side effects may occur, but as it is a left join, the
left hand side will always be part of the returning set (there is no
where clause), so the index should be used.
Even though I don't understand, you seem to be right, a natural join
is 30 times faster:

feiketracker=# explain analyze select max(hand_id) from hand_meta left
join handhistory_plain using(hand_id);
                                                                 QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10000049261.00..10000049261.01 rows=1 width=4)
(actual time=31179.238..31179.241 rows=1 loops=1)
   ->  Seq Scan on hand_meta  (cost=10000000000.00..10000043062.40
rows=2479440 width=4) (actual time=0.131..16039.886 rows=2479440
loops=1)
 Total runtime: 31179.725 ms
(3 rows)

Time: 31185.088 ms

feiketracker=# explain analyze select max(hand_id) from hand_meta join
handhistory_plain using(hand_id);

     QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=53043.61..53043.62 rows=1 width=4) (actual
time=962.242..962.245 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..53029.93 rows=5470 width=4) (actual
time=0.400..920.582 rows=5470 loops=1)
         ->  Index Scan using handhistory_plain_pkey on
handhistory_plain  (cost=0.00..14494.27 rows=5470 width=4) (actual
time=0.215..101.177 rows=5470 loops=1)
         ->  Index Scan using hand_meta_pkey on hand_meta
(cost=0.00..7.03 rows=1 width=4) (actual time=0.100..0.115 rows=1
loops=5470)
               Index Cond: (hand_meta.hand_id = handhistory_plain.hand_id)
 Total runtime: 962.968 ms


> try to experiment with SET enable_seqscan TO false; - and see what happens.
Didn't make a difference; therefore I think postgres determines it is
unable to use the index, is that correct?


Thank you for now: I'll use the inner join (or natural join in this
case) for this specific view

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] Wrong query plan when using a left outer join

Reply via email to