hello postgresql experts --

i have a strange row estimate problem which, unfortunately, i have only been
able to reproduce on very large tables; that makes boiling it down to a simple
example hard.  i'm using version 9.1.1, and all tables have had analyze run
on them.

here's the example: i have a large table (billions of rows) with a five-column
primary key and a sixth value column.  for confidentiality reasons i can't
reproduce the exact schema here, but it is essentially

create table bigtable (
  id1 integer not null,
  id2 date not null,
  id3 integer not null,
  id4 time not null,
  id5 integer not null,
  value real not null,
  primary key (id1, id2, id3, id4, id5)
);

for various reasons there is only one distinct id1 value in the table right
now, though in the future there will be more; also, the primary key was added
after table creation with alter table, though that shouldn't matter.
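
for what it's worth, the per-column statistics the planner works from can be
inspected through the standard pg_stats view (nothing below is specific to my
schema, it's just the usual catalog query):

select attname, n_distinct, correlation
from pg_stats
where tablename = 'bigtable'
  and attname in ('id1', 'id2', 'id3', 'id4', 'id5');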

i need to select a somewhat arbitrary collection of rows out of bigtable.
to do so i generate a scratch table (temporary in spirit, though created as a
plain table)

create table jointable (
  id1 integer not null,
  id2 date not null,
  id3 integer not null,
  id4 time not null,
  id5 integer not null
);

and then perform a join against this table.
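
(the keys themselves come from an external process; purely for illustration,
assume they arrive as a csv file and get loaded with a plain copy.  the path
here is hypothetical.)

copy jointable from '/tmp/keys.csv' with (format csv);  -- hypothetical path
analyze jointable;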

if jointable doesn't have many rows, the planner picks a nested loop over 
jointable and a primary key lookup on bigtable.  in the following, for 
expository purposes, jointable has 10 rows.  we can see the planner knows this.

explain select * from bigtable join jointable using (id1, id2, id3, id4, id5);

 Nested Loop  (cost=0.00..6321.03 rows=145 width=28)
   ->  Seq Scan on jointable  (cost=0.00..1.10 rows=10 width=24)
   ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..631.97 rows=1 width=28)
         Index Cond: ((id1 = jointable.id1) AND (id2 = jointable.id2) AND (id3 = jointable.id3) AND (id4 = jointable.id4) AND (id5 = jointable.id5))
(4 rows)

as you increase the number of rows in jointable, the planner switches to a sort 
+ merge.  in this case jointable has roughly 2 million rows.

 Merge Join  (cost=727807979.29..765482193.16 rows=18212633 width=28)
   Merge Cond: ((bigtable.id1 = jointable.id1) AND (bigtable.id2 = jointable.id2) AND (bigtable.id3 = jointable.id3) AND (bigtable.id4 = jointable.id4) AND (bigtable.id5 = jointable.id5))
   ->  Sort  (cost=727511795.16..735430711.00 rows=3167566336 width=28)
         Sort Key: bigtable.id3, bigtable.id1, bigtable.id2, bigtable.id4, bigtable.id5
         ->  Seq Scan on bigtable  (cost=0.00..76085300.36 rows=3167566336 width=28)
   ->  Materialize  (cost=295064.70..305399.26 rows=2066911 width=24)
         ->  Sort  (cost=295064.70..300231.98 rows=2066911 width=24)
               Sort Key: jointable.id3, jointable.id1, jointable.id2, jointable.id4, jointable.id5
               ->  Seq Scan on jointable  (cost=0.00..35867.11 rows=2066911 width=24)
(9 rows)

the choice of sort + merge is really bad here, given the size of bigtable
(3 billion rows and counting).
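
as a diagnostic (not a fix), the merge join can be turned off for one session
to confirm the nested loop is the plan i'd get otherwise; this is just the
standard planner knob:

set enable_mergejoin = off;
explain select * from bigtable join jointable using (id1, id2, id3, id4, id5);
reset enable_mergejoin;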

some questions:

1 - first off, why isn't the sort happening in primary key order (id1, id2,
id3, id4, id5), so that bigtable would not have to be sorted?  the plan above
instead sorts both inputs on (id3, id1, id2, id4, id5).

2 - more importantly, since we are joining on bigtable's complete primary key,
shouldn't the row estimate coming out of the join be capped at the number of
rows in jointable?
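
to put numbers on question 2: the merge join above estimates 18,212,633 output
rows, but since (id1, id2, id3, id4, id5) is the complete primary key of
bigtable, each of the 2,066,911 jointable rows can match at most one bigtable
row.  the estimate is therefore almost 9x the hard upper bound on the result
size.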

for example, it is strange to see that if i put in a non-limiting limit clause
(something bigger than the number of rows in jointable), the planner switches
back to a nested loop + index scan:

explain select * from bigtable join jointable using (id1, id2, id3, id4, id5) limit 2500000;

 Limit  (cost=0.00..178452647.11 rows=2500000 width=28)
   ->  Nested Loop  (cost=0.00..1306127545.35 rows=18297957 width=28)
         ->  Seq Scan on jointable  (cost=0.00..35867.11 rows=2066911 width=24)
         ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..631.88 rows=1 width=28)
               Index Cond: ((id1 = jointable.id1) AND (id2 = jointable.id2) AND (id3 = jointable.id3) AND (id4 = jointable.id4) AND (id5 = jointable.id5))
(5 rows)

am i not understanding the query planner, is this a known planner issue, or
have i stumbled onto something amiss?  unfortunately, any synthetic examples
i was able to construct (up to 10 million rows) did not exhibit this behavior,
which makes it hard to test.
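
for concreteness, the synthetic attempts were along these lines (the data
generation below is invented purely to produce unique five-column keys; only
the row counts resemble the real thing):

-- bigtable created with the same ddl as above
insert into bigtable
select 1,                                               -- single distinct id1
       date '2011-01-01' + (g / 86400),                 -- one day per 86400 keys
       g % 1000,
       time '00:00' + (g % 86400) * interval '1 second',
       g % 7,
       random()::real
from generate_series(1, 10000000) g;                    -- 10 million rows
analyze bigtable;

-- join back against roughly 20% of the keys
create table jointable as
  select id1, id2, id3, id4, id5 from bigtable where random() < 0.2;
analyze jointable;

explain select * from bigtable join jointable using (id1, id2, id3, id4, id5);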

best regards, ben


