On 07/05/2011 07:26 PM, Clem Dickey wrote:

Updates after belatedly reading the "slow queries" guidelines:

Version: PostgreSQL 8.4.8 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.4.5 20101112 (Red Hat 4.4.5-2), 64-bit

The query has always been slow; the table for this test case is never updated. I don't run VACUUM but do run ANALYZE.

Originally all database config parameters were the default. Since yesterday I have changed
 shared_buffers = 224MB
 effective_cache_size = 1024MB
but seen no change in behavior.

Column | Type | Modifiers
--------+---------+-----------
y | integer | not null
x | integer | not null
k | integer | not null
j | integer | not null
z | integer | not null
Indexes:
"t_pkey" PRIMARY KEY, btree (j, k, x, y, z)

The table population, in pseudocode, is this:
for x in 0..9
for y in 0..9999
for z in 0..29
INSERT INTO t VALUES(y,x,0,0,z)

The query is:

SELECT *
FROM (
SELECT * FROM t GROUP BY j,k,x,z,y
) AS f
NATURAL JOIN t;

The EXPLAIN ANALYZE output is http://explain.depesz.com/s/KGk

Notes on the analysis:
1. I see that the planner estimates that GROUP BY will reduce 300K rows to 30K, a bit odd because every row which the planner could examine is in a unique group. 2. The JOIN is expected to produce one row. I'm not sure how the planner came up with that estimate.

By contrast, placing an intermediate result in a table "u" provides a
result in about 3 seconds:

=> EXPLAIN ANALYZE CREATE TABLE u AS SELECT * FROM t GROUP BY j,k,x,z,y;EXPLAIN ANALYZE SELECT * FROM u NATURAL JOIN t;DROP TABLE u; QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Group (cost=44508.90..49008.90 rows=30000 width=20) (actual time=1305.381..2028.385 rows=300000 loops=1) -> Sort (cost=44508.90..45258.90 rows=300000 width=20) (actual time=1305.374..1673.843 rows=300000 loops=1)
         Sort Key: j, k, x, z, y
         Sort Method:  external merge  Disk: 8792kB
-> Seq Scan on t (cost=0.00..4911.00 rows=300000 width=20) (actual time=0.008..62.935 rows=300000 loops=1)
 Total runtime: 2873.590 ms
(6 rows)

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=46229.86..72644.38 rows=1 width=20) (actual time=1420.527..2383.507 rows=300000 loops=1) Merge Cond: ((t.j = u.j) AND (t.k = u.k) AND (t.x = u.x) AND (t.y = u.y) AND (t.z = u.z)) -> Index Scan using t_pkey on t (cost=0.00..14877.18 rows=300000 width=20) (actual time=0.013..118.244 rows=300000 loops=1) -> Materialize (cost=46229.86..50123.52 rows=311493 width=20) (actual time=1420.498..1789.864 rows=300000 loops=1) -> Sort (cost=46229.86..47008.59 rows=311493 width=20) (actual time=1420.493..1692.988 rows=300000 loops=1)
               Sort Key: u.j, u.k, u.x, u.y, u.z
               Sort Method:  external merge  Disk: 8784kB
-> Seq Scan on u (cost=0.00..5025.93 rows=311493 width=20) (actual time=0.018..78.850 rows=300000 loops=1)
 Total runtime: 2424.870 ms
(9 rows)

(Adding an "ANALYZE" on the temporary table improves the JOIN estimated fow count from 1 to about 299500, but does not change the plan.)

Clem Dickey

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to