In keeping with some of the recent threads regarding the planner...
I have a fair sized data warehouse in which I am trying to perform an
aggregation, but getting OOM errors in Postgres(8.2.4).
I believe the reason for the OOM is that Postgres is attempting to do a hash
aggregation, but it has grossly underestimated the rows resulting from the
aggregation.
The data in the database is very uniformly distributed so I don't believe that
the table stats are the cause of the problem.
This may be related to table inheritance, and can be demonstrated pretty easily.
CREATE TABLE foo(a INT);ANALYZE foo;
CREATE TABLE foo_1() INHERITS(foo);insert into foo_1 select
generate_series(1,100000);insert into foo_1 select
generate_series(1,100000);insert into foo_1 select
generate_series(1,100000);ANALYZE foo_1;
CREATE TABLE foo_2() INHERITS(foo);insert into foo_2 select
generate_series(1,100000);insert into foo_2 select
generate_series(1,100000);insert into foo_2 select
generate_series(1,100000);ANALYZE foo_2;
-- If I query a particular partition, the plan estimate for the hash aggregate
is good
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo_1 group by a;
HashAggregate (cost=5822.00..7061.01 rows=99121 width=4) (actual
time=554.556..657.121 rows=100000 loops=1) -> Seq Scan on foo_1
(cost=0.00..4322.00 rows=300000 width=4) (actual time=0.014..203.290
rows=300000 loops=1) Total runtime: 712.211 ms
-- If I query the base table, the plan estimate for the hash aggregate is off
by several orders of magnitude
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo group by a;
HashAggregate (cost=11686.10..11688.60 rows=200 width=4) (actual
time=1724.188..1826.630 rows=100000 loops=1) -> Append (cost=0.00..8675.40
rows=602140 width=4) (actual time=0.016..1045.134 rows=600000 loops=1)
-> Seq Scan on foo (cost=0.00..31.40 rows=2140 width=4) (actual
time=0.001..0.001 rows=0 loops=1) -> Seq Scan on foo_1 foo
(cost=0.00..4322.00 rows=300000 width=4) (actual time=0.012..205.130
rows=300000 loops=1) -> Seq Scan on foo_2 foo (cost=0.00..4322.00
rows=300000 width=4) (actual time=0.011..203.542 rows=300000 loops=1) Total
runtime: 1879.550 ms(6 rows)
-- Is there something magical about the hash aggregate estimate of 200 rows?
-- I can have 30,000 or 300,000 rows in each child partition table and multiple
partition's with different values of "a" and yet it always come up with 200.
-- eg.
create table foo_3() inherits(foo);insert into foo_3 select
generate_series(100000,300000);analyze foo_3;
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo group by a;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=15568.12..15570.62 rows=200 width=4) (actual
time=2386.813..2691.254 rows=300000 loops=1) -> Append (cost=0.00..11557.41
rows=802141 width=4) (actual time=0.016..1403.121 rows=800001 loops=1)
-> Seq Scan on foo (cost=0.00..31.40 rows=2140 width=4) (actual
time=0.002..0.002 rows=0 loops=1) -> Seq Scan on foo_1 foo
(cost=0.00..4322.00 rows=300000 width=4) (actual time=0.013..201.549
rows=300000 loops=1) -> Seq Scan on foo_2 foo (cost=0.00..4322.00
rows=300000 width=4) (actual time=0.010..211.332 rows=300000 loops=1)
-> Seq Scan on foo_3 foo (cost=0.00..2882.01 rows=200001 width=4) (actual
time=0.011..137.262 rows=200001 loops=1) Total runtime: 2851.990 ms
Is this a bug, or some subtlety of the Postgres query planner?
In my particular case, I am doing a join out to another small table as part of
the aggregation and using constraint exclusion on the partitions, but I believe
the cause of my problem is the same.
I am running on 64bit FreeBSD/Postgres 8.2.4 on a machine with 8GB of memory.
An explain of the query and resulting OOM diagnostics follow:
The aggregation will result in 5,000,000 rows, not 5,000.In the stats_dtl table
there are 12 observations(hourly) for each customerThere are 125 different
policy_id and 25 different policy_group_id'sPolicy's and policy_groups are even
distributed across all customers
userquery Scan table_4760 (cost=2897243.60..2897418.60 rows=5000 width=152)
-> HashAggregate (cost=2897243.60..2897368.60 rows=5000 width=40) ->
Hash Join (cost=7.81..2241002.00 rows=37499520 width=40) Hash
Cond: (public.customer_stats_dtl.policy_id = policy.policy_id) ->
Append (cost=0.00..1641001.87 rows=59999232 width=40) ->
Seq Scan on customer_stats_dtl (cost=0.00..22.45 rows=4 width=40)
Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with
time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time
zone)) -> Seq Scan on customer_stats_dtl_027
customer_stats_dtl (cost=0.00..1640979.42 rows=59999228 width=40)
Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with
time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time
zone)) -> Hash (cost=6.25..6.25 rows=125 width=8)
-> Seq Scan on policy (cost=0.00..6.25 rows=125 width=8)
TopMemoryContext: 268400 total in 32 blocks; 20912 free (43 chunks);
247488 usedunnamed prepared statement: 2097152 total in 8 blocks; 721456 free
(2 chunks); 1375696 usedTopTransactionContext: 8192 total in 1 blocks; 7648
free (0 chunks); 544 usedSPI Plan: 3072 total in 2 blocks; 1320 free (0
chunks); 1752 usedSPI Plan: 7168 total in 3 blocks; 1312 free (0 chunks); 5856
usedSPI Plan: 3072 total in 2 blocks; 1320 free (0 chunks); 1752 usedSPI Plan:
3072 total in 2 blocks; 928 free (0 chunks); 2144 usedSPI Plan: 7168 total in 3
blocks; 3920 free (0 chunks); 3248 usedSPI Plan: 3072 total in 2 blocks; 1224
free (0 chunks); 1848 usedPL/PgSQL function context: 24576 total in 2 blocks;
12008 free (9 chunks); 12568 usedCFuncHash: 8192 total in 1 blocks; 1680 free
(0 chunks); 6512 usedRendezvous variable hash: 8192 total in 1 blocks; 1680
free (0 chunks); 6512 usedPLpgSQL function cache: 24224 total in 2 blocks; 3744
free (0 chunks); 20480 usedType information cache: 24576 total in 2 blocks;
11888 free (5 chunks); 12688 usedS_2: 1024 total in 1 blocks; 488 free (0
chunks); 536 usedS_1: 1024 total in 1 blocks; 488 free (0 chunks); 536
usedPrepared Queries: 24576 total in 2 blocks; 11888 free (5 chunks); 12688
usedRecord information cache: 24576 total in 2 blocks; 15984 free (5 chunks);
8592 usedMessageContext: 8192 total in 1 blocks; 7744 free (0 chunks); 448
usedOperator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
usedsmgr relation table: 57344 total in 3 blocks; 21872 free (8 chunks); 35472
usedTransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32
usedPortal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
usedPortalMemory: 8192 total in 1 blocks; 7888 free (1 chunks); 304
usedPortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256
usedExecutorState: 122880 total in 4 blocks; 68728 free (2 chunks); 54152
usedHashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0
usedHashBatchContext: 32888 total in 3 blocks; 14512 free (0 chunks); 18376
usedExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 usedExprContext: 0
total in 0 blocks; 0 free (0 chunks); 0 usedExprContext: 8192 total in 1
blocks; 8160 free (0 chunks); 32 usedExprContext: 0 total in 0 blocks; 0 free
(0 chunks); 0 usedExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32
usedAggContext: 1033887744 total in 133 blocks; 688 free (0 chunks); 1033887056
used **TupleHashTable: 566485040 total in 78 blocks; 2064896 free
(304 chunks); 564420144 used **ExprContext: 0 total in 0 blocks; 0 free
(0 chunks); 0 usedExprContext: 8192 total in 1 blocks; 8040 free (1 chunks);
152 usedExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 usedRelcache by
OID: 57344 total in 3 blocks; 31024 free (6 chunks); 26320
usedCacheMemoryContext: 4679720 total in 23 blocks; 404104 free (1 chunks);
4275616 used
If you have actually read this far, the following wrinkle is where I am
currently stuck.
I set ENABLE_HASHAGG=OFF and still get an OOM :-(
In this situation Postgres runs for much longer before getting the OOM and I
believe it has actually written some records to the aggregationI base this on
seeing autovacuum kick off against the aggregation table after the query fails,
which may not be valid.
userquery Scan table_4760 (cost=8771233.77..9521474.17 rows=5000 width=152)
-> GroupAggregate (cost=8771233.77..9521349.17 rows=5000 width=40) ->
Sort (cost=8771233.77..8864982.57 rows=37499520 width=40) Sort
Key: public.customer_stats_dtl.user_id, policy.policy_group_id ->
Hash Join (cost=7.81..2241002.00 rows=37499520 width=40)
Hash Cond: (public.customer_stats_dtl.policy_id = policy.policy_id)
-> Append (cost=0.00..1641001.87 rows=59999232 width=40)
-> Seq Scan on customer_stats_dtl (cost=0.00..22.45 rows=4
width=40) Filter: ((period_start >= '2007-09-08
20:00:00-04'::timestamp with time zone) AND (period_start < '2007-09-09
08:00:00-04'::timestamp with time zone)) -> Seq Scan
on customer_stats_dtl_027 customer_stats_dtl (cost=0.00..1640979.42
rows=59999228 width=40) Filter: ((period_start
>= '2007-09-08 20:00:00-04'::timestamp with time zone) AND (period_start <
'2007-09-09 08:00:00-04'::timestamp with time zone)) ->
Hash (cost=6.25..6.25 rows=125 width=8) -> Seq Scan
on policy (cost=0.00..6.25 rows=125 width=8)
TopMemoryContext: 874608 total in 106 blocks; 14240 free (45 chunks); 860368
usedunnamed prepared statement: 2097152 total in 8 blocks; 720544 free (2
chunks); 1376608 usedTopTransactionContext: 8192 total in 1 blocks; 7648 free
(0 chunks); 544 usedAfterTriggerEvents: 758112256 total in 102 blocks; 1792
free (8 chunks); 758110464 usedS_2: 1024 total in 1 blocks; 488 free (0
chunks); 536 usedType information cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 usedSPI Plan: 3072 total in 2 blocks; 1320 free (0 chunks); 1752
usedSPI Plan: 7168 total in 3 blocks; 1312 free (0 chunks); 5856 usedSPI Plan:
3072 total in 2 blocks; 1320 free (0 chunks); 1752 usedSPI Plan: 3072 total in
2 blocks; 928 free (0 chunks); 2144 usedSPI Plan: 7168 total in 3 blocks; 3920
free (0 chunks); 3248 usedSPI Plan: 3072 total in 2 blocks; 1224 free (0
chunks); 1848 usedPL/PgSQL function context: 24576 total in 2 blocks; 12008
free (9 chunks); 12568 usedCFuncHash: 8192 total in 1 blocks; 1680 free (0
chunks); 6512 usedRendezvous variable hash: 8192 total in 1 blocks; 1680 free
(0 chunks); 6512 usedPLpgSQL function cache: 24224 total in 2 blocks; 3744 free
(0 chunks); 20480 usedS_1: 1024 total in 1 blocks; 488 free (0 chunks); 536
usedPrepared Queries: 24576 total in 2 blocks; 11888 free (5 chunks); 12688
usedRecord information cache: 24576 total in 2 blocks; 15984 free (5 chunks);
8592 usedMessageContext: 8192 total in 1 blocks; 7744 free (0 chunks); 448
usedOperator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
usedsmgr relation table: 253952 total in 5 blocks; 40912 free (16 chunks);
213040 usedTransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 usedPortal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
usedPortalMemory: 8192 total in 1 blocks; 7888 free (1 chunks); 304
usedPortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256
usedExecutorState: 122880 total in 4 blocks; 61720 free (15 chunks); 61160
usedExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32
usedHashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0
usedHashBatchContext: 32888 total in 3 blocks; 14512 free (0 chunks); 18376
usedTupleSort: 822437960 total in 105 blocks; 424201336 free (5267294 chunks);
398236624 usedExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0
usedExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 usedExprContext:
8192 total in 1 blocks; 8160 free (0 chunks); 32 usedExprContext: 0 total in 0
blocks; 0 free (0 chunks); 0 usedExprContext: 8192 total in 1 blocks; 8160 free
(0 chunks); 32 usedAggContext: 8192 total in 1 blocks; 8000 free (3 chunks);
192 usedExprContext: 8192 total in 1 blocks; 8000 free (0 chunks); 192
usedExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32
usedExprContext: 8192 total in 1 blocks; 8040 free (0 chunks); 152 usedRelcache
by OID: 253952 total in 5 blocks; 117488 free (8 chunks); 136464
usedCacheMemoryContext: 17262632 total in 25 blocks; 3573056 free (0 chunks);
13689576 used
_________________________________________________________________
Explore the seven wonders of the world
http://search.msn.com/results.aspx?q=7+wonders+world&mkt=en-US&form=QBRE