On 11 October 2010 16:44, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rash...@gmail.com> writes:
>> On 11 October 2010 15:03, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Reflecting on it, I think it'd be best to allow an agg to
>>> provide an estimation function that'd be told the input data type and
>>> expected number of rows --- even on a per-aggregate basis, a constant
>>> estimate just isn't good enough.
>
>> How good will that estimate of the number of rows be though?
>
> It can't possibly be any worse than a hard-wired constant ;-)
>
>> If they're coming from a SRF it could be a huge under-estimate, and you'd
>> still risk eating all the memory, if you allowed a hash aggregate.
>
> If, for a particular aggregate, you're too chicken to ever allow hash
> aggregation, you could just return a very large number from the
> estimation hook function.  I doubt that's a very useful behavior in the
> majority of cases though.
>
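
For concreteness, the sort of per-aggregate estimator being floated above
might look something like the sketch below. This is purely illustrative --
no such hook exists, and the function name, signature and constants are
invented for the sake of discussion:

#include "postgres.h"
#include "utils/lsyscache.h"

/*
 * Hypothetical sketch only: a per-aggregate transition-space estimator of
 * the kind proposed above.  Name, signature and constants are invented
 * here for illustration; nothing like this exists in the tree today.
 */
static Size
median_transspace_estimate(Oid inputtype, double expected_rows_per_group)
{
    /*
     * A tuplesort-based median has to buffer every input value in its
     * group, so charge the average value width per row plus tuplesort's
     * fixed start-up allocation (roughly 30K, per the numbers below).
     */
    Size        per_value = sizeof(Datum) + get_typavgwidth(inputtype, -1);

    return (Size) (30 * 1024 + per_value * expected_rows_per_group);
}

Returning a deliberately huge number from such a hook would be the "too
chicken" option mentioned at the end of the quote.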
Yeah, for median I'd be too chicken :-)

set work_mem = '2MB';

explain analyse select median(i) from generate_series(1,40000) as g(i) group by i;

                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15.00..17.50 rows=200 width=4) (actual time=229.524..287.153 rows=40000 loops=1)
   ->  Function Scan on generate_series g  (cost=0.00..10.00 rows=1000 width=4) (actual time=8.738..16.595 rows=40000 loops=1)
 Total runtime: 811.460 ms
(3 rows)

The planner's estimate of 200 groups at ~8K apiece is below work_mem, so it
chooses a hash aggregate. In reality, each group's tuplesort allocates
around 30K up front, so it very quickly uses over 1GB. A better
per-aggregate size estimate wouldn't improve this situation much, because
the group-count estimate itself is so far off.

Regards,
Dean
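
P.S. Spelling out the arithmetic behind those figures (both per-entry sizes
are rough approximations):

  planner's view:    200 groups * ~8K  per hash entry = ~1.6MB  (under the 2MB work_mem, so HashAggregate wins)
  actual behaviour: 40000 groups * ~30K per tuplesort = ~1.2GB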