[HACKERS] O(n^2) aggregates

2007-12-10 Thread Gregory Stark

I was trying to test my patch to do posix_fadvise to speed up bitmap heap
scans (with disappointing results so far) and ran into a bit of a gotcha. I'm
not sure where this should be documented but it probably should be somewhere.

In order to test bitmap heap scan I had to build an array and use the = ANY(a)
form. (The natural approach of building a table of values to search for
produces a hash join or nested loop -- it may make sense to add a new kind of
join which uses an outer bitmap heap scan.)

So I defined an aggregate to group up the random values in an array in the
usual fashion:

CREATE AGGREGATE arrayize(anyelement) (
SFUNC = array_append,
STYPE = anyarray,
INITCOND = '{}'
);

and ran queries like:

select count(*)
  from huge
 where h = any ((select arrayize( (1+random()*3)::integer )
  from generate_series(1,1000)
)::integer[])

To test the behaviour for larger and larger samples I bumped the 1000 up
further and further and noticed a larger and large pause in disk access. When
I tried 4 the query took over 47 minutes nearly all of which had one cpu
pegged at 100%.

What's going on is that arrayize() is actually a O(n^2) algorithm since each
transition requires creating a copy of the entire array.

The solution to this would analogous to what we did to count(). We would need
to add a field to ArrayMetaState which is stored in fn_extra to remember the
last array returned. Then if array_push notices it has been called from an
aggregate context it can store its result in there. The next time it would
extend that array in place (which is code which doesn't currently exist),
possibly repallocing it and return the same pointer.

It's a bit of a hack but I think this is going to be a pretty common use case
and I don't see any more general solution.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] O(n^2) aggregates

2007-12-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 The solution to this would analogous to what we did to count(). We would need
 to add a field to ArrayMetaState which is stored in fn_extra to remember the
 last array returned. Then if array_push notices it has been called from an
 aggregate context it can store its result in there. The next time it would
 extend that array in place (which is code which doesn't currently exist),

contrib/intagg

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] O(n^2) aggregates

2007-12-10 Thread Pavel Stehule
Hello


 select count(*)
   from huge
  where h = any ((select arrayize( (1+random()*3)::integer )
   from generate_series(1,1000)
 )::integer[])


select array(select (1+random()*3):: integer from
generate_series(1,4));
Time: 111,807 ms

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings