>>>>> "Mark" == Mark Dilger <hornschnor...@gmail.com> writes:

 Mark> Is there a performance test case where this patch should shine
 Mark> brightest?  I'd like to load a schema with lots of data, and run
 Mark> a grouping sets query, both before and after applying the patch,
 Mark> to see what the performance advantage is.

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales (
  id serial,
  product_id integer,
  store_id integer,
  customer_id integer,
  qty integer);

-- random integer function
create function d(integer) returns integer language sql
 as $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
  select d(20), d(6), d(10), d(100) from generate_series(1,10000000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id, (qty / 10));

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

Andrew (irc:RhodiumToad)

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to