When sum(int4) or sum(int2) is executed, many cycles are spent in AllocSetReset, because the per-tuple context is used to allocate the first value of each group.
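To make the allocation pattern concrete, here is a small toy model in C. It is only a sketch under stated assumptions, not PostgreSQL code: ToyContext, toy_alloc, toy_reset and toy_sum are hypothetical names invented for illustration, and the model assumes that a reset only has real work to do when something was allocated in the context since the previous reset. It does not try to reproduce the profile numbers; it only shows where the first value of each group lives in the two variants.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context. NOT PostgreSQL's AllocSet. */
typedef struct ToyContext {
    long long slots[64];    /* backing storage for int8-sized values */
    size_t    used;         /* slots handed out since the last reset */
    long      resets;       /* total calls to toy_reset */
    long      dirty_resets; /* resets that had something to discard */
} ToyContext;

static long long *toy_alloc(ToyContext *cx)
{
    assert(cx->used < 64);
    return &cx->slots[cx->used++];
}

static void toy_reset(ToyContext *cx)
{
    cx->resets++;
    if (cx->used > 0) {     /* assumption: only a used context costs work */
        cx->dirty_resets++;
        cx->used = 0;
    }
}

/* One input row; rows are assumed to arrive sorted by key. */
typedef struct Row { int key; int val; } Row;

/*
 * Sum val per key. With first_in_agg == 0 the first value of each group is
 * allocated in the per-tuple context and then copied into the aggregate
 * context; with first_in_agg == 1 it is allocated in the aggregate context
 * directly. The per-tuple context is reset once per row in both variants.
 * Writes one sum per group to out and returns the number of groups.
 */
static int toy_sum(const Row *rows, int n, int first_in_agg,
                   ToyContext *per_tuple, ToyContext *agg, long long *out)
{
    int        ngroups = 0;
    long long *state = NULL;

    for (int i = 0; i < n; i++) {
        if (i == 0 || rows[i].key != rows[i - 1].key) {
            if (state != NULL)
                out[ngroups++] = *state;    /* emit the finished group */
            if (first_in_agg) {
                state = toy_alloc(agg);
                *state = rows[i].val;
            } else {
                long long *tmp = toy_alloc(per_tuple);
                *tmp = rows[i].val;
                state = toy_alloc(agg);
                *state = *tmp;              /* copy out of per-tuple memory */
            }
        } else {
            *state += rows[i].val;          /* later rows update in place */
        }
        toy_reset(per_tuple);
    }
    if (state != NULL)
        out[ngroups++] = *state;
    return ngroups;
}
```

Calling toy_sum on the same rows with first_in_agg set to 0 and then to 1 (each time with zero-initialized contexts) gives the same sums and the same number of resets, but dirty_resets on the per-tuple context stays at zero in the second case.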
The attached patch uses AggState->aggcontext instead of the per-tuple context to allocate this value. As a result, the per-tuple context is no longer used for it, and the cycles spent in AllocSetReset are reduced.

test data:
  pgbench -i -s 5

SQL:
  select a.bid, sum(a.abalance) from accounts a group by a.bid;

execution time (compile option "-O2"):
  original: 1.530s
  patched:  1.441s

profile result of original code (compile option "-g -pg"):
----------------------------------------------------------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
15.64      0.35     0.35  1500000    0.00     0.00  slot_deform_tuple
11.67      0.62     0.27  1000002    0.00     0.00  AllocSetReset
 6.61      0.77     0.15  1999995    0.00     0.00  slot_getattr
 5.29      0.89     0.12   500002    0.00     0.00  heapgettup
 3.52      0.97     0.08   524420    0.00     0.00  hash_search

profile result of patched code (compile option "-g -pg"):
----------------------------------------------------------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
17.39      0.32     0.32  1500000    0.00     0.00  slot_deform_tuple
 6.52      0.44     0.12   500002    0.00     0.00  heapgettup
 6.25      0.56     0.12  1999995    0.00     0.00  slot_getattr
 4.35      0.64     0.08   524420    0.00     0.00  hash_search
 4.35      0.71     0.08   499995    0.00     0.00  execTuplesMatch
(skip ...)
 0.54      1.67     0.01  1000002    0.00     0.00  AllocSetReset

regards,

--- Atsushi Ogawa
sum.patch
Description: Binary data