When sum(int4) or sum(int2) is executed, many cycles are spent in
AllocSetReset, because the per-tuple context is used to allocate the
first data of each group.

The attached patch uses AggState->aggcontext instead of the per-tuple
context to allocate the data. As a result, the per-tuple context is not
used, and the cycles spent in AllocSetReset are reduced.
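
To illustrate why an unused per-tuple context is cheaper to reset, here is
a minimal toy model in C. It is not the patch and not PostgreSQL's AllocSet:
Arena, arena_alloc, arena_reset and run_sum are hypothetical names, and the
model assumes that the "original" style builds a scratch value in the
per-tuple arena on every tuple, which is a simplification made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a memory context (hypothetical, not AllocSet). */
typedef struct Arena {
    char   *buf;
    size_t  cap;
    size_t  used;
    long    full_resets;   /* resets that had memory to clean up */
    long    cheap_resets;  /* resets that found the arena untouched */
} Arena;

static void arena_init(Arena *a, size_t cap)
{
    a->buf = malloc(cap);
    if (a->buf == NULL)
        abort();
    a->cap = cap;
    a->used = 0;
    a->full_resets = 0;
    a->cheap_resets = 0;
}

static void *arena_alloc(Arena *a, size_t n)
{
    assert(a->used + n <= a->cap);
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

static void arena_reset(Arena *a)
{
    if (a->used == 0) {          /* nothing allocated since last reset */
        a->cheap_resets++;
        return;
    }
    memset(a->buf, 0, a->used);  /* stands in for the real cleanup work */
    a->used = 0;
    a->full_resets++;
}

typedef struct Result {
    int64_t sum;
    long    full_resets;
    long    cheap_resets;
} Result;

/* Sum 1..ntuples, resetting the per-tuple arena after every tuple. */
static Result run_sum(long ntuples, int use_long_lived)
{
    Arena    per_tuple, long_lived;
    Result   r;
    int64_t *state = NULL;

    arena_init(&per_tuple, 64);
    arena_init(&long_lived, 64);

    for (long i = 1; i <= ntuples; i++) {
        if (use_long_lived) {
            /* patched style: state lives in the long-lived arena */
            if (state == NULL) {
                state = arena_alloc(&long_lived, sizeof *state);
                *state = 0;
            }
            *state += i;
        } else {
            /* original style (assumed): scratch value in per-tuple arena,
             * then copied into the long-lived state */
            int64_t *tmp = arena_alloc(&per_tuple, sizeof *tmp);
            *tmp = (state ? *state : 0) + i;
            if (state == NULL)
                state = arena_alloc(&long_lived, sizeof *state);
            *state = *tmp;
        }
        arena_reset(&per_tuple);
    }

    r.sum = state ? *state : 0;
    r.full_resets = per_tuple.full_resets;
    r.cheap_resets = per_tuple.cheap_resets;
    free(per_tuple.buf);
    free(long_lived.buf);
    return r;
}
```

Calling run_sum(n, 0) and run_sum(n, 1) gives the same sum and the same
number of reset calls, but in the second case every reset finds the arena
untouched and returns early. This matches the shape of the profiles below,
where the call count of AllocSetReset is unchanged and only its self time
drops.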

test data:
pgbench -i -s 5

SQL:
select a.bid, sum(a.abalance)
from accounts a
group by a.bid;

execution time (compile option "-O2"):
 original: 1.530s
 patched:  1.441s (about 6% less)

profile result of original code(compile option "-g -pg"):
----------------------------------------------------------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 15.64      0.35     0.35  1500000     0.00     0.00  slot_deform_tuple
 11.67      0.62     0.27  1000002     0.00     0.00  AllocSetReset
  6.61      0.77     0.15  1999995     0.00     0.00  slot_getattr
  5.29      0.89     0.12   500002     0.00     0.00  heapgettup
  3.52      0.97     0.08   524420     0.00     0.00  hash_search

profile result of patched code(compile option "-g -pg"):
----------------------------------------------------------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 17.39      0.32     0.32  1500000     0.00     0.00  slot_deform_tuple
  6.52      0.44     0.12   500002     0.00     0.00  heapgettup
  6.25      0.56     0.12  1999995     0.00     0.00  slot_getattr
  4.35      0.64     0.08   524420     0.00     0.00  hash_search
  4.35      0.71     0.08   499995     0.00     0.00  execTuplesMatch
   (skip ...)
  0.54      1.67     0.01  1000002     0.00     0.00  AllocSetReset

regards,

--- Atsushi Ogawa

Attachment: sum.patch