Re: [PATCHES] sum(int4)/sum(int2) improvement

2005-09-15 Thread Bruce Momjian

This has been saved for the 8.2 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Atsushi Ogawa wrote:
 
 When sum(int4) or sum(int2) is executed, many cycles are spent by
 AllocSetReset. Because per-tuple context is used to allocate the
 first data of each group.
 
 An attached patch uses AggState-aggcontext instead of per-tuple
 context to allocate the data. As a result, per-tuple context is not
 used, and the cycles of AllocSetReset is reduced.
 
 test data:
 pgbench -i -s 5
 
 SQL:
 select a.bid, sum(a.abalance)
 from accounts a
 group by a.bid;
 
 execution time(compile option -O2):
  original: 1.530s
  patched:  1.441s
 
 profile result of original code(compile option -g -pg):
 
   %   cumulative   self  self total
  time   seconds   secondscalls   s/call   s/call  name
  15.64  0.35 0.35  150 0.00 0.00  slot_deform_tuple
  11.67  0.62 0.27  102 0.00 0.00  AllocSetReset
   6.61  0.77 0.15  195 0.00 0.00  slot_getattr
   5.29  0.89 0.12   52 0.00 0.00  heapgettup
   3.52  0.97 0.08   524420 0.00 0.00  hash_search
 
 profile result of patched code(compile option -g -pg):
 
   %   cumulative   self  self total
  time   seconds   secondscalls   s/call   s/call  name
  17.39  0.32 0.32  150 0.00 0.00  slot_deform_tuple
   6.52  0.44 0.12   52 0.00 0.00  heapgettup
   6.25  0.56 0.12  195 0.00 0.00  slot_getattr
   4.35  0.64 0.08   524420 0.00 0.00  hash_search
   4.35  0.71 0.08   45 0.00 0.00  execTuplesMatch
(skip ...)
   0.54  1.67 0.01  102 0.00 0.00  AllocSetReset
 
 regards,
 
 --- Atsushi Ogawa

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] sum(int4)/sum(int2) improvement

2005-09-15 Thread Bruce Momjian

Patch already applied in 8.1 in different way.

---

Atsushi Ogawa wrote:
 
 When sum(int4) or sum(int2) is executed, many cycles are spent by
 AllocSetReset. Because per-tuple context is used to allocate the
 first data of each group.
 
 An attached patch uses AggState-aggcontext instead of per-tuple
 context to allocate the data. As a result, per-tuple context is not
 used, and the cycles of AllocSetReset is reduced.
 
 test data:
 pgbench -i -s 5
 
 SQL:
 select a.bid, sum(a.abalance)
 from accounts a
 group by a.bid;
 
 execution time(compile option -O2):
  original: 1.530s
  patched:  1.441s
 
 profile result of original code(compile option -g -pg):
 
   %   cumulative   self  self total
  time   seconds   secondscalls   s/call   s/call  name
  15.64  0.35 0.35  150 0.00 0.00  slot_deform_tuple
  11.67  0.62 0.27  102 0.00 0.00  AllocSetReset
   6.61  0.77 0.15  195 0.00 0.00  slot_getattr
   5.29  0.89 0.12   52 0.00 0.00  heapgettup
   3.52  0.97 0.08   524420 0.00 0.00  hash_search
 
 profile result of patched code(compile option -g -pg):
 
   %   cumulative   self  self total
  time   seconds   secondscalls   s/call   s/call  name
  17.39  0.32 0.32  150 0.00 0.00  slot_deform_tuple
   6.52  0.44 0.12   52 0.00 0.00  heapgettup
   6.25  0.56 0.12  195 0.00 0.00  slot_getattr
   4.35  0.64 0.08   524420 0.00 0.00  hash_search
   4.35  0.71 0.08   45 0.00 0.00  execTuplesMatch
(skip ...)
   0.54  1.67 0.01  102 0.00 0.00  AllocSetReset
 
 regards,
 
 --- Atsushi Ogawa

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] sum(int4)/sum(int2) improvement

2005-09-01 Thread Tom Lane
Atsushi Ogawa [EMAIL PROTECTED] writes:
 An attached patch uses AggState-aggcontext instead of per-tuple
 context to allocate the data. As a result, per-tuple context is not
 used, and the cycles of AllocSetReset is reduced.

Why is this better than the fix already in place?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] sum(int4)/sum(int2) improvement

2005-09-01 Thread Atsushi Ogawa
Tom Lane wrote:
 Atsushi Ogawa [EMAIL PROTECTED] writes:
  An attached patch uses AggState-aggcontext instead of per-tuple
  context to allocate the data. As a result, per-tuple context is not
  used, and the cycles of AllocSetReset is reduced.
 
 Why is this better than the fix already in place?

Because per-tuple context is reset many times. If per-tuple context is 
never used, the following codes of AllocSetReset become effective.

/* Nothing to do if context has never contained any data */
if (block == NULL)
return;

--- Atsushi Ogawa


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] sum(int4)/sum(int2) improvement

2005-09-01 Thread Tom Lane
Atsushi Ogawa [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Why is this better than the fix already in place?

 Because per-tuple context is reset many times. If per-tuple context is 
 never used, the following codes of AllocSetReset become effective.

 /* Nothing to do if context has never contained any data */
 if (block == NULL)
 return;

Hm.  It's still pretty ugly though, since fixing only those two places
isn't very exciting.

I went back and looked at your AllocSetReset patch from last May, and
liked it better on second consideration.  The point I had missed before
is that even though we're adding an instruction to palloc, it's only
needed in the slow paths.  The cases that we have to be worried about,
IMHO, are:

1. Loop containing palloc(s) and pfree(s).

2. Loop containing palloc(s) and MemoryContextReset.

3. Loop containing only MemoryContextReset.

(We need not worry about cases doing only palloc since that is obviously
not sustainable over any long period.)  Case 1 is the one that was
bothering me, but in practice, once the loop is rolling the pallocs
should mostly be reusing previously-pfreed chunks.  That is the fast
path through AllocSetAlloc, and it doesn't need any extra overhead
because it doesn't need to clear the alloc-set-is-empty flag --- the
set is clearly already nonempty if it has free chunks.  There's enough
instructions in the other paths that one more isn't going to hurt.

In case 2 we're adding instructions to both palloc and
MemoryContextReset, for no gain :-( ... but again, these are in code
paths that aren't the fastest possible.  I don't think it'll hurt
noticeably.

I notice also that MemoryContextIsEmpty could return a better answer
if we maintained a flag like this.  This is not real important in the
current scheme of things, but might be more interesting someday.

I'll fix up the old patch and apply it.  I think that's a better answer
than kluging up the aggregate functions themselves.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq