On Apr10, 2014, at 02:13 , Florian Pflug <f...@phlo.org> wrote:
> On Apr9, 2014, at 23:17 , Florian Pflug <f...@phlo.org> wrote:
>> On Apr9, 2014, at 21:35 , Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> A quick test says that avg(int4)
>>> is about five percent slower than sum(int4), so that's the kind of hit
>>> we'd be taking on non-windowed aggregations if we do it like this.
>> That's rather surprising though. AFAICS, there's isn't much difference
>> between the two transfer functions int4_sum and int4_avg_accum at all.
>> The differences come down to (+ denoting things which ought to make
>> int4_avg_accum slower compared to int4_sum, - denoting the opposite)
>> 1. +) int4_avg_accum calls AggCheckCallContext
>> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
>> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
>>      to verify that the state is a 2-element array without NULL entries
>> 4. -) int4_sum checks if the input is NULL
> I've done a bit of profiling on this (using Instruments.app on OSX). One
> thing I missed is that inv4_avg_accum also calls pg_detoast_datum - that
> calls comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
> since we know that the datum cannot possibly be toasted I think (or if it
> was, nodeAgg.c should do the de-toasting).
> The profile also attributes a rather large percent of the total runtime of
> int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
> rid of that would help quite a few transition functions, invertible or not.
> That certainly seems doable - we'd need a way to mark functions as internal
> support functions, and prevent user-initiated calls of such functions. 
> Transition functions marked that way could then safely scribble over their
> state arguments without having to consult AggCheckCallContext() first.
> ...
> However, I still believe the best approach at this point is to just work
> on making int4_avg_accum faster. I still see no principal reason what it
> has to be noticeably slower - the only additional work it absolutely *has*
> to perform is *one* 64-bit increment.

I played with this a bit.

Currently, int4_avg_accum invokes AggCheckCallContext every time, and also
repeatedly checks whether the array has the required dimension - which,
incidentally, is the only big difference between int4_avg_accum and int4_sum.

To avoid that, I added a flags field to fcinfo which nodeAgg uses to tell
transition functions whether they're called for the first time, or are being
called with whatever state they themselves returned last time, i.e the
n-th time.

If the n-th time flag is set, int4_avg_accum simply retrieves the state with
PG_GETARG_DATUM() instead of PG_GETARG_ARRAYTYPE_P(), relying on the fact
that it never returns toasted datums itself, and also doesn't bother to validate
the array size, for the same reason.

If the flag is not set, it uses PG_GETARG_ARRAYTYPE_COPY_P and does validate
the array size. In theory, it could further distinguish between a 1st call
in an aggregation context (where the copy is unnecessary), and a user-initiated
call (i.e. outside an aggregation). But that seems unnecessary - one additional
copy per aggregation group seems unlikely to be a problem.

With this in place, instruction-level profiling using Apple's Instruments.app
shows that int4_avg_accum takes about 1.5% of the total runtime of a simple
aggregation, while int4_sum takes about 1.2%. 

A (very rough) patch is attached.

I haven't been able to repeat Tom's measurement which shows a 5% difference in
performance between the invertible and the non-invertible versions of SUM(int4),
so I cannot say if this removes that. But the profiling I've done would 
indicate it should.

best regards,
Florian Pflug

Attachment: invtrans_sumint4_opt2.patch
Description: Binary data

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to