On Apr9, 2014, at 21:35 , Tom Lane <t...@sss.pgh.pa.us> wrote: > As a quick check, I compared aggregation performance in HEAD, non-assert > builds, with and without --disable-float8-byval on a 64-bit machine. > So this tests replacing a pass-by-val transition datatype with a > pass-by-ref one without any other changes. There's essentially no > difference in performance of sum(int4), AFAICT, but that's because > int4_sum goes out of its way to cheat and avoid palloc overhead.
> I looked to the bit_and() aggregates to see what would happen to > an aggregate not thus optimized. As expected, int4 and int8 bit_and > are just about the same speed if int8 is pass by value ... but if it's > pass by ref, the int8 case is a good 60% slower. True, but that just means that aggregate transition functions really ought to update the state in-place, no? > So added palloc overhead, at least, is a no-go. I see that the patched > version of sum(int4) avoids that trap, but nonetheless it's replaced a > pretty cheap transition function with a less cheap function, namely the > function previously used for avg(int4). A quick test says that avg(int4) > is about five percent slower than sum(int4), so that's the kind of hit > we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there's isn't much difference between the two transfer functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite) 1. +) int4_avg_accum calls AggCheckCallContext 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum) 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries 4. -) int4_sum checks if the input is NULL The number of conditional branches should be about the same (and all are seldomly taken). The validity checks on the state array, i.e. (3), should be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but not so much more expensive either. That leaves the AggCheckCallContext call. If that call costs us 5%, maybe we can find a way to make it faster, or get rid of it entirely? Still seems a lot of a call of a not-very-complex function, though... I'll go and check the disassembly - maybe something in int4_avg_accum turns out to be more complex than is immediately obvious. I'll also try to create a call profile, unless you already have one from your test runs. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers