On 9 April 2014 22:55, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Florian Pflug <f...@phlo.org> writes:
>> I was (and still am) not in favour of duplicating the whole quadruple of
>> (state, initialvalue, transferfunction, finalfunction) because it seems
>> excessive. In fact, I believed that doing this would probably be grounds for
>> outright rejection of the patch, on the base of catalog bloat. And your
>> initial response to this suggestion seemed to confirm this.
> Well, I think it's much more likely that causing a performance penalty for
> cases unrelated to window aggregates would lead to outright rejection :-(.
> The majority of our users probably don't ever use window functions, but
> for sure they've heard of SUM(). We can't penalize the non-window case.
> Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this
> suggestion) is a little annoying but it doesn't sound like a show stopper.
> It seems reasonable to assume that the extra initval would be NULL in most
> cases, so it's probably a net addition of 12 bytes per row.
>> On Apr9, 2014, at 20:20 , Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> The patch has in fact already done that to a couple of basic aggregates like
>>> sum(int4). Has anyone bothered to test what side-effects that has on
>>> non-windowed aggregation performance?
>> I'm pretty sure David Rowley did some benchmarking. The results should be
>> in this thread somewhere I think, but they currently evade me... Maybe David
>> can re-post, if he's following this...
> I saw benchmarks addressing window aggregation, but none looking for
> side-effects on plain aggregation.
>> If we really go down that road (and I'm far from convinced), then maybe
>> instead of having a bunch of additional fields, we could have separate
>> entries in pg_aggregate for the two cases, with links between them.
> That seems like a complete mess; in particular it would break the primary
> key for pg_aggregate (aggfnoid), and probably break every existing query
> that looks at pg_aggregate. Some extra fields would not break such
> expectations (in fact we've added fields to pg_aggregate in the past).
This may initially sound unrelated, but I think it might address some
of these issues. Suppose we added a 'firsttrans' function, that took a
single argument (the first value to be aggregated) and was responsible
for creating the initial state from that first value.
This would apply to aggregates that ignore null values, but whose
transition function cannot currently be declared strict (either
because the state type is internal, or because it is not the same as
the aggregate's argument type).
I think quite a lot of the existing aggregates fall into this
category, and this would allow their transition functions to be made
strict and simplified --- no more testing if the state is null, and
then building it, and no more testing if the argument is null and
ignoring it. That might give a noticeable performance boost in the
regular aggregate case, especially over data containing nulls.
But in addition, it would help with writing inverse transition
functions because if the transition functions could be made strict,
they wouldn't need to do null-counting, which would mean that their
state types would not need to be expanded.
So for example sum(int4) could continue to have int8 as its state
type, it could use int8(int4) as its firsttrans function, and
int4_sum() could become strict and lose all its null-handling logic.
Then int4_sum_inv() would be the trivial to write - just doing the
same in reverse.
I'm not sure it helps for all aggregates, but there are certainly some
where it would seem to simplify things.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: