Re: [HACKERS] Combining Aggregates

Tomas Vondra Sun, 20 Mar 2016 14:56:27 -0700

Hi,

On 03/20/2016 04:48 AM, David Rowley wrote:

On 17 March 2016 at 14:25, Tomas Vondra <[email protected]> wrote:

On 03/16/2016 12:03 PM, David Rowley wrote:

Patch 2:
This adds the serial/deserial aggregate infrastructure, pg_dump
support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
serialise and deserialise aggregate states when instructed to do so.


Patch 3:
This adds a boat load of serial/deserial functions, and combine
functions for most of the built-in numerical aggregate functions. It
also contains some regression tests which should really be in patch 2,
but I with patch 2 there's no suitable serialisation or
de-serialisation functions to test CREATE AGGREGATE with. I think
having them here is ok, as patch 2 is quite useless without patch 3
anyway.


I don't see how you could move the tests into #2 when the functions are
defined in #3? IMHO this is the right place for the regression tests.


Yeah, but the CREATE AGGREGATE changes which are being tested in 0003
were actually added in 0002. I think 0002 is the right place to test
the changes to CREATE AGGREGATE, but since there's a complete lack of
functions to use, then I've just delayed until 0003.

Another thing to note about this patch is that I've gone and created
serial/de-serial functions for when PolyNumAggState both require
sumX2, and don't require sumX2. I had thought about perhaps putting an
extra byte in the serial format to indicate if a sumX2 is included,
but I ended up not doing it this way. I don't really want these serial
formats getting too complex as we might like to do fun things like
pass them along to sharded servers one day, so it might be nice to
keep them simple.



Hmmm, I've noticed that while eyeballing the diffs, and I'm not
sure if it's worth the additional complexity at this point. I mean,
one byte is hardly going to make a measurable difference - we're
probably wasting more than that due to alignment, for example.


I don't think any alignment gets done here. Look at pq_sendint(). Did
you mean the complexity of having extra functions, or having the
extra byte to check in the deserial func?

I haven't realized alignment does not apply here, but I still think theextra byte would be negligible in the bigger scheme of things. Forexample we're serializing the state into a bytea, which adds up to 4Bheader. So I don't think adding 1B would make any measurable difference.

But that assumes adding the 1B header would actually make the codesimpler. Now that I think about it, that may not the case - in the endyou'd probably get a function with two branches, with about the samesize as the two functions combined. So not really an improvement.

As you've mentioned sharded servers - how stable should the
serialized format be? I've assumed it to be transient, i.e. in the
extreme case it might change after restarting a database - for the
parallel aggregate that's clearly sufficient.

But if we expect this to eventually work in a sharded environment,
that's going to be way more complicated. I guess in these cases we
could rely on implicit format versioning, via the major the
version (doubt we'd tweak the format in a minor version anyway).

I'm not sure the sharding is particularly useful argument at this
point, considering we don't really know if the current format is
actually a good match for that.


Perhaps you're right. At this stage I've no idea if we'd want to
support shards on varying major versions. I think probably not,
since the node write functions might not be compatible with the node
read functions on the other server.


OK, so let's ignore the sharded setup for now.

I've attached another series of patches:

0001: This is the latest Parallel Aggregate Patch, not intended for
review here, but is required for the remaining patches. This patch has
changed quite a bit from the previous one that I posted here, and the
remaining patches needed re-based due to those changes.

0002: Adds serial/de-serial function support to CREATE AGGREGATE,
contains minor fix-ups from last version.

0003: Adds various combine/serial/de-serial functions for the standard
set of aggregates, apart from most float8 aggregates.

0004: Adds regression tests for 0003 pg_aggregate.h changes.

0005: Documents to mention which aggregate functions support partial mode.

0006: Re-based version of Haribabu's floating point aggregate support,
containing some fixes by me.

I went through the changes and found nothing suspicious, except maybefor the wording in one of the doc patches:


    All types apart from floating-point types

It may not be entirely clear for the readers, but this does not include"numeric" data type, only float4 and float8. But I don't think thisreally matters unless we fail to commit the last patch adding functionseven for those data types.

Also, I think it's probably the right time to fix the failures inopr_sanity regression tests. After all, we'll only whitelist a bunch ofserialize functions, so the tests will still detect any unexpectedfunctions dealing with 'internal' data type.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

Reply via email to