On 06/06/2018 04:11 PM, Andres Freund wrote:
> On 2018-06-06 16:06:18 +0200, Tomas Vondra wrote:
>> On 06/06/2018 04:01 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-06-06 15:58:16 +0200, Tomas Vondra wrote:
>>>> The other issue is that serialize/deserialize is only a part of a problem -
>>>> you also need to know how to do "combine", and not all aggregates can do
>>>> that ... (certainly not in universal way).
>>>
>>> There are several schemes where only serialize/deserialize are needed,
>>> no? There are a number of fairly sensible schemes where there won't be
>>> multiple transition values for the same group, no?
>>>
>>
>> Possibly, not sure what schemes you have in mind exactly ...
>>
>> But if you know there's only a single transition value, why would you need
>> serialize/deserialize at all. Why couldn't you just finalize the value and
>> serialize that?
>
> Because you don't necessarily have all the necessary input rows
> yet.
>
> Consider e.g. a scheme where we'd switch from hashed aggregation to
> sorted aggregation due to memory limits, but already have a number of
> transition values in the hash table. Whenever the size of the transition
> values in the hashtable exceeds memory size, we write one of them to the
> tuplesort (with serialized transition value). From then on further input
> rows for that group would only be written to the tuplesort, as the group
> isn't present in the hashtable anymore.
>
Ah, so you're suggesting that during the second pass we'd deserialize
the transition value and then keep adding the remaining tuples to it,
instead of building a new transition value from scratch. Got it.

That said, I'm not sure such a generic serialize/deserialize can
actually work - I'd guess not, otherwise we'd probably have used it
when implementing parallel aggregates.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
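PS: To make sure I understand the scheme, here's a toy C sketch of it.
It's entirely hypothetical - the aggregate (an AVG-like sum+count
state), the spill_item layout and all the names are made up for
illustration; the real thing would of course go through the tuplesort
and the aggregate's serial/deserial functions. The point it tries to
show is that each group ends up with at most one serialized transition
state in the spill area, followed only by raw input rows, so the second
pass just deserializes that one state and keeps advancing it - no
combine function needed.

#include <stdio.h>
#include <string.h>

/* transition state for an AVG-like aggregate */
typedef struct agg_state { double sum; long count; } agg_state;

/* a spilled item: either a serialized state or a raw input row */
typedef struct spill_item {
    int    group;
    int    is_state;                    /* 1 = serialized transition state */
    char   payload[sizeof(agg_state)];  /* serialized state bytes */
    double raw;                         /* raw input value if is_state == 0 */
} spill_item;

static void advance(agg_state *s, double v) { s->sum += v; s->count++; }

int main(void)
{
    spill_item spill[8];
    int nspill = 0;

    /* first pass: group 7's state is evicted under memory pressure and
     * written to the spill area in serialized form ... */
    agg_state evicted = { 10.0, 3 };    /* 3 input rows already folded in */
    spill[nspill].group = 7;
    spill[nspill].is_state = 1;
    memcpy(spill[nspill].payload, &evicted, sizeof evicted);
    nspill++;

    /* ... and later input rows for that group go to the spill area as
     * raw rows, since the group is no longer in the hash table */
    double later[] = { 4.0, 6.0 };
    for (int i = 0; i < 2; i++) {
        spill[nspill].group = 7;
        spill[nspill].is_state = 0;
        spill[nspill].raw = later[i];
        nspill++;
    }

    /* second pass: items arrive sorted by group (assume the single state
     * entry sorts first); deserialize it and keep advancing it with the
     * raw rows - no combining of two partial states ever happens */
    agg_state st;
    memcpy(&st, spill[0].payload, sizeof st);   /* deserialize */
    for (int i = 1; i < nspill; i++)
        advance(&st, spill[i].raw);

    printf("group 7 avg = %g (%ld rows)\n", st.sum / st.count, st.count);
    return 0;
}

Whether the serialized state format can be made generic across
aggregates is the part I'm doubtful about, per above.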