Hi, While working on partial aggregation a few years ago, I didn't really think it was worthwhile allowing partial aggregation of string_agg and array_agg. I soon realised that I was wrong about that and allowing parallelisation of these aggregates still could be very useful when many rows are filtered out during the scan.
Some benchmarks that I've done locally show that parallel string_agg and array_agg do actually perform better, despite the fact that the aggregate state grows linearly with each aggregated item. Obviously, the performance will get even better when workers are filtering out rows before aggregation takes place, so it seems worthwhile doing this. However, the main reason that I'm motivated to do this is that there are more uses for partial aggregation other than just parallel aggregation, and it seems a shame to disable all these features if a single aggregate does not support partial mode. I've attached a patch which implements all this. I've had most of it stashed away for a while now, but I managed to get some time this weekend to get it into a more completed state. Things are now looking pretty good for the number of aggregates that support partial mode. Just a handful of aggregates now don't support partial aggregation; postgres=# select aggfnoid from pg_aggregate where aggcombinefn=0 and aggkind='n'; aggfnoid ------------------ xmlagg json_agg json_object_agg jsonb_agg jsonb_object_agg (5 rows) ... and a good number do support it; postgres=# select count(*) from pg_aggregate where aggcombinefn<>0 and aggkind='n'; count ------- 122 (1 row) There's probably no reason why the last 5 of those couldn't be done either, it might just require shifting a bit more work into the final functions, although, I'm not planning on that for this patch. As for the patch; there's a bit of a quirk in the implementation of string_agg. We previously always threw away the delimiter that belongs to the first aggregated value, but we do now need to keep that around so we can put it in between two states in the combine function. I decided the path of least resistance to do this was just to borrow StringInfo's cursor variable to use as a pointer to the state of the first value and put the first delimiter before that. Both the string_agg(text) and string_agg(bytea) already have a final function, so we just need to skip over the bytes up until the cursor position to get rid of the first delimiter. I could go and invent some new state type to do the same, but I don't really see the trouble with what I've done with StringInfo, but I'll certainly listen if someone else thinks this is wrong. Another thing that I might review later about this is seeing about getting rid of some of the code duplication between array_agg_array_combine and accumArrayResultArr. I'm going to add this to PG11's final commitfest rather than the January 'fest as it seems more like a final commitfest type of patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
combinefn_for_string_and_array_aggs_v1.patch
Description: Binary data