Hi,

On 6/7/17 5:52 AM, Regina Obe wrote:
On 6/6/17 13:52, Regina Obe wrote:
It seems CREATE  AGGREGATE was expanded in 9.6 to support
parallelization of aggregate functions using transitions, with the
addition of serialfunc and deserialfunc to the aggregate definitions.

https://www.postgresql.org/docs/10/static/sql-createaggregate.html

I was looking at the PostgreSQL 10 source code for some example usages
of this and was hoping that array_agg and string_agg would support the feature.

I'm not sure how you would parallelize these, since in most uses
you want to have a deterministic output order.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Good point.  If that's the reason it wasn't done, that's good just wasn't sure.

But if you didn't have an ORDER BY in your aggregate usage, and you
did have those transition functions, it shouldn't be any different from
any other use case right?
I imagine you are right that most folks who use array_agg and
string_agg usually combine it with array_agg(... ORDER BY ..)


I think that TL had in mind is something like

    SELECT array_agg(x) FROM (
       SELECT x FROM bar ORDER BY y
    ) foo;

i.e. a subquery producing the data in predictable order.

>
My main reason for asking is that most of the PostGIS geometry and
raster aggregate functions use transitions and were patterned after
array agg.


In the case of PostGIS the sorting is done internally and really
only to expedite take advantage of things like cascaded union
algorithms.
That is always done though (so even if each worker does it on just it's
batch that's still better than having only one worker).
So I think it's still very beneficial to break into separate jobs
since in the end the gather, will have say 2 biggish geometries or 2
biggish rasters to union if you have 2 workers which is still better
than having a million smallish geometries/rasters to union
I'm not sure I got your point correctly, but if you can (for example) sort the per-worker results as part of the "serialize" function, and benefit from that while combining that in the gather, then sure, that should be a huge win.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to