David Rowley <[email protected]> wrote:
> On Wed, 4 Jan 2023 at 23:21, Spring Zhong <[email protected]> wrote:
> > The plan is apparently inefficient, since the hash aggregate goes after the
> > Cartesian product. We could expect the query's performance get much
> > improved if the HashAggregate node can be pushed down to the SCAN node.
>
> > Is someone has suggestions on this?
>
> I think this is being worked on. See [1].
Well, the current version of that patch requires the query to contain at least
one aggregate. It shouldn't be a big deal to modify it. However note that this
feature pushes the aggregate/grouping only to one side of the join ("fake"
aggregate count(*) added to the query):
SET enable_agg_pushdown TO on;
EXPLAIN select i1,i2, count(*) from t1, t2 group by i1,i2;
QUERY PLAN
--------------------------------------------------------------------------------
Finalize GroupAggregate (cost=440.02..440.04 rows=1 width=16)
Group Key: t1.i1, t2.i2
-> Sort (cost=440.02..440.02 rows=1 width=16)
Sort Key: t1.i1, t2.i2
-> Nested Loop (cost=195.00..440.01 rows=1 width=16)
-> Partial HashAggregate (cost=195.00..195.01 rows=1 width=12)
Group Key: t1.i1
-> Seq Scan on t1 (cost=0.00..145.00 rows=10000 width=4)
-> Seq Scan on t2 (cost=0.00..145.00 rows=10000 width=4)
If both sides should be grouped, finalization of the partial aggregates would
be more difficult, and I'm not sure it'd be worth the effort.
> [1] https://commitfest.postgresql.org/41/3764/
--
Antonin Houska
Web: https://www.cybertec-postgresql.com