Re: Eager aggregation, take 3

Radim Marek Fri, 29 May 2026 08:55:29 -0700

Hey Richard,

I might be out of my depth here - but while testing RegreSQL as
correctness/performance harness on PostgreSQL it picked up a problem with
the wrong-results case during eager aggregation.


It reproduces on current HEAD
(commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now)
with enable_eager_aggregate enabled.

My testing environment
  - Linux aarch64, gcc 12 (Debian)
  - macOS arm64, Apple clang 21
    (PostgreSQL 19devel on aarch64-apple-darwin25.5.0)

== How to reproduce

  CREATE TEMP TABLE c(id int, country text);
  CREATE TEMP TABLE o(customer_id int);
  INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
  INSERT INTO o VALUES (1),(3);   -- only customers 1 and 3 have a row in o

  SELECT c.country, count(*) AS n
  FROM c
  WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
  GROUP BY c.country
  ORDER BY c.country;

Expected results (everywhere except master)

 country | n
---------+---
 DE      | 2
 US      | 1
(2 rows)

The actual result with enable_eager_aggregate = on (default)

 country | n
---------+---
 DE      | 0
 US      | 0
(2 rows)

With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1),
as it is on PG18.

Query Plan

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197
rows=2.00 loops=1)
   Sort Key: c.country
   Sort Method: quicksort  Memory: 25kB
   Buffers: local hit=2
   ->  Finalize HashAggregate  (cost=98.55..100.55 rows=200 width=40)
(actual time=0.183..0.186 rows=2.00 loops=1)
         Group Key: c.country
         Batches: 1  Memory Usage: 32kB
         Buffers: local hit=2
         ->  Hash Anti Join  (cost=52.75..95.37 rows=635 width=40) (actual
time=0.177..0.179 rows=3.00 loops=1)
               Hash Cond: (c.id = o.customer_id)
               Buffers: local hit=2
               ->  Seq Scan on c  (cost=0.00..22.70 rows=1270 width=36)
(actual time=0.024..0.025 rows=5.00 loops=1)
                     Buffers: local hit=1
               ->  Hash  (cost=50.25..50.25 rows=200 width=12) (actual
time=0.145..0.146 rows=2.00 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     Buffers: local hit=1
                     ->  Partial HashAggregate  (cost=48.25..50.25 rows=200
width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
                           Group Key: o.customer_id
                           Batches: 1  Memory Usage: 32kB
                           Buffers: local hit=1
                           ->  Seq Scan on o  (cost=0.00..35.50 rows=2550
width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
                                 Buffers: local hit=1
 Planning Time: 0.294 ms
 Execution Time: 0.255 ms
(24 rows)

If this is already known or in progress, apologies for the noise.

---

Radim

On Fri, 29 May 2026 at 17:25, Richard Guo <[email protected]> wrote:

> Hi All,
>
> Eager aggregation is a query optimization technique that partially
> pushes a group-by past a join, and finalizes it once all the relations
> are joined.  Eager aggregation reduces the number of input rows to the
> join and thus may result in a better overall plan.  This technique is
> thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
> paper [1].
>
> Back in 2017, a patch set has been proposed by Antonin Houska to
> implement eager aggregation in thread [2].  However, it was at last
> withdrawn after entering the pattern of "please rebase thx" followed by
> rebasing and getting no feedback until "please rebase again thx".  A
> second attempt in 2022 unfortunately fell into the same pattern about
> one year ago and was eventually closed again [3].
>
> That patch set has included most of the necessary concepts to implement
> eager aggregation.  However, as far as I can see, it has several weak
> points that we need to address.  It introduces invasive changes to some
> core planner functions, such as make_join_rel().  And with such changes
> join_is_legal() would be performed three times for the same proposed
> join, which is not great.  Another weak point is that the complexity of
> join searching dramatically increases with the growing number of
> relations to be joined.  This occurs because when we generate partially
> aggregated paths, each path of the input relation is considered as an
> input path for the grouped paths.  As a result, the number of grouped
> paths we generate increases exponentially, leading to a significant
> explosion in computational complexity.  Other weak points include the
> lack of support for outer joins and partitionwise joins.  And during my
> review of the code, I came across several bugs (planning error or crash)
> that need to be addressed.
>
> I'd like to give it another take to implement eager aggregation, while
> borrowing lots of concepts and many chunks of codes from the previous
> patch set.  Please see attached.  I have chosen to use the term 'Eager
> Aggregation' from the paper [1] instead of 'Aggregation push-down', to
> differentiate the aggregation push-down technique in FDW.
>
> The patch has been split into small pieces to make the review easier.
>
> 0001 introduces the RelInfoList structure, which encapsulates both a
> list and a hash table, so that we can leverage the hash table for faster
> lookups not only for join relations but also for upper relations.  With
> eager aggregation, it is possible that we generate so many upper rels of
> type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
> lookups.
>
> 0002 introduces the RelAggInfo structure to store information needed to
> create grouped paths for base and join rels.  It also revises the
> RelInfoList related structures and functions so that they can be used
> with RelAggInfos.
>
> 0003 checks if eager aggregation is applicable, and if so, collects
> suitable aggregate expressions and grouping expressions in the query,
> and records them in root->agg_clause_list and root->group_expr_list
> respectively.
>
> 0004 implements the functions that check if eager aggregation is
> applicable for a given relation, and if so, create RelAggInfo structure
> for the relation, using the infos about aggregate expressions and
> grouping expressions we collected earlier.  In this patch, when we check
> if a target expression can act as grouping expression, we need to check
> if this expression can be known equal to other expressions due to ECs
> that can act as grouping expressions.  This patch leverages function
> exprs_known_equal() to achieve that, after enhancing this function to
> consider opfamily if provided.
>
> 0005 implements the functions that generate paths for grouped relations
> by adding sorted and hashed partial aggregation paths on top of paths of
> the plain base or join relations.  For sorted partial aggregation paths,
> we only consider any suitably-sorted input paths as well as sorting the
> cheapest-total path.  For hashed partial aggregation paths, we only
> consider the cheapest-total path as input.  By not considering other
> paths we can reduce the number of grouping paths as much as possible
> while still achieving reasonable results.
>
> 0006 builds grouped relations for each base relation if possible, and
> generates aggregation paths for the grouped base relations.
>
> 0007 builds grouped relations for each just-processed join relation if
> possible, and generates aggregation paths for the grouped join
> relations.  The changes made to make_join_rel() are relatively minor,
> with the addition of a new function make_grouped_join_rel(), which finds
> or creates a grouped relation for the just-processed joinrel, and
> generates grouped paths by joining a grouped input relation with a
> non-grouped input relation.
>
> The other way to generate grouped paths is by adding sorted and hashed
> partial aggregation paths on top of paths of the joinrel.  This occurs
> in standard_join_search(), after we've run set_cheapest() for the
> joinrel.  The reason for performing this step after set_cheapest() is
> that we need to know the joinrel's cheapest paths (see 0005).
>
> This patch also makes the grouped relation for the topmost join rel act
> as the upper rel representing the result of partial aggregation, so that
> we can add the final aggregation on top of that.  Additionally, this
> patch extends the functionality of eager aggregation to work with
> partitionwise join and geqo.
>
> This patch also makes eager aggregation work with outer joins.  With
> outer join, the aggregate cannot be pushed down if any column referenced
> by grouping expressions or aggregate functions is nullable by an outer
> join above the relation to which we want to apply the partiall
> aggregation.  Thanks to Tom's outer-join-aware-Var infrastructure, we
> can easily identify such situations and subsequently refrain from
> pushing down the aggregates.
>
> Starting from this patch, you should be able to see plans with eager
> aggregation.
>
> 0008 adds test cases for eager aggregation.
>
> 0009 adds a section in README that describes this feature (copied from
> previous patch set, with minor tweaks).
>
> Thoughts and comments are welcome.
>
> [1] https://www.vldb.org/conf/1995/P345.PDF
> [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
> [3]
> https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jpnprd01.prod.outlook.com
>
> Thanks
> Richard
>

Re: Eager aggregation, take 3

Reply via email to