Re: [PATCH] DISTINCT in plain aggregate window functions

Haibo Yan Sun, 28 Jun 2026 10:59:02 -0700

Hi Henson,

Thank you for the careful review. This is very helpful.

I updated the first three patches to expand the regression coverage, and
also updated patch 3 to make the current memory limitation of the
grow-only hash path explicit in both the commit message and code
comments.

On Thu, Jun 25, 2026 at 6:03 PM Henson Choi <[email protected]> wrote:
>
> Hi Haibo,
>
> This is an initial, high-level pass over v2 (patches 1-3), not a detailed
> code review -- I'll follow up with more as time permits. For this round I
> wanted to confirm the shape of the feature and the cases it rejects, flag
> where its behavior diverges from non-window DISTINCT, note test coverage,
> and check one interaction. A short list of suggestions is at the end.
>
> Overall the approach looks reasonable, and for the shapes I tried the
> per-row values matched equivalent non-window formulations (correlated /
> GROUP BY rewrites, and Oracle for the whole-partition forms). The points
> below are mostly about matching our own non-window DISTINCT semantics, not
> about the patch being incorrect.
>
> 1. Added surface
> ----------------
> The series makes agg(DISTINCT x) OVER (...) accepted for plain aggregate
> window functions (count/sum/avg/...), which PostgreSQL currently rejects.
> DISTINCT is carried through parse and deparse so it survives view
> definitions, and execution is enabled for a defined subset of frames.
>
> 2. Behavior by frame shape
> --------------------------
> Execution uses one of two strategies depending on the frame:
>
>   - Whole-partition frames (the frame is effectively the entire
>     partition): the distinct result is the same for every row, so it is
>     computed once and reused.
>
>   - Running / non-shrinking (grow-only) frames (start at UNBOUNDED
>     PRECEDING, no EXCLUDE): the distinct set only grows as rows enter, so it
>     is maintained incrementally per row.
>
> 3. Cases that are (intentionally) rejected
> ------------------------------------------
>   - DISTINCT on an aggregate that takes more than one argument (roadmap 7)
>   - frames that don't start at UNBOUNDED PRECEDING, or that use EXCLUDE
>     (roadmap 4-6)
>   - in the running case, an argument type that is not hashable
>
> 4. Behavior-level observations
> ------------------------------
> Compared with GROUP BY agg(DISTINCT x), three things differ between the two
> frame strategies:
>
>   - Type support: a type that is sortable but not hashable is accepted in
>     the whole-partition case yet rejected in the running case, whereas
>     GROUP BY DISTINCT accepts it (e.g. count(DISTINCT x::money)).
>

That makes sense as a longer-term direction.
For this series, though, I would prefer to keep the current split. My
goal in patches 2 and 3 is to add the basic executor support in small,
reviewable steps: first the whole-partition sort case, then the simpler
grow-only incremental case, before moving on to true sliding frames.
For the current patch 3 implementation, the grow-only path assumes
hashable input types and errors out otherwise. That is an executor-level
decision for this initial step. A planner-driven choice between sort and
hash paths remains an independent direction that can be explored later.

>   - Value order: for order-sensitive aggregates (e.g. array_agg(DISTINCT
>     ...)), the order in which distinct values are aggregated differs
>     between the two frame strategies, so the same expression can produce
>     differently-ordered results.
>

I agree this is worth calling out more explicitly.
For order-insensitive aggregates such as count/sum/min/max, this is not
observable. For order-sensitive aggregates such as array_agg or
string_agg, the result can differ between strategies if no aggregate-
local ORDER BY is present. I do not consider this a correctness issue,
because the SQL standard does not prescribe an ordering for that form.
However, it is a visible plan-dependent behavior difference and should
be documented as such.
I also agree that regression coverage should not rely on unspecified
ordering. In the updated tests I made the order-sensitive grow-only case
deterministic by driving the frame with a unique ORDER BY key, so the
expected result does not depend on flaky input ordering.

>   - Memory: the running (hash) path is the only one of these that holds
>     its distinct set in an unbounded in-memory structure; everything else,
>     including the patch's own whole-partition path, is work_mem-bounded and
>     spills to disk:
>
>         whole-partition (this patch)    sort   work_mem-bounded, spills
>         running (this patch)            hash   unbounded, no spill
>         GROUP BY agg(DISTINCT x)        sort   work_mem-bounded, spills
>         SELECT DISTINCT, HashAgg        hash   work_mem-bounded, spills
>         SELECT DISTINCT, Sort+Unique    sort   work_mem-bounded, spills
>
>     So a large, high-cardinality partition in the running case can grow the
>     hash table without bound -- whereas every other path here, including the
>     hash-based HashAgg, is work_mem-bounded and spills.
>

I agree this is a real limitation of the current patch 3 approach, not
the intended end state.
I updated patch 3 to make this more explicit both in the commit message
and in the code comments. In particular, I added a comment in
initialize_windowaggregate() just above the hash-table setup. The
current grow-only hash path is not bounded by work_mem and retains every
distinct value seen in the partition. I agree that bounded-memory
behavior for this path would need further design, rather than being
treated as an acceptable final trade-off.

> 5. Test coverage (measured)
> ---------------------------
> I ran line coverage on the changed executor code. The patch's added and
> changed lines are about 77% covered (219/284); the file as a whole is
> around 91%. The uncovered ~23% of the new lines is not spread out -- it
> concentrates in a few behaviors the functional tests never exercise,
> because those tests center on count/sum/avg over integer arguments:
>
>   - strict aggregates (max/min DISTINCT), i.e. the strict/NULL-seed path
>   - pass-by-reference argument and transition types (text/numeric)
>   - (plus a couple of effectively unreachable guards)
>
> A few more gaps don't show up as uncovered lines: RANGE frames with peer
> rows (tied ORDER BY keys), GROUPS mode, and order-sensitive aggregates
> (per the value-order point above).
>
> 6. Interaction with row pattern recognition (RPR)
> -------------------------------------------------
> I applied this series on top of the RPR patch set and built the two
> together. They combine cleanly -- no textual (rebase) conflict, and the
> regression tests pass without any change to code or tests.
>
> That they coexist is structural, not luck: RPR requires the frame to start
> at CURRENT ROW, while this feature requires UNBOUNDED PRECEDING, so a single
> window can satisfy at most one of them -- a combined query is always
> rejected by one side or the other. As long as these two frame requirements
> stay mutually exclusive, I would expect the two to merge without trouble.
>
> One asymmetry worth noting: the RPR side's CURRENT-ROW requirement follows
> from the standard's definition of row pattern recognition, whereas this
> feature's UNBOUNDED-PRECEDING requirement is one the roadmap plans to relax
> (the sliding-frame patches). So the open question is whether that
> relaxation will ever reach frames that start at CURRENT ROW -- the exact
> shape RPR requires, and the point at which the two features would meet. Is
> CURRENT-ROW-start support part of the plan?

Thanks for checking this interaction. I am glad to hear the current
series composes cleanly with the RPR patches.
I have not yet studied whether the roadmap would eventually reach
CURRENT ROW-start frames. If it does, the RPR interaction will need
careful re-evaluation.

>
> 7. Suggestions
> --------------
>   - Type support: the planner should decide sort vs hash from the argument
>     type, and the executor should just carry that out -- a non-hashable type
>     would then route to sort instead of being rejected at startup.

On the type-support and memory points above, I agree those are the main
longer-term issues for this approach, and I have tried to clarify the
present  behavior and limitations in the updated patches.

>   - Memory: the running hash seen-set grows without bound. It should spill
>     at work_mem the way HashAgg already does for grouping; hash can stay the
>     default for the common hashable case.
>   - Tests: fill the coverage gaps in section 5 -- strict aggregates,
>     pass-by-reference types, RANGE peers, GROUPS mode. I can provide cases
>     with cross-checked expected values.

Agreed, and I have already expanded the first three patches in that
direction in the updated version.
Patch 1 itself is unchanged here, since its existing parse/deparse and
executor-side rejection coverage was already adequate. In patches 2 and
3, I added pass-by-reference text cases, a strict-aggregate case for
the whole-partition path, and a more representative grow-only case
using array_agg(DISTINCT …) over a deterministic running frame.
The remaining frame-shape coverage, such as RANGE peers and GROUPS mode,
will come naturally with the later patches that introduce those frame
types, rather than being bolted onto these first three patches.

>   - Documentation: there are no SGML changes for this new form. The
>     window-function-call synopsis doesn't show the DISTINCT form, and no
>     prose mentions it.

Agreed. I have not added the SGML changes yet because I wanted to get
review on the execution model first, and this initial sub-series is
still focused on that part. But this definitely needs to be covered
before the full series is complete.

>   - Roadmap ordering (just a thought): a grow-only frame is the add-only
>     special case of the sliding machinery in patches 4-6, and the hash path
>     is where the section-4 divergences live -- so patch 3 reads more like an
>     optimization than a foundation.

I understand the point. From an implementation perspective, though, I
still think it is useful to keep the grow-only case separate.
Although it can be viewed as an add-only special case of the later
sliding machinery, it also lets the incremental DISTINCT state be
reviewed on its own before introducing row exit, refcounted removal,
and the other moving-frame complications in the same patch. So I see it
as both a reviewable intermediate step and a natural strategy for
grow-only frames.

>
> Regards,
> Henson

Thanks again for the review. Please see the updated patches.

Regards,
Haibo

v3-0001-Parse-and-deparse-DISTINCT-in-window-aggregate-ca.patch
Description: Binary data

v3-0002-Support-DISTINCT-in-whole-partition-window-aggreg.patch
Description: Binary data

v3-0003-Extend-DISTINCT-window-aggregates-to-grow-only-fr.patch
Description: Binary data

Re: [PATCH] DISTINCT in plain aggregate window functions

Reply via email to