Hi Hackers

I’d like to start a patch series to add support for DISTINCT in plain
aggregate window functions.

PostgreSQL currently rejects cases such as:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)

sum(DISTINCT x)   OVER ()

---------------------------------------------------------------------------------------------------------

My plan is to implement this incrementally, by frame class and by feature
dimension, rather than trying to solve every case in a single patch.

For the first step, I’m posting patches 1-2 only and would appreciate your
review on those.

Patch 1 is intentionally very small:


   - add parse/deparse plumbing for DISTINCT in plain aggregate window
   functions
   - carry the information through WindowFunc
   - preserve it in ruleutils / deparse
   - but still reject execution

Patch 1 by itself does not add user-visible execution support, so I think
it is best reviewed together with patch 2.

Patch 2 adds the first real executor support:


   - plain aggregate window functions only
   - single-argument DISTINCT only
   - whole-partition frames only

That means support for cases where the frame is effectively the entire
partition, for example:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)
sum(DISTINCT x)   OVER ()
avg(DISTINCT x)   OVER (
    PARTITION BY p
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

---------------------------------------------------------------------------------------------------------

The executor approach in patch 2 is deliberately conservative:


   - collect the partition’s aggregate inputs
   - sort and deduplicate them
   - feed the distinct values into the aggregate transition function
   - finalize once
   - reuse the cached result for all rows in the partition

This avoids the much harder moving-frame cases for now.

My proposed overall roadmap is below:

Patch 1


   - parse/deparse plumbing only
   - allow DISTINCT to be represented on plain aggregate window functions
   - preserve it through deparse / view definition
   - still reject execution

Patch 2


   - executor support for whole-partition frames
   - plain aggregate window functions only
   - single-argument DISTINCT only
   - sort-and-dedup implementation

Patch 3


   - executor support for non-shrinking frames
   - frames starting at UNBOUNDED PRECEDING with no EXCLUDE
   - incremental hash-based seen-set
   - covers default ORDER BY frame and supported ... CURRENT ROW / ...
   FOLLOWING cases

Patch 4


   - executor support for sliding ROWS frames
   - refcounted DISTINCT state
   - add/remove distinct contributions as rows enter and leave the frame
   - fallback to restart/recompute for aggregates without inverse
   transition support

Patch 5


   - extend the sliding DISTINCT machinery to sliding RANGE and GROUPS
   - keep the same refcounted model
   - no EXCLUDE yet

Patch 6


   - support EXCLUDE clauses
   - likely correctness-first, with restart/recompute where incremental
   maintenance is too awkward

Patch 7


   - support multi-argument DISTINCT
   - upgrade DISTINCT keys from single datum to tuple/composite key
   representation

Patch 8


   - support aggregate ORDER BY inside window aggregates
   - left until last because it is orthogonal to frame-shape support and
   substantially complicates both parse representation and executor behavior

In short, the roadmap is:


   1. plumbing
   2. whole-partition
   3. non-shrinking
   4. sliding ROWS
   5. sliding RANGE / GROUPS
   6. EXCLUDE
   7. multi-arg DISTINCT
   8. aggregate ORDER BY

For this posting, I’d especially appreciate feedback on:


   - whether patch 1 + patch 2 is a reasonable first split
   - whether whole-partition-only executor support is a good first
   executable step
   - whether the proposed long-term breakdown seems sensible

Thanks in advance for any review or comments.

Best regards,

Haibo Yan

Attachment: v1-0001-Support-parsing-and-deparsing-DISTINCT-in-window-.patch
Description: Binary data

Attachment: v1-0002-Implement-executor-support-for-DISTINCT-in-whole-.patch
Description: Binary data

Reply via email to