Hi jian,
Thanks for the refactor. I went through it change by change -- the
per-file notes are at the bottom.
A note up front, so the notes don't read as dismissive: at this stage I'd
rather prioritize validated code over better code. The branch has a lot of
coverage and Valgrind behind it, and every change -- even a clean,
equivalent one -- moves a line out from under that. So where I say "drop",
it means "not now", not "not good"; most of these would be welcome as a
follow-up. One piece, though, does need a fix before any of it lands.
That piece is the new guard in computeAbsorbability(): it carries a
behavior change that no test covers.
> Move isAbsorbable initialization from finalizeRPRPattern() into
> makeRPRPattern(), and guard the computeAbsorbabilityRecursive() call in
> computeAbsorbability() so it is only entered when the first element is
> an ALT or has an unbounded quantifier.
First, what "absorbable" means here: the first unbounded repetition of a
fixed-length body reachable from the start of the pattern. This rule is
worth writing down -- in README.rpr and in the comment on
computeAbsorbability() (or wherever fits best) -- so it is stated once,
explicitly (patch attached: nocfbot-absorb-definition-doc).
By that definition the leading A+ in (A+ B){2,3} is absorbable, but the
patch drops it.
The offending check is in computeAbsorbability():
if (RPRElemIsAlt(&pattern->elements[0]) ||
pattern->elements[0].max == RPR_QUANTITY_INF)
computeAbsorbabilityRecursive(pattern, 0, &hasAbsorbable);
It looks only at elements[0], so a bounded outer group's BEGIN (finite max)
fails the check and the recursion that would descend to A+ never runs.
I'm attaching a small test for this (nocfbot-absorb-bounded-outer-test): it
passes on the branch and fails with v50-0001 applied, pinning both the a+"
marker and the absorbed-context count.
For the rest, change by change -- "ok" = take as-is, "drop" = leave out:
src/backend/optimizer/plan/rpr.c
The branch is already validated under coverage and Valgrind, so I'd rather
not re-open verified logic for equivalent refactors -- glad to take those
as a follow-up. In file order:
ok palloc_array / palloc0_array in makeRPRPattern()
result->varNames = palloc_array(char *, numVars);
result->elements = palloc0_array(RPRPatternElement, numElements);
ok move the isAbsorbable = false init into makeRPRPattern()
drop rename beginElem -> elem in fillRPRPatternGroup()
only beginElem was renamed; with endElem kept, the begin/end pair is
no longer symmetric (elem / endElem)
drop rename isFixedLengthChildren -> isFixedQuantifier -- keep the old
name
"fixed-length" is the term everywhere else (README.rpr, the nearby
comments, even this function's own body), and it drops the
"children"
that flags a whole-subtree check
drop the jump-1 traversal rewrite in that function -- equivalent
refactor,
defer to a follow-up
drop rename to start_with_unbounded_quantifier -- keep isUnboundedStart
Lone snake_case among camelCase neighbors -- including
isFixedQuantifier,
renamed in this same patch -- and it reads as a predicate though it
sets flags.
drop start_with_*'s new signature + if/else restructure -- equivalent
refactor; keep the original isUnboundedStart body
drop rename branchFirst -> branchElement -- cosmetic churn, no behavior
change
drop the new computeAbsorbability() guard -- the regression above
if (RPRElemIsAlt(&pattern->elements[0]) ||
pattern->elements[0].max == RPR_QUANTITY_INF)
src/include/nodes/plannodes.h
ok drop the redundant /* T_RPRPattern */ on the type field
NodeTag type;
drop reworded "see isUnboundedStart" block comment -- keep the original
(it still matches now that isUnboundedStart stays)
ok drop the inline isAbsorbable comment -- the block comment above
covers it
bool isAbsorbable;
src/backend/executor/README.rpr
ok spell out the skip mode -- a clear doc fix on its own
(1) SKIP PAST LAST ROW (not SKIP TO NEXT ROW)
drop isUnboundedStart -> start_with_unbounded_quantifier -- follows the
rename
Tatsuo, what's your read on all this -- in particular, hold the equivalent
refactors until after commit, or take them now?
Best regards,
Henson
diff --git a/src/test/regress/expected/rpr_explain.out
b/src/test/regress/expected/rpr_explain.out
index d8343c9accc..11c51d6f764 100644
--- a/src/test/regress/expected/rpr_explain.out
+++ b/src/test/regress/expected/rpr_explain.out
@@ -906,6 +906,50 @@ WINDOW w AS (
-> Function Scan on generate_series s (actual rows=50.00 loops=1)
(10 rows)
+-- Context absorption with a bounded outer group whose body starts with A+
+-- The outer count {2,3} is bounded, but A+ is still the first unbounded
portion
+-- of the pattern, so it remains the absorption comparison point (a+ keeps the
"
+-- marker and contexts are absorbed). A bounded outer group must not, by
itself,
+-- disable absorption of the leading unbounded child.
+CREATE VIEW rpr_ev_ctx_absorb_bounded_outer AS
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ PATTERN ((A+ B){2,3})
+ DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);
+SELECT line FROM
unnest(string_to_array(pg_get_viewdef('rpr_ev_ctx_absorb_bounded_outer'),
E'\n')) AS line WHERE line ~ 'PATTERN';
+ line
+--------------------------
+ PATTERN ((a+ b){2,3})
+(1 row)
+
+SELECT rpr_explain_filter('
+EXPLAIN (ANALYZE, BUFFERS OFF, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ PATTERN ((A+ B){2,3})
+ DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);');
+ rpr_explain_filter
+----------------------------------------------------------------------
+ WindowAgg (actual rows=50.00 loops=1)
+ Window: w AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
+ Pattern: (a+" b){2,3}
+ Nav Mark Lookback: 0
+ Storage: Memory Maximum Storage: NkB
+ NFA States: 6 peak, 118 total, 0 merged
+ NFA Contexts: 3 peak, 51 total, 4 pruned
+ NFA: 3 matched (len 15/15/15.0), 1 mismatched (len 5/5/5.0)
+ NFA: 30 absorbed (len 1/1/1.0), 12 skipped (len 1/5/3.0)
+ -> Function Scan on generate_series s (actual rows=50.00 loops=1)
+(10 rows)
+
-- Contexts skipped by SKIP PAST LAST ROW
CREATE VIEW rpr_ev_ctx_skip AS
SELECT count(*) OVER w
diff --git a/src/test/regress/sql/rpr_explain.sql
b/src/test/regress/sql/rpr_explain.sql
index 3cd94af6bb8..5a9c83e7886 100644
--- a/src/test/regress/sql/rpr_explain.sql
+++ b/src/test/regress/sql/rpr_explain.sql
@@ -570,6 +570,32 @@ WINDOW w AS (
DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
);');
+-- Context absorption with a bounded outer group whose body starts with A+
+-- The outer count {2,3} is bounded, but A+ is still the first unbounded
portion
+-- of the pattern, so it remains the absorption comparison point (a+ keeps the
"
+-- marker and contexts are absorbed). A bounded outer group must not, by
itself,
+-- disable absorption of the leading unbounded child.
+CREATE VIEW rpr_ev_ctx_absorb_bounded_outer AS
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ PATTERN ((A+ B){2,3})
+ DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);
+SELECT line FROM
unnest(string_to_array(pg_get_viewdef('rpr_ev_ctx_absorb_bounded_outer'),
E'\n')) AS line WHERE line ~ 'PATTERN';
+SELECT rpr_explain_filter('
+EXPLAIN (ANALYZE, BUFFERS OFF, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ PATTERN ((A+ B){2,3})
+ DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);');
+
-- Contexts skipped by SKIP PAST LAST ROW
CREATE VIEW rpr_ev_ctx_skip AS
SELECT count(*) OVER w
diff --git a/src/backend/executor/README.rpr b/src/backend/executor/README.rpr
index 9275e265d4b..f5e90848458 100644
--- a/src/backend/executor/README.rpr
+++ b/src/backend/executor/README.rpr
@@ -407,6 +407,10 @@ IV-5. Absorbability Analysis (RPR_ELEM_ABSORBABLE)
Context absorption is an optimization technique that reduces O(n^2) to O(n).
(Runtime behavior is described in Chapter VIII.)
+Definition: a pattern is absorbable when the first unbounded repetition of a
+fixed-length body reachable from the start of the pattern exists; that
+repetition is the absorption comparison point.
+
This phase determines whether the pattern has a structure suitable for the
absorption optimization and sets flags on the relevant elements:
diff --git a/src/backend/optimizer/plan/rpr.c b/src/backend/optimizer/plan/rpr.c
index fac3d7bcf38..f2801ffcba3 100644
--- a/src/backend/optimizer/plan/rpr.c
+++ b/src/backend/optimizer/plan/rpr.c
@@ -1742,6 +1742,10 @@ computeAbsorbabilityRecursive(RPRPattern *pattern,
RPRElemIdx startIdx,
* newer contexts that cannot produce longer matches than older contexts.
* This achieves O(n^2) -> O(n) performance improvement.
*
+ * Definition: a pattern is absorbable when the first unbounded repetition of
+ * a fixed-length body reachable from the start of the pattern exists; that
+ * repetition is the absorption comparison point.
+ *
* Only greedy unbounded quantifiers at pattern start can be absorbable.
* Reluctant quantifiers are excluded because they don't maintain monotonic
* decrease property required for safe absorption.