Hi Henson, Thank you for the review! I will take care the issues (it will take sometime).
Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp > SQL/RPR Patch Review Report > ============================ > > Patch: Row Pattern Recognition (SQL:2016) > Commitfest: https://commitfest.postgresql.org/patch/4460 > > Review Methodology: > This review focused on quality assessment, not line-by-line code audit. > Key code paths and quality issues were examined with surrounding context > when concerns arose. Documentation files were reviewed with AI-assisted > grammar and typo checking. Code coverage was measured using gcov and > custom analysis tools. > > Limitations: > - Not a security audit or formal verification > - Parser and executor internals reviewed at module level, not exhaustively > - Performance characteristics not benchmarked > > > TABLE OF CONTENTS > ----------------- > > 1. Executive Summary > 2. Issues Found > 2.1 Critical / Major > 2.2 Minor Issues (Review Needed) > 2.3 Minor Issues (Style) > 2.4 Suggestions for Discussion > 3. Test Coverage > 3.1 Covered Areas > 3.2 Untested Items > 3.3 Unimplemented Features (No Test Needed) > 4. Code Coverage Analysis > 4.1 Overall Coverage > 4.2 Coverage by File > 4.3 Uncovered Code Risk Assessment > 5. Commit Analysis > 6. Recommendations > 7. Conclusion > > > 1. EXECUTIVE SUMMARY > -------------------- > > Overall assessment: GOOD > > The SQL/RPR patch demonstrates solid implementation quality within its > defined scope. Code follows PostgreSQL coding standards (with minor style > issues), test coverage is comprehensive at 96.4%, and documentation is > thorough with only minor typos. > > Rating by Area: > - Code Quality: Good (PostgreSQL style compliant, 26 static style fixes > recommended) > - Test Coverage: Good (96.4% line coverage, 28 test cases) > - Documentation: Good (Complete, 1 minor typo) > - Build/Regress: Pass (make check-world, 753 tests passed) > > > 2. ISSUES FOUND > --------------- > > 2.1 Critical / Major > > #1 [Code] Greedy pattern combinatorial explosion > Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE > Impact: Memory exhaustion or infinite wait > Recommendation: Add work_mem-based memory limit (error on exceed) > > Evidence - No memory limit in current code: > - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional > - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc() > unlimited > - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit > > Only work_mem usage in RPR (nodeWindowAgg.c:1297): > winstate->buffer = tuplestore_begin_heap(false, false, work_mem); > -> For tuple buffer, unrelated to pattern combination memory (StringSet) > > 2.2 Minor Issues (Review Needed) > > #1 [Code] nodeWindowAgg.c:5849,5909,5912 > pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN > elements > > Reproduction: > - PATTERN (A B C ... Z A) (27 elements) -> OK > - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid > char: b" > > 2.3 Minor Issues (Style) > > #1 [Code] nodeWindowAgg.c (25 blocks) > #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove > > #2 [Code] src/backend/executor/nodeWindowAgg.c > static keyword on separate line (26 functions) > > #3 [Code] src/backend/utils/adt/ruleutils.c > Whitespace typo: "regexp !=NULL" -> "regexp != NULL" > > #4 [Code] nodeWindowAgg.c:4619 > Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style) > > #5 [Doc] doc/src/sgml/ref/select.sgml:1128 > Typo: "maximu" -> "maximum" > > 2.4 Suggestions for Discussion > > #1 Incremental matching with streaming NFA redesign > Benefits: > - O(n) time complexity per row (current: exponential in worst case) > - Bounded memory via state merging and context absorption > - Natural extension for OR patterns, {n,m} quantifiers, nested groups > - Enables MEASURES clause with incremental aggregates > - Proven approach in CEP engines (Flink, Esper) > > > 3. TEST COVERAGE > ---------------- > > 3.1 Covered Areas > > - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86) > - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133) > - PREV/NEXT functions: Single argument (line 44, 173) > - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198) > - Aggregate integration: sum, avg, count, max, min (line 258-277) > - Window function integration: first_value, last_value, nth_value (line > 34-35) > - JOIN/CTE: JOIN (line 313), WITH (line 324) > - View: VIEW creation, pg_get_viewdef (line 390-406) > - Error cases: 7 error scenarios (line 409-532) > - Large partition: 5000 row smoke test (line 360-387) > - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244) > > 3.2 Untested Items > > Feature Priority Recommendation > ------------------------------------------------------------------------------- > 26 variable limit Medium Test 26 variables success, 27th > variable error > NULL value handling Low Test PREV(col) where col or previous > row is NULL > > Sample test for 26 variable limit: > > -- Should fail with 27th variable (parser error, no table needed) > SELECT * FROM (SELECT 1 AS x) t > WINDOW w AS ( > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING > PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa) > DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS > TRUE, > g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS > TRUE, > m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS > TRUE, > s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS > TRUE, > y AS TRUE, z AS TRUE, aa AS TRUE > ); > -- ERROR: number of row pattern definition variable names exceeds 26 > > Sample test for NULL handling: > > CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER); > INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100); > INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL); -- NULL in > middle > INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200); > INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150); > > SELECT company, tdate, price, count(*) OVER w AS match_count > FROM stock_null > WINDOW w AS ( > PARTITION BY company > ORDER BY tdate > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING > PATTERN (START UP DOWN) > DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price < > PREV(price) > ); > -- Result: all rows show match_count = 0 (NULL breaks pattern matching) > > 3.3 Unimplemented Features (No Test Needed) > > Per documentation (select.sgml:1133-1136): > - MEASURES clause: Not implemented > - SUBSET clause: Not implemented > - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported > > Not in documentation, verified by testing: > - {n,m} quantifier: Parser error ("syntax error at or near {") > - (A|B) OR pattern: Not supported (parsed as single variable name "a|b") > - (A B)+ compound repetition: Parser accepts, but execution returns 0 > matches > > > 4. CODE COVERAGE ANALYSIS > ------------------------- > > 4.1 Overall Coverage > > 96.4% (732 / 759 lines) > > 4.2 Coverage by File (major RPR-modified files) > > nodeWindowAgg.c: 96.6% (560/580) - Pattern matching execution engine > parse_clause.c: 96.7% (88/91) - PATTERN/DEFINE analysis > ruleutils.c: 93.1% (54/58) - pg_get_viewdef output > > 4.3 Uncovered Code Risk Assessment > > Total: 27 lines uncovered > > Medium Risk (2 items) - Test addition recommended (see section 3.2): > - parse_clause.c:4093: transformDefineClause - 27th variable error > - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first > row > > Low Risk (25 items) - Defensive code / Unreachable: > - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count > - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default > - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position > - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag > - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check > - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default > - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check > - nodeWindowAgg.c:5007: search_str_set - NULL return continue > - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error > - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error > - nodeWindowAgg.c:5700: pattern_initial - Variable not found > - nodeWindowAgg.c:5776: string_set_get - Index range check > - nodeWindowAgg.c:5850: variable_pos_register - a-z range check > - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check > - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check > - parse_clause.c:3989: transformDefineClause - A_Expr type check > - parse_clause.c:4145: transformPatternClause - A_Expr type check > - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output > > Conclusion: Most uncovered code consists of defensive error handling or > code unreachable due to parser pre-validation. No security or functional > risk. > > > 5. COMMIT ANALYSIS > ------------------ > > 8 sequential commits: > > Commit Area Files +/- Key Content > ------------------------------------------------------------------------------- > 1 Raw Parser 4 +174/-16 gram.y grammar (PATTERN/DEFINE) > 2 Parse/Analysis 4 +277/-1 parse_clause.c analysis logic > 3 Rewriter 1 +109/-0 pg_get_viewdef extension > 4 Planner 5 +73/-3 WindowAgg plan node extension > 5 Executor 4 +1,942/-11 CORE: Pattern matching engine > (+1,850) > 6 Docs 3 +192/-7 advanced.sgml, func-window.sgml > 7 Tests 3 +1,585/-1 rpr.sql (532), rpr.out (1,052) > 8 typedefs 1 +6/-0 pgindent support > > Code Change Statistics: > - Total files: 25 > - Lines added: 4,358 > - Lines deleted: 39 > - Net increase: +4,319 lines > > > 6. RECOMMENDATIONS > ------------------ > > 6.1 Combinatorial Explosion Prevention (Major, Required) > > Add work_mem-based memory limit for StringSet allocation. > Location: string_set_add() in nodeWindowAgg.c:5741-5750 > Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.) > > 6.2 Code Review Required (Minor, Decision Needed) > > Location: nodeWindowAgg.c:5849,5909,5912 > Issue: pos > NUM_ALPHABETS check intent unclear > Current: PATTERN with 28 elements causes error > Question: Is 27 element limit intentional? > > 6.3 Code Style Fixes (Minor) > > - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks) > - static keyword on separate line: 26 functions to fix > - Whitespace: "regexp !=NULL" -> "regexp != NULL" > - Error message case: "Unrecognized" -> "unrecognized" > > 6.4 Documentation Fixes (Minor) > > - select.sgml: "maximu" -> "maximum" > > 6.5 Test Additions (Optional) > > Black-box Tests (Functional): > > Feature Test Case Priority > ------------------------------------------------------------------------------- > Variable limit 26 variables success, 27 error Medium > NULL boundary PREV at partition first row Medium > > White-box Tests (Coverage) - covered by above black-box tests: > > File:Line Target Branch > ------------------------------------------------------------------------------- > parse_clause.c:4093 Limit error branch (Variable limit test) > nodeWindowAgg.c:5623 null_slot usage (NULL boundary test) > > > 7. CONCLUSION > ------------- > > Test Quality: GOOD > > Core functionality is thoroughly tested with comprehensive error case > coverage. > > The patch is well-implemented within its defined scope. Identified issues > include > one major concern (combinatorial explosion) and minor style/documentation > items. > > Recommended actions before commit: > 1. Add work_mem-based memory limit for pattern combinations (required) > 2. Clarify pos > NUM_ALPHABETS check intent (review needed) > 3. Fix code style issues (optional but recommended) > 4. Fix documentation typo (trivial) > 5. Add tests for variable limit and NULL handling (optional) > > Points for discussion (optional): > 6. Incremental matching with streaming NFA redesign > > > Attachment: > - coverage.tgz (gcov HTML report, RPR-modified code only) > > > --- > End of Report > > 2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <[email protected]>님이 작성: > >> Attached are the v33 patches for Row pattern recognition. The >> difference from v32 is that the raw parse tree printing patch is not >> included in v33. This is because now that we have >> debug_print_raw_parse. >> >> Best regards, >> -- >> Tatsuo Ishii >> SRA OSS K.K. >> English: http://www.sraoss.co.jp/index_en/ >> Japanese:http://www.sraoss.co.jp >>
