Hi hackers,

I ran a cross-validation test comparing the current PostgreSQL RPR patch with Trino MATCH_RECOGNIZE, checking both result correctness and performance characteristics across several data scales.
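For reference, the PostgreSQL queries had roughly the following shape, using the patch's WINDOW-clause RPR syntax (table and column names here are simplified placeholders, not the exact test schema):

```sql
SELECT id, category,
       first_value(category) OVER w AS match_start,
       count(*)              OVER w AS match_length
FROM events
WINDOW w AS (
  ORDER BY id
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  INITIAL
  PATTERN (A+ B+ C+ D)
  DEFINE A AS category = 'A',
         B AS category = 'B',
         C AS category = 'C',
         D AS category = 'D'
);
```

Test 2 is identical except that the pattern ends in E (PATTERN (A+ B+ C+ E)), with E defined as category = 'E'.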
The dataset consists of two segments in which rows are arranged sequentially as A → B → C → D. Each segment contains a fixed distribution of categories A/B/C/D. Category E does not exist in the dataset and is used to test match-failure behavior. Test scales ranged from 20,000 to 100,000 rows. All tests were executed with AFTER MATCH SKIP PAST LAST ROW.

Test Environment

Item       | PostgreSQL          | Trino
-----------+---------------------+-----------------------
Version    | 19devel (RPR patch) | 471
Runtime    | Local               | Local Docker container
Platform   | Linux x86_64        | Linux x86_64
RPR syntax | WINDOW clause       | MATCH_RECOGNIZE

Dataset Structure

Each segment contains sequential categories:

A : ~1/3 of rows
B : ~1/3 of rows
C : ~1/3 of rows
D : 1 row

Example at 1x scale:

Segment size : 10,000 rows
Total rows   : 20,000

Test Cases

Test 1
Pattern: PATTERN (A+ B+ C+ D)
Expected result: 1 match per segment (2 total)
Reason: the data is arranged exactly as A → B → C → D.

Test 2
Pattern: PATTERN (A+ B+ C+ E)
Expected result: 0 rows (no match)
Reason: category E does not exist in the dataset.

Correctness Results

Both systems returned identical results across all scales:

Test 1 → 1 match per segment (2 total)
Test 2 → 0 rows

Performance Results

Scale | Total Rows | PG Test1 | PG Test2 | Trino Test1 | Trino Test2
------+------------+----------+----------+-------------+-------------
1x    |     20,000 |    19 ms |    17 ms |      ~0.3 s |      ~358 s
2x    |     40,000 |    37 ms |    37 ms |      ~0.8 s |    ~1,364 s
3x    |     60,000 |    57 ms |    51 ms |      ~1.5 s |    ~4,424 s
4x    |     80,000 |    73 ms |    68 ms |      ~2.3 s |    ~9,989 s
5x    |    100,000 |    99 ms |    92 ms |      ~3.3 s |   ~20,014 s

Note: Trino measurements are adjusted to remove JVM startup overhead
(~2.4 s, measured via SELECT 1).

Observations

Match success (Test 1)

Both systems scale approximately linearly, because the entire segment is consumed in a single pass after a successful match. PostgreSQL shows lower absolute latency.

Match failure (Test 2)

PostgreSQL maintains near-linear scaling.
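As a toy illustration of why the failure case can stay cheap (pure illustration, not the patch's actual code or data structures): if two in-flight matching contexts reach the same NFA state, the later-starting one is redundant and can be dropped, because any match it could still complete, the earlier-starting context can complete as well.

```python
from typing import NamedTuple

class Ctx(NamedTuple):
    start: int  # input position where this candidate match began
    state: int  # NFA state the context has reached

def absorb(contexts: list[Ctx]) -> list[Ctx]:
    """Keep, per NFA state, only the earliest-starting context.

    Toy sketch of context absorption: a context that reaches the same
    state but started later is subsumed and can be discarded.
    """
    best: dict[int, Ctx] = {}
    for ctx in contexts:
        kept = best.get(ctx.state)
        if kept is None or ctx.start < kept.start:
            best[ctx.state] = ctx
    return list(best.values())

# Two contexts in state 5: the later-starting one is discarded.
print(absorb([Ctx(start=0, state=5), Ctx(start=1, state=5), Ctx(start=2, state=7)]))
# → [Ctx(start=0, state=5), Ctx(start=2, state=7)]
```

With this kind of pruning the number of live contexts is bounded by the number of NFA states rather than by the number of possible start positions, which is what keeps the failing case linear in the input size.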
This appears to be due to its NFA implementation combined with Context Absorption, which discards a redundant matching context when it reaches the same NFA state as another context but started later in the input. Example:

Ctx1 start=0 length=2
Ctx2 start=1 length=1

Ctx1 subsumes Ctx2, so Ctx2 is discarded. This keeps the number of active contexts small and prevents quadratic growth.

Measured scaling: 17 ms → 92 ms (1x → 5x), which is consistent with O(n).

Trino also uses an NFA approach but appears to lack Context Absorption or a similar optimization. As a result, it explores matching contexts from all possible start positions, leading to rapidly increasing backtracking cost.

Measured scaling: ~358 s → ~20,014 s (1x → 5x). This is a ~56x slowdown for a 5x input, exceeding the 25x that O(n²) would predict, and appears closer to O(n²)–O(n³) in practice, likely compounded by JVM memory and GC overhead.

Summary

Correctness: PostgreSQL RPR and Trino MATCH_RECOGNIZE produce identical matching results across all tested scales.

Performance:
- Match success: both systems scale roughly linearly.
- Match failure: PostgreSQL maintains O(n) scaling while Trino shows O(n²) or worse behavior.

At the largest tested scale:

PostgreSQL : ~92 ms
Trino      : ~20,014 s (≈ 5.6 hours)

PostgreSQL is therefore approximately 217,000x faster in this scenario.

Conclusion

Both systems use NFA-based pattern matching. The key difference appears to be Context Absorption in PostgreSQL, which removes redundant matching contexts and keeps scaling linear even when patterns fail to match. This optimization prevents the non-linear performance degradation typically associated with row pattern recognition.

Best regards,
SungJun

On Mon, Mar 9, 2026 at 2:22 PM Tatsuo Ishii <[email protected]> wrote:
> Hi Henson,
>
> >> Excellnt findings! BTW, I realized that we cannot use $1 of function
> >> in PATTERN clause like: A{$1}.
> >>
> >> ERROR: 42601: syntax error at or near "$1"
> >> LINE 10: PATTERN (A{$1})
> >>                     ^
> >> LOCATION: scanner_yyerror, scan.l:1211
> >>
> >> Should we document somewhere?
> >>
> >
> > The PATTERN quantifier {n} only accepts Iconst (integer literal) in the
> > grammar. When a host variable or function parameter is used (e.g.,
> > A{$1}), the user gets a generic syntax error.
>
> Ok.
>
> > Oracle accepts broader syntax and validates later, producing an error
> > at a later stage rather than a syntax error at parse time.
> >
> > PostgreSQL itself already has precedent for this pattern -- in fact,
> > within the same window clause, frame offset (ROWS/RANGE/GROUPS) accepts
> > a_expr in the grammar and then rejects variables in parse analysis via
> > transformFrameOffset() -> checkExprIsVarFree().
> >
> > I'd lean against documenting this. The SQL standard already defines
> > the quantifier bound as <unsigned integer literal>, so there is nothing
> > beyond the standard to call out, and documenting what is *not* allowed
> > tends to raise questions that wouldn't otherwise occur to users.
> >
> > Rather, I think accepting a broader grammar and validating later would
> > be the more appropriate response, producing a descriptive error like:
> >
> > "argument of bounded quantifier must be an integer literal"
> >
> > I can either include this in the current patch set or handle it as a
> > separate follow-up after the main series is committed. What do you
> > think?
>
> I think handing it as a separate follow-up after the commit is enough
> unless other developers complain.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
