Hi hackers, Thank you for the excellent report! I am glad to see that PostgreSQL > RPR is much faster than Trino, especially in the match failure cases. > > Is it possible to share the data generation script and the query for > PostgreSQL so that I could locally perform the tests?
Here is a self-contained guide to reproduce the RPR ABCD pattern test locally using PostgreSQL, Trino, and Oracle. Requirements: PostgreSQL 19devel build with the RPR patch applied (local) Trino and Oracle can be set up via Docker (see README for details): https://github.com/assam258-5892/docker-databases Start all services: cd ~/docker-databases && docker compose up -d trino-service oracle-service Note: docker compose up does not pull images automatically if they are already cached locally. To fetch the latest images, run docker compose pull first. Note: The Oracle image requires an Oracle account. Register at https://container-registry.oracle.com, accept the license for the database image, then log in before pulling: docker login container-registry.oracle.com Connect to each database shell: Trino: docker compose exec -it trino-service trino Oracle: docker compose exec -it oracle-service sqlplus / as sysdba Step 1: Create the test table and data PostgreSQL (1x scale, 20,000 rows): DROP TABLE IF EXISTS abcd_test; CREATE TABLE abcd_test AS SELECT v, CASE WHEN v % 10000 < 3333 THEN 'A' WHEN v % 10000 >= 3333 AND v % 10000 < 6666 THEN 'B' WHEN v % 10000 >= 6666 AND v % 10000 < 9999 THEN 'C' WHEN v % 10000 = 9999 THEN 'D' END AS cat FROM generate_series(0, 19999) AS v; ANALYZE abcd_test; Trino (1x scale, 20,000 rows): CREATE SCHEMA IF NOT EXISTS memory.test; DROP TABLE IF EXISTS memory.test.abcd_test; CREATE TABLE memory.test.abcd_test AS WITH nums AS ( SELECT a.v * 10000 + b.v AS v FROM UNNEST(sequence(0, 1)) AS a(v) CROSS JOIN UNNEST(sequence(0, 9999)) AS b(v) ) SELECT CAST(v AS INTEGER) AS v, CASE WHEN v % 10000 < 3333 THEN 'A' WHEN v % 10000 >= 3333 AND v % 10000 < 6666 THEN 'B' WHEN v % 10000 >= 6666 AND v % 10000 < 9999 THEN 'C' WHEN v % 10000 = 9999 THEN 'D' END AS cat FROM nums; Note: Trino sequence() is limited to 10,000 elements per call, so a CROSS JOIN is used. For scale Sx, change sequence(0, 1) to sequence(0, S*2-1). Oracle (1x scale, 20,000 rows): DROP TABLE abcd_test PURGE; CREATE TABLE abcd_test AS SELECT v, CASE WHEN MOD(v, 10000) < 3333 THEN 'A' WHEN MOD(v, 10000) >= 3333 AND MOD(v, 10000) < 6666 THEN 'B' WHEN MOD(v, 10000) >= 6666 AND MOD(v, 10000) < 9999 THEN 'C' WHEN MOD(v, 10000) = 9999 THEN 'D' END AS cat FROM (SELECT LEVEL - 1 AS v FROM dual CONNECT BY LEVEL <= 20000); EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABCD_TEST'); Verify data distribution (all engines): SELECT cat, COUNT(*) AS cnt FROM abcd_test GROUP BY cat ORDER BY cat; Expected (1x): A=6666, B=6666, C=6666, D=2 Step 2: Run Test 1 — A+ B+ C+ D (match expected) Expected: 2 rows returned (one match per segment) PostgreSQL: SELECT match_first, match_last, match_len FROM ( SELECT v, first_value(v) OVER w AS match_first, last_value(v) OVER w AS match_last, count(*) OVER w AS match_len FROM abcd_test WINDOW w AS ( ORDER BY v ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B+ C+ D) DEFINE A AS cat = 'A', B AS cat = 'B', C AS cat = 'C', D AS cat = 'D' ) ) result WHERE match_len > 0; Trino / Oracle: SELECT match_first, match_last, match_len FROM abcd_test MATCH_RECOGNIZE ( ORDER BY v MEASURES FIRST(v) AS match_first, LAST(v) AS match_last, COUNT(*) AS match_len ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B+ C+ D) DEFINE A AS cat = 'A', B AS cat = 'B', C AS cat = 'C', D AS cat = 'D' ) mr; Note: Trino uses memory.test.abcd_test as the table name. Expected result (1x): match_first | match_last | match_len ------------|------------|---------- 0 | 9999 | 10000 10000 | 19999 | 10000 Step 3: Run Test 2 — A+ B+ C+ E (match failure) Expected: 0 rows (E does not exist) Use the same queries as Test 1 with two changes: PATTERN: (A+ B+ C+ D) → (A+ B+ C+ E) DEFINE: cat = 'D' → cat = 'E' Warning: Trino Test 2 at 1x scale takes approximately 5-6 minutes. Step 4: Scale up (optional) Re-create the test table at 2x scale (40,000 rows) and then repeat Step 2 and Step 3. PostgreSQL (2x scale, 40,000 rows): DROP TABLE IF EXISTS abcd_test; CREATE TABLE abcd_test AS SELECT v, CASE WHEN v % 20000 < 6666 THEN 'A' WHEN v % 20000 >= 6666 AND v % 20000 < 13332 THEN 'B' WHEN v % 20000 >= 13332 AND v % 20000 < 19999 THEN 'C' WHEN v % 20000 = 19999 THEN 'D' END AS cat FROM generate_series(0, 39999) AS v; ANALYZE abcd_test; Trino (2x scale, 40,000 rows): CREATE SCHEMA IF NOT EXISTS memory.test; DROP TABLE IF EXISTS memory.test.abcd_test; CREATE TABLE memory.test.abcd_test AS WITH nums AS ( SELECT a.v * 10000 + b.v AS v FROM UNNEST(sequence(0, 3)) AS a(v) CROSS JOIN UNNEST(sequence(0, 9999)) AS b(v) ) SELECT CAST(v AS INTEGER) AS v, CASE WHEN v % 20000 < 6666 THEN 'A' WHEN v % 20000 >= 6666 AND v % 20000 < 13332 THEN 'B' WHEN v % 20000 >= 13332 AND v % 20000 < 19999 THEN 'C' WHEN v % 20000 = 19999 THEN 'D' END AS cat FROM nums; Oracle (2x scale, 40,000 rows): DROP TABLE abcd_test PURGE; CREATE TABLE abcd_test AS SELECT v, CASE WHEN MOD(v, 20000) < 6666 THEN 'A' WHEN MOD(v, 20000) >= 6666 AND MOD(v, 20000) < 13332 THEN 'B' WHEN MOD(v, 20000) >= 13332 AND MOD(v, 20000) < 19999 THEN 'C' WHEN MOD(v, 20000) = 19999 THEN 'D' END AS cat FROM (SELECT LEVEL - 1 AS v FROM dual CONNECT BY LEVEL <= 40000); EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABCD_TEST'); Expected (2x): A=13332, B=13332, C=13332, D=4 Then run Step 2 and Step 3 as-is. Warning: Trino Test 2 at 2x scale takes approximately 20-25 minutes. Please let me know if you encounter any issues reproducing this. Best regards SungJun
