>> What about leaving this (reevaluation) for now? Because:
>>
>> 1) we don't have CLASSIFIER
>> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>>
>> so I think we don't need to worry about this for now.
>
> Sure. I'm all for deferring features to make it easier to iterate; I
> just want to make sure the architecture doesn't hit a dead end. Or at
> least, not without being aware of it.
Ok, let's defer this issue. Currently the patch already exceeds 3k
lines. I am afraid too big patch cannot be reviewed by anyone, which
means it will never be committed.
> Also: is CLASSIFIER the only way to run into this issue?
Good question. I would like to know.
>> What if we don't follow the standard, instead we follow POSIX EREs? I
>> think this is better for users unless RPR's REs has significant merit
>> for users.
>
> Piggybacking off of what Vik wrote upthread, I think we would not be
> doing ourselves any favors by introducing a non-compliant
> implementation that performs worse than a traditional NFA. Those would
> be some awful bug reports.
What I am not sure about is, you and Vik mentioned that the
traditional NFA is superior that POSIX NFA in terms of performance.
But how "lexicographic ordering" is related to performance?
>> I am not sure if we need to worry about this because of the reason I
>> mentioned above.
>
> Even if we adopted POSIX NFA semantics, we'd still have to implement
> our own parser for the PATTERN part of the query. I don't think
> there's a good way for us to reuse the parser in src/backend/regex.
Ok.
>> > Does that seem like a workable approach? (Worst-case, my code is just
>> > horrible, and we throw it in the trash.)
>>
>> Yes, it seems workable. I think for the first cut of RPR needs at
>> least the +quantifier with reasonable performance. The current naive
>> implementation seems to have issue because of exhaustive search.
>
> +1
BTW, attched is the v6 patch. The differences from v5 include:
- Now aggregates can be used with RPR. Below is an example from the
regression test cases, which is added by v6 patch.
- Fix assersion error pointed out by Erik.
SELECT company, tdate, price,
first_value(price) OVER w,
last_value(price) OVER w,
max(price) OVER w,
min(price) OVER w,
sum(price) OVER w,
avg(price) OVER w,
count(price) OVER w
FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+ DOWN+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
company | tdate | price | first_value | last_value | max | min | sum |
avg | count
----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 |
147.5000000000000000 | 4
company1 | 07-02-2023 | 200 | | | | | |
|
company1 | 07-03-2023 | 150 | | | | | |
|
company1 | 07-04-2023 | 140 | | | | | |
|
company1 | 07-05-2023 | 150 | | | | | |
|
company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 |
112.5000000000000000 | 4
company1 | 07-07-2023 | 110 | | | | | |
|
company1 | 07-08-2023 | 130 | | | | | |
|
company1 | 07-09-2023 | 120 | | | | | |
|
company1 | 07-10-2023 | 130 | | | | | |
|
company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 |
1237.5000000000000000 | 4
company2 | 07-02-2023 | 2000 | | | | | |
|
company2 | 07-03-2023 | 1500 | | | | | |
|
company2 | 07-04-2023 | 1400 | | | | | |
|
company2 | 07-05-2023 | 1500 | | | | | |
|
company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 |
915.0000000000000000 | 4
company2 | 07-07-2023 | 1100 | | | | | |
|
company2 | 07-08-2023 | 1300 | | | | | |
|
company2 | 07-09-2023 | 1200 | | | | | |
|
company2 | 07-10-2023 | 1300 | | | | | |
|
(20 rows)
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From c5699cdf64df9b102c5d9e3b6f6002e504c93fc6 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 1/7] Row pattern recognition patch for raw parser.
---
src/backend/parser/gram.y | 216 +++++++++++++++++++++++++++++---
src/include/nodes/parsenodes.h | 56 +++++++++
src/include/parser/kwlist.h | 8 ++
src/include/parser/parse_node.h | 1 +
4 files changed, 267 insertions(+), 14 deletions(-)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..70409cdc9a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
DefElem *defelt;
SortBy *sortby;
WindowDef *windef;
+ RPCommonSyntax *rpcom;
+ RPSubsetItem *rpsubset;
JoinExpr *jexpr;
IndexElem *ielem;
StatsElem *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
TriggerTransitions TriggerReferencing
vacuum_relation_list opt_vacuum_relation_list
drop_option_list pub_obj_list
-
-%type <node> opt_routine_body
+ row_pattern_measure_list row_pattern_definition_list
+ opt_row_pattern_subset_clause
+ row_pattern_subset_list row_pattern_subset_rhs
+ row_pattern
+%type <rpsubset> row_pattern_subset_item
+%type <node> opt_routine_body row_pattern_term
%type <groupclause> group_clause
%type <list> group_by_list
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr_opt_alias
%type <node> tablesample_clause opt_repeatable_clause
%type <target> target_el set_target insert_column_item
+ row_pattern_measure_item row_pattern_definition
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <list> window_clause window_definition_list opt_partition_clause
%type <windef> window_definition over_clause window_specification
opt_frame_clause frame_extent frame_bound
+%type <rpcom> opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type <boolean> opt_row_pattern_initial_or_seek
+%type <list> opt_row_pattern_measures
%type <ival> opt_window_exclusion_clause
%type <str> opt_existing_window_name
%type <boolean> opt_if_not_exists
@@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
json_object_constructor_null_clause_opt
json_array_constructor_null_clause_opt
-
/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
* They must be listed first so that their numeric codes do not depend on
@@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
- DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+ DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
DOUBLE_P DROP
@@ -718,7 +727,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
- INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+ INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -731,7 +740,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
+ MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD
MINUTE_P MINVALUE MODE MONTH_P MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NFC NFD NFKC NFKD NO NONE
@@ -743,8 +752,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ORDER ORDINALITY OTHERS OUT_P OUTER_P
OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER
- PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD
- PLACING PLANS POLICY
+ PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD PAST
+ PATTERN_P PERMUTE PLACING PLANS POLICY
POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROCEDURES PROGRAM PUBLICATION
@@ -755,12 +764,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SEEK SELECT
SEQUENCE SEQUENCES
+
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
START STATEMENT STATISTICS STDIN STDOUT STORAGE STORED STRICT_P STRIP_P
- SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER
+ SUBSCRIPTION SUBSET SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER
TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM
@@ -853,6 +863,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
*/
%nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
%nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
+%nonassoc MEASURES AFTER INITIAL SEEK PATTERN_P
%left Op OPERATOR /* multi-character ops and user-defined operators */
%left '+' '-'
%left '*' '/' '%'
@@ -15857,7 +15868,8 @@ over_clause: OVER window_specification
;
window_specification: '(' opt_existing_window_name opt_partition_clause
- opt_sort_clause opt_frame_clause ')'
+ opt_sort_clause opt_row_pattern_measures opt_frame_clause
+ opt_row_pattern_common_syntax ')'
{
WindowDef *n = makeNode(WindowDef);
@@ -15865,10 +15877,12 @@ window_specification: '(' opt_existing_window_name opt_partition_clause
n->refname = $2;
n->partitionClause = $3;
n->orderClause = $4;
+ n->rowPatternMeasures = $5;
/* copy relevant fields of opt_frame_clause */
- n->frameOptions = $5->frameOptions;
- n->startOffset = $5->startOffset;
- n->endOffset = $5->endOffset;
+ n->frameOptions = $6->frameOptions;
+ n->startOffset = $6->startOffset;
+ n->endOffset = $6->endOffset;
+ n->rpCommonSyntax = $7;
n->location = @1;
$$ = n;
}
@@ -15892,6 +15906,31 @@ opt_partition_clause: PARTITION BY expr_list { $$ = $3; }
| /*EMPTY*/ { $$ = NIL; }
;
+/*
+ * ROW PATTERN_P MEASURES
+ */
+opt_row_pattern_measures: MEASURES row_pattern_measure_list { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+row_pattern_measure_list:
+ row_pattern_measure_item
+ { $$ = list_make1($1); }
+ | row_pattern_measure_list ',' row_pattern_measure_item
+ { $$ = lappend($1, $3); }
+ ;
+
+row_pattern_measure_item:
+ a_expr AS ColLabel
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = $3;
+ $$->indirection = NIL;
+ $$->val = (Node *) $1;
+ $$->location = @1;
+ }
+ ;
+
/*
* For frame clauses, we return a WindowDef, but only some fields are used:
* frameOptions, startOffset, and endOffset.
@@ -16051,6 +16090,139 @@ opt_window_exclusion_clause:
| /*EMPTY*/ { $$ = 0; }
;
+opt_row_pattern_common_syntax:
+opt_row_pattern_skip_to opt_row_pattern_initial_or_seek
+ PATTERN_P '(' row_pattern ')'
+ opt_row_pattern_subset_clause
+ DEFINE row_pattern_definition_list
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ n->rpSkipTo = $1->rpSkipTo;
+ n->rpSkipVariable = $1->rpSkipVariable;
+ n->initial = $2;
+ n->rpPatterns = $5;
+ n->rpSubsetClause = $7;
+ n->rpDefs = $9;
+ $$ = n;
+ }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
+opt_row_pattern_skip_to:
+ AFTER MATCH SKIP TO NEXT ROW
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ n->rpSkipTo = ST_NEXT_ROW;
+ n->rpSkipVariable = NULL;
+ $$ = n;
+ }
+ | AFTER MATCH SKIP PAST LAST_P ROW
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ n->rpSkipTo = ST_PAST_LAST_ROW;
+ n->rpSkipVariable = NULL;
+ $$ = n;
+ }
+/*
+ | AFTER MATCH SKIP TO FIRST_P ColId %prec FIRST_P
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ n->rpSkipTo = ST_FIRST_VARIABLE;
+ n->rpSkipVariable = $6;
+ $$ = n;
+ }
+ | AFTER MATCH SKIP TO LAST_P ColId %prec LAST_P
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ n->rpSkipTo = ST_LAST_VARIABLE;
+ n->rpSkipVariable = $6;
+ $$ = n;
+ }
+ * Shift/reduce
+ | AFTER MATCH SKIP TO ColId
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ n->rpSkipTo = ST_VARIABLE;
+ n->rpSkipVariable = $5;
+ $$ = n;
+ }
+*/
+ | /*EMPTY*/
+ {
+ RPCommonSyntax *n = makeNode(RPCommonSyntax);
+ /* temporary set default to ST_NEXT_ROW */
+ n->rpSkipTo = ST_PAST_LAST_ROW;
+ n->rpSkipVariable = NULL;
+ $$ = n;
+ }
+ ;
+
+opt_row_pattern_initial_or_seek:
+ INITIAL { $$ = true; }
+ | SEEK
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("SEEK is not supported"),
+ errhint("Use INITIAL."),
+ parser_errposition(@1)));
+ }
+ | /*EMPTY*/ { $$ = true; }
+ ;
+
+row_pattern:
+ row_pattern_term { $$ = list_make1($1); }
+ | row_pattern row_pattern_term { $$ = lappend($1, $2); }
+ ;
+
+row_pattern_term:
+ ColId { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "", (Node *)makeString($1), NULL, @1); }
+ | ColId '*' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "*", (Node *)makeString($1), NULL, @1); }
+ | ColId '+' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", (Node *)makeString($1), NULL, @1); }
+ | ColId '?' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1); }
+ ;
+
+opt_row_pattern_subset_clause:
+ SUBSET row_pattern_subset_list { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+row_pattern_subset_list:
+ row_pattern_subset_item { $$ = list_make1($1); }
+ | row_pattern_subset_list ',' row_pattern_subset_item { $$ = lappend($1, $3); }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+row_pattern_subset_item: ColId '=' '(' row_pattern_subset_rhs ')'
+ {
+ RPSubsetItem *n = makeNode(RPSubsetItem);
+ n->name = $1;
+ n->rhsVariable = $4;
+ $$ = n;
+ }
+ ;
+
+row_pattern_subset_rhs:
+ ColId { $$ = list_make1(makeStringConst($1, @1)); }
+ | row_pattern_subset_rhs ',' ColId { $$ = lappend($1, makeStringConst($3, @1)); }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+row_pattern_definition_list:
+ row_pattern_definition { $$ = list_make1($1); }
+ | row_pattern_definition_list ',' row_pattern_definition { $$ = lappend($1, $3); }
+ ;
+
+row_pattern_definition:
+ ColId AS a_expr
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = $1;
+ $$->indirection = NIL;
+ $$->val = (Node *) $3;
+ $$->location = @1;
+ }
+ ;
/*
* Supporting nonterminals for expressions.
@@ -17146,6 +17318,7 @@ unreserved_keyword:
| INDEXES
| INHERIT
| INHERITS
+ | INITIAL
| INLINE_P
| INPUT_P
| INSENSITIVE
@@ -17173,6 +17346,7 @@ unreserved_keyword:
| MATCHED
| MATERIALIZED
| MAXVALUE
+ | MEASURES
| MERGE
| METHOD
| MINUTE_P
@@ -17215,6 +17389,9 @@ unreserved_keyword:
| PARTITION
| PASSING
| PASSWORD
+ | PAST
+ | PATTERN_P
+ | PERMUTE
| PLANS
| POLICY
| PRECEDING
@@ -17265,6 +17442,7 @@ unreserved_keyword:
| SEARCH
| SECOND_P
| SECURITY
+ | SEEK
| SEQUENCE
| SEQUENCES
| SERIALIZABLE
@@ -17290,6 +17468,7 @@ unreserved_keyword:
| STRICT_P
| STRIP_P
| SUBSCRIPTION
+ | SUBSET
| SUPPORT
| SYSID
| SYSTEM_P
@@ -17477,6 +17656,7 @@ reserved_keyword:
| CURRENT_USER
| DEFAULT
| DEFERRABLE
+ | DEFINE
| DESC
| DISTINCT
| DO
@@ -17639,6 +17819,7 @@ bare_label_keyword:
| DEFAULTS
| DEFERRABLE
| DEFERRED
+ | DEFINE
| DEFINER
| DELETE_P
| DELIMITER
@@ -17714,6 +17895,7 @@ bare_label_keyword:
| INDEXES
| INHERIT
| INHERITS
+ | INITIAL
| INITIALLY
| INLINE_P
| INNER_P
@@ -17763,6 +17945,7 @@ bare_label_keyword:
| MATCHED
| MATERIALIZED
| MAXVALUE
+ | MEASURES
| MERGE
| METHOD
| MINVALUE
@@ -17816,6 +17999,9 @@ bare_label_keyword:
| PARTITION
| PASSING
| PASSWORD
+ | PAST
+ | PATTERN_P
+ | PERMUTE
| PLACING
| PLANS
| POLICY
@@ -17872,6 +18058,7 @@ bare_label_keyword:
| SCROLL
| SEARCH
| SECURITY
+ | SEEK
| SELECT
| SEQUENCE
| SEQUENCES
@@ -17903,6 +18090,7 @@ bare_label_keyword:
| STRICT_P
| STRIP_P
| SUBSCRIPTION
+ | SUBSET
| SUBSTRING
| SUPPORT
| SYMMETRIC
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fef4c714b8..657651df1d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -547,6 +547,44 @@ typedef struct SortBy
int location; /* operator location, or -1 if none/unknown */
} SortBy;
+/*
+ * AFTER MATCH row pattern skip to types in row pattern common syntax
+ */
+typedef enum RPSkipTo
+{
+ ST_NONE, /* AFTER MATCH omitted */
+ ST_NEXT_ROW, /* SKIP TO NEXT ROW */
+ ST_PAST_LAST_ROW, /* SKIP TO PAST LAST ROW */
+ ST_FIRST_VARIABLE, /* SKIP TO FIRST variable name */
+ ST_LAST_VARIABLE, /* SKIP TO LAST variable name */
+ ST_VARIABLE /* SKIP TO variable name */
+} RPSkipTo;
+
+/*
+ * Row Pattern SUBSET clause item
+ */
+typedef struct RPSubsetItem
+{
+ NodeTag type;
+ char *name; /* Row Pattern SUBSET clause variable name */
+ List *rhsVariable; /* Row Pattern SUBSET rhs variables (list of char *string) */
+} RPSubsetItem;
+
+/*
+ * RowPatternCommonSyntax - raw representation of row pattern common syntax
+ *
+ */
+typedef struct RPCommonSyntax
+{
+ NodeTag type;
+ RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */
+ char *rpSkipVariable; /* Row Pattern Skip To variable name, if any */
+ bool initial; /* true if <row pattern initial or seek> is initial */
+ List *rpPatterns; /* PATTERN variables (list of A_Expr) */
+ List *rpSubsetClause; /* row pattern subset clause (list of RPSubsetItem), if any */
+ List *rpDefs; /* row pattern definitions clause (list of ResTarget) */
+} RPCommonSyntax;
+
/*
* WindowDef - raw representation of WINDOW and OVER clauses
*
@@ -562,6 +600,8 @@ typedef struct WindowDef
char *refname; /* referenced window name, if any */
List *partitionClause; /* PARTITION BY expression list */
List *orderClause; /* ORDER BY (list of SortBy) */
+ List *rowPatternMeasures; /* row pattern measures (list of ResTarget) */
+ RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
int frameOptions; /* frame_clause options, see below */
Node *startOffset; /* expression for starting bound, if any */
Node *endOffset; /* expression for ending bound, if any */
@@ -1483,6 +1523,11 @@ typedef struct GroupingSet
* the orderClause might or might not be copied (see copiedOrder); the framing
* options are never copied, per spec.
*
+ * "defineClause" is Row Pattern Recognition DEFINE clause (list of
+ * TargetEntry). TargetEntry.resname represents row pattern definition
+ * variable name. "patternVariable" and "patternRegexp" represents PATTERN
+ * clause.
+ *
* The information relevant for the query jumbling is the partition clause
* type and its bounds.
*/
@@ -1514,6 +1559,17 @@ typedef struct WindowClause
Index winref; /* ID referenced by window functions */
/* did we copy orderClause from refname? */
bool copiedOrder pg_node_attr(query_jumble_ignore);
+ /* Row Pattern AFTER MACH SKIP clause */
+ RPSkipTo rpSkipTo; /* Row Pattern Skip To type */
+ bool initial; /* true if <row pattern initial or seek> is initial */
+ /* Row Pattern DEFINE clause (list of TargetEntry) */
+ List *defineClause;
+ /* Row Pattern DEFINE variable initial names (list of String) */
+ List *defineInitial;
+ /* Row Pattern PATTERN variable name (list of String) */
+ List *patternVariable;
+ /* Row Pattern PATTERN regular expression quantifier ('+' or ''. list of String) */
+ List *patternRegexp;
} WindowClause;
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa4b..2804333b53 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -128,6 +128,7 @@ PG_KEYWORD("default", DEFAULT, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("defaults", DEFAULTS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("deferrable", DEFERRABLE, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("deferred", DEFERRED, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("definer", DEFINER, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("delete", DELETE_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("delimiter", DELIMITER, UNRESERVED_KEYWORD, BARE_LABEL)
@@ -212,6 +213,7 @@ PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("initial", INITIAL, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inline", INLINE_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("inner", INNER_P, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
@@ -265,6 +267,7 @@ PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("matched", MATCHED, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("measures", MEASURES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("merge", MERGE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD, AS_LABEL)
@@ -326,6 +329,9 @@ PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("past", PAST, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("pattern", PATTERN_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("permute", PERMUTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("plans", PLANS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("policy", POLICY, UNRESERVED_KEYWORD, BARE_LABEL)
@@ -385,6 +391,7 @@ PG_KEYWORD("scroll", SCROLL, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("search", SEARCH, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("second", SECOND_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("security", SECURITY, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("seek", SEEK, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("select", SELECT, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("sequence", SEQUENCE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("sequences", SEQUENCES, UNRESERVED_KEYWORD, BARE_LABEL)
@@ -416,6 +423,7 @@ PG_KEYWORD("stored", STORED, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("strict", STRICT_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("strip", STRIP_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("subscription", SUBSCRIPTION, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("subset", SUBSET, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("substring", SUBSTRING, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("support", SUPPORT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("symmetric", SYMMETRIC, RESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h
index f589112d5e..6640090910 100644
--- a/src/include/parser/parse_node.h
+++ b/src/include/parser/parse_node.h
@@ -51,6 +51,7 @@ typedef enum ParseExprKind
EXPR_KIND_WINDOW_FRAME_RANGE, /* window frame clause with RANGE */
EXPR_KIND_WINDOW_FRAME_ROWS, /* window frame clause with ROWS */
EXPR_KIND_WINDOW_FRAME_GROUPS, /* window frame clause with GROUPS */
+ EXPR_KIND_RPR_DEFINE, /* DEFINE */
EXPR_KIND_SELECT_TARGET, /* SELECT target list item */
EXPR_KIND_INSERT_TARGET, /* INSERT target list item */
EXPR_KIND_UPDATE_SOURCE, /* UPDATE assignment source item */
--
2.25.1
>From cbd7a3b97b951a3e02b89fb85021a127f92f3a88 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 2/7] Row pattern recognition patch (parse/analysis).
---
src/backend/parser/parse_agg.c | 7 +
src/backend/parser/parse_clause.c | 292 +++++++++++++++++++++++++++++-
src/backend/parser/parse_expr.c | 4 +
src/backend/parser/parse_func.c | 3 +
4 files changed, 305 insertions(+), 1 deletion(-)
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 85cd47b7ae..aa7a1cee80 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -564,6 +564,10 @@ check_agglevels_and_constraints(ParseState *pstate, Node *expr)
errkind = true;
break;
+ case EXPR_KIND_RPR_DEFINE:
+ errkind = true;
+ break;
+
/*
* There is intentionally no default: case here, so that the
* compiler will warn if we add a new ParseExprKind without
@@ -953,6 +957,9 @@ transformWindowFuncCall(ParseState *pstate, WindowFunc *wfunc,
case EXPR_KIND_CYCLE_MARK:
errkind = true;
break;
+ case EXPR_KIND_RPR_DEFINE:
+ errkind = true;
+ break;
/*
* There is intentionally no default: case here, so that the
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 334b9b42bd..60020a7025 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -100,7 +100,10 @@ static WindowClause *findWindowClause(List *wclist, const char *name);
static Node *transformFrameOffset(ParseState *pstate, int frameOptions,
Oid rangeopfamily, Oid rangeopcintype, Oid *inRangeFunc,
Node *clause);
-
+static void transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist);
+static List *transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist);
+static void transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef);
+static List *transformMeasureClause(ParseState *pstate, WindowClause *wc, WindowDef *windef);
/*
* transformFromClause -
@@ -2950,6 +2953,10 @@ transformWindowDefinitions(ParseState *pstate,
rangeopfamily, rangeopcintype,
&wc->endInRangeFunc,
windef->endOffset);
+
+ /* Process Row Pattern Recognition related clauses */
+ transformRPR(pstate, wc, windef, targetlist);
+
wc->runCondition = NIL;
wc->winref = winref;
@@ -3815,3 +3822,286 @@ transformFrameOffset(ParseState *pstate, int frameOptions,
return node;
}
+
+/*
+ * transformRPR
+ * Process Row Pattern Recognition related clauses
+ */
+static void
+transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist)
+{
+ /*
+ * Window definition exists?
+ */
+ if (windef == NULL)
+ return;
+
+ /*
+ * Row Pattern Common Syntax clause exists?
+ */
+ if (windef->rpCommonSyntax == NULL)
+ return;
+
+ /* Check Frame option. Frame must start at current row */
+ if ((wc->frameOptions & FRAMEOPTION_START_CURRENT_ROW) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("FRAME must start at current row when row patttern recognition is used")));
+
+ /* Transform AFTER MACH SKIP TO clause */
+ wc->rpSkipTo = windef->rpCommonSyntax->rpSkipTo;
+
+ /* Transform SEEK or INITIAL clause */
+ wc->initial = windef->rpCommonSyntax->initial;
+
+ /* Transform DEFINE clause into list of TargetEntry's */
+ wc->defineClause = transformDefineClause(pstate, wc, windef, targetlist);
+
+ /* Check PATTERN clause and copy to patternClause */
+ transformPatternClause(pstate, wc, windef);
+
+ /* Transform MEASURE clause */
+ transformMeasureClause(pstate, wc, windef);
+}
+
+/*
+ * transformDefineClause Process DEFINE clause and transform ResTarget into
+ * list of TargetEntry.
+ *
+ * XXX we only support column reference in row pattern definition search
+ * condition, e.g. "price". <row pattern definition variable name>.<column
+ * reference> is not supported, e.g. "A.price".
+ */
+static List *
+transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist)
+{
+ /* DEFINE variable name initials */
+ static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
+
+ ListCell *lc, *l;
+ ResTarget *restarget, *r;
+ List *restargets;
+ char *name;
+ int initialLen;
+ int i;
+
+ /*
+ * If Row Definition Common Syntax exists, DEFINE clause must exist.
+ * (the raw parser should have already checked it.)
+ */
+ Assert(windef->rpCommonSyntax->rpDefs != NULL);
+
+ /*
+ * Check and add "A AS A IS TRUE" if pattern variable is missing in DEFINE
+ * per the SQL standard.
+ */
+ restargets = NIL;
+ foreach(lc, windef->rpCommonSyntax->rpPatterns)
+ {
+ A_Expr *a;
+ bool found = false;
+
+ if (!IsA(lfirst(lc), A_Expr))
+ ereport(ERROR,
+ errmsg("node type is not A_Expr"));
+
+ a = (A_Expr *)lfirst(lc);
+ name = strVal(a->lexpr);
+
+ foreach(l, windef->rpCommonSyntax->rpDefs)
+ {
+ restarget = (ResTarget *)lfirst(l);
+
+ if (!strcmp(restarget->name, name))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
+ {
+ /*
+ * "name" is missing. So create "name AS name IS TRUE" ResTarget
+ * node and add it to the temporary list.
+ */
+ A_Const *n;
+
+ restarget = makeNode(ResTarget);
+ n = makeNode(A_Const);
+ n->val.boolval.type = T_Boolean;
+ n->val.boolval.boolval = true;
+ n->location = -1;
+ restarget->name = pstrdup(name);
+ restarget->indirection = NIL;
+ restarget->val = (Node *)n;
+ restarget->location = -1;
+ restargets = lappend((List *)restargets, restarget);
+ }
+ }
+
+ if (list_length(restargets) >= 1)
+ {
+ /* add missing DEFINEs */
+ windef->rpCommonSyntax->rpDefs = list_concat(windef->rpCommonSyntax->rpDefs,
+ restargets);
+ list_free(restargets);
+ }
+
+ /*
+ * Check for duplicate row pattern definition variables. The standard
+ * requires that no two row pattern definition variable names shall be
+ * equivalent.
+ */
+ restargets = NIL;
+ foreach(lc, windef->rpCommonSyntax->rpDefs)
+ {
+ restarget = (ResTarget *)lfirst(lc);
+ name = restarget->name;
+
+ /*
+ * Add DEFINE expression (Restarget->val) to the targetlist as a
+ * TargetEntry if it does not exist yet. Planner will add the column
+ * ref var node to the outer plan's target list later on. This makes
+ * DEFINE expression could access the outer tuple while evaluating
+ * PATTERN.
+ *
+ * XXX: adding whole expressions of DEFINE to the plan.targetlist is
+ * not so good, because it's not necessary to evalute the expression
+ * in the target list while running the plan. We should extract the
+ * var nodes only then add them to the plan.targetlist.
+ */
+ findTargetlistEntrySQL99(pstate, (Node *)restarget->val, targetlist, EXPR_KIND_RPR_DEFINE);
+
+ /*
+ * Make sure that the row pattern definition search condition is a
+ * boolean expression.
+ */
+ transformWhereClause(pstate, restarget->val,
+ EXPR_KIND_RPR_DEFINE, "DEFINE");
+
+ foreach(l, restargets)
+ {
+ char *n;
+
+ r = (ResTarget *) lfirst(l);
+ n = r->name;
+
+ if (!strcmp(n, name))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("row pattern definition variable name \"%s\" appears more than once in DEFINE clause",
+ name),
+ parser_errposition(pstate, exprLocation((Node *)r))));
+ }
+ restargets = lappend(restargets, restarget);
+ }
+ list_free(restargets);
+
+ /*
+ * Create list of row pattern DEFINE variable name's initial.
+ * We assign [a-z] to them (up to 26 variable names are allowed).
+ */
+ restargets = NIL;
+ i = 0;
+ initialLen = strlen(defineVariableInitials);
+
+ foreach(lc, windef->rpCommonSyntax->rpDefs)
+ {
+ char initial[2];
+
+ restarget = (ResTarget *)lfirst(lc);
+ name = restarget->name;
+
+ if (i >= initialLen)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("number of row pattern definition variable names exceeds %d", initialLen),
+ parser_errposition(pstate, exprLocation((Node *)restarget))));
+ }
+ initial[0] = defineVariableInitials[i++];
+ initial[1] = '\0';
+ wc->defineInitial = lappend(wc->defineInitial, makeString(pstrdup(initial)));
+ }
+
+ return transformTargetList(pstate, windef->rpCommonSyntax->rpDefs,
+ EXPR_KIND_RPR_DEFINE);
+}
+
+/*
+ * transformPatternClause
+ * Process PATTERN clause and return PATTERN clause in the raw parse tree
+ */
+static void
+transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
+{
+ ListCell *lc, *l;
+
+ /*
+ * Row Pattern Common Syntax clause exists?
+ */
+ if (windef->rpCommonSyntax == NULL)
+ return;
+
+ /*
+ * Primary row pattern variable names in PATTERN clause must appear in
+ * DEFINE clause as row pattern definition variable names.
+ */
+ wc->patternVariable = NIL;
+ wc->patternRegexp = NIL;
+ foreach(lc, windef->rpCommonSyntax->rpPatterns)
+ {
+ A_Expr *a;
+ char *name;
+ char *regexp;
+ bool found = false;
+
+ if (!IsA(lfirst(lc), A_Expr))
+ ereport(ERROR,
+ errmsg("node type is not A_Expr"));
+
+ a = (A_Expr *)lfirst(lc);
+ name = strVal(a->lexpr);
+
+ foreach(l, windef->rpCommonSyntax->rpDefs)
+ {
+ ResTarget *restarget = (ResTarget *)lfirst(l);
+
+ if (!strcmp(restarget->name, name))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("primary row pattern variable name \"%s\" does not appear in DEFINE clause",
+ name),
+ parser_errposition(pstate, exprLocation((Node *)a))));
+ }
+ wc->patternVariable = lappend(wc->patternVariable, makeString(pstrdup(name)));
+ regexp = strVal(lfirst(list_head(a->name)));
+ wc->patternRegexp = lappend(wc->patternRegexp, makeString(pstrdup(regexp)));
+ }
+}
+
+/*
+ * transformMeasureClause
+ * Process MEASURE clause
+ * XXX MEASURE clause is not supported yet
+ */
+static List *
+transformMeasureClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
+{
+ if (windef->rowPatternMeasures == NIL)
+ return NIL;
+
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s","MEASURE clause is not supported yet"),
+ parser_errposition(pstate, exprLocation((Node *)windef->rowPatternMeasures))));
+}
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 64c582c344..18b58ac263 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -557,6 +557,7 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
case EXPR_KIND_COPY_WHERE:
case EXPR_KIND_GENERATED_COLUMN:
case EXPR_KIND_CYCLE_MARK:
+ case EXPR_KIND_RPR_DEFINE:
/* okay */
break;
@@ -1770,6 +1771,7 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
case EXPR_KIND_VALUES:
case EXPR_KIND_VALUES_SINGLE:
case EXPR_KIND_CYCLE_MARK:
+ case EXPR_KIND_RPR_DEFINE:
/* okay */
break;
case EXPR_KIND_CHECK_CONSTRAINT:
@@ -3149,6 +3151,8 @@ ParseExprKindName(ParseExprKind exprKind)
return "GENERATED AS";
case EXPR_KIND_CYCLE_MARK:
return "CYCLE";
+ case EXPR_KIND_RPR_DEFINE:
+ return "DEFINE";
/*
* There is intentionally no default: case here, so that the
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index b3f0b6a137..2ff3699538 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -2656,6 +2656,9 @@ check_srf_call_placement(ParseState *pstate, Node *last_srf, int location)
case EXPR_KIND_CYCLE_MARK:
errkind = true;
break;
+ case EXPR_KIND_RPR_DEFINE:
+ errkind = true;
+ break;
/*
* There is intentionally no default: case here, so that the
--
2.25.1
>From bb200a43d23e564dd4cc8ed24973d1638be668d7 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 3/7] Row pattern recognition patch (planner).
---
src/backend/optimizer/plan/createplan.c | 23 ++++++++++++++++++-----
src/backend/optimizer/plan/setrefs.c | 23 +++++++++++++++++++++++
src/include/nodes/plannodes.h | 15 +++++++++++++++
3 files changed, 56 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..469fcd156b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -286,9 +286,10 @@ static WindowAgg *make_windowagg(List *tlist, Index winref,
int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
int frameOptions, Node *startOffset, Node *endOffset,
Oid startInRangeFunc, Oid endInRangeFunc,
- Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst,
- List *runCondition, List *qual, bool topWindow,
- Plan *lefttree);
+ Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, List *runCondition,
+ RPSkipTo rpSkipTo, List *patternVariable, List *patternRegexp, List *defineClause,
+ List *defineInitial,
+ List *qual, bool topWindow, Plan *lefttree);
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
@@ -2698,6 +2699,11 @@ create_windowagg_plan(PlannerInfo *root, WindowAggPath *best_path)
wc->inRangeAsc,
wc->inRangeNullsFirst,
wc->runCondition,
+ wc->rpSkipTo,
+ wc->patternVariable,
+ wc->patternRegexp,
+ wc->defineClause,
+ wc->defineInitial,
best_path->qual,
best_path->topwindow,
subplan);
@@ -6601,8 +6607,10 @@ make_windowagg(List *tlist, Index winref,
int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
int frameOptions, Node *startOffset, Node *endOffset,
Oid startInRangeFunc, Oid endInRangeFunc,
- Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst,
- List *runCondition, List *qual, bool topWindow, Plan *lefttree)
+ Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, List *runCondition,
+ RPSkipTo rpSkipTo, List *patternVariable, List *patternRegexp, List *defineClause,
+ List *defineInitial,
+ List *qual, bool topWindow, Plan *lefttree)
{
WindowAgg *node = makeNode(WindowAgg);
Plan *plan = &node->plan;
@@ -6628,6 +6636,11 @@ make_windowagg(List *tlist, Index winref,
node->inRangeAsc = inRangeAsc;
node->inRangeNullsFirst = inRangeNullsFirst;
node->topWindow = topWindow;
+ node->rpSkipTo = rpSkipTo,
+ node->patternVariable = patternVariable;
+ node->patternRegexp = patternRegexp;
+ node->defineClause = defineClause;
+ node->defineInitial = defineInitial;
plan->targetlist = tlist;
plan->lefttree = lefttree;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..2ed00b5d41 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2456,6 +2456,29 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
NRM_EQUAL,
NUM_EXEC_QUAL(plan));
+ /*
+ * Modifies an expression tree in each DEFINE clause so that all Var
+ * nodes reference outputs of a subplan.
+ */
+ if (IsA(plan, WindowAgg))
+ {
+ WindowAgg *wplan = (WindowAgg *) plan;
+
+ foreach(l, wplan->defineClause)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+ tle->expr = (Expr *)
+ fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ NUM_EXEC_QUAL(plan));
+ }
+ }
+
pfree(subplan_itlist);
}
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..afa27bb45a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1096,6 +1096,21 @@ typedef struct WindowAgg
/* nulls sort first for in_range tests? */
bool inRangeNullsFirst;
+ /* Row Pattern Recognition AFTER MACH SKIP clause */
+ RPSkipTo rpSkipTo; /* Row Pattern Skip To type */
+
+ /* Row Pattern PATTERN variable name (list of String) */
+ List *patternVariable;
+
+ /* Row Pattern RPATTERN regular expression quantifier ('+' or ''. list of String) */
+ List *patternRegexp;
+
+ /* Row Pattern DEFINE clause (list of TargetEntry) */
+ List *defineClause;
+
+ /* Row Pattern DEFINE variable initial names (list of String) */
+ List *defineInitial;
+
/*
* false for all apart from the WindowAgg that's closest to the root of
* the plan
--
2.25.1
>From a5b6559438ce9093a64cee4c4b0e7401c20b218d Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 4/7] Row pattern recognition patch (executor).
---
src/backend/executor/nodeWindowAgg.c | 842 ++++++++++++++++++++++++++-
src/backend/utils/adt/windowfuncs.c | 37 +-
src/include/catalog/pg_proc.dat | 6 +
src/include/nodes/execnodes.h | 26 +
4 files changed, 898 insertions(+), 13 deletions(-)
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..32270d051a 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -36,6 +36,7 @@
#include "access/htup_details.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_aggregate.h"
+#include "catalog/pg_collation_d.h"
#include "catalog/pg_proc.h"
#include "executor/executor.h"
#include "executor/nodeWindowAgg.h"
@@ -48,6 +49,7 @@
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
+#include "utils/fmgroids.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -182,8 +184,9 @@ static void begin_partition(WindowAggState *winstate);
static void spool_tuples(WindowAggState *winstate, int64 pos);
static void release_partition(WindowAggState *winstate);
-static int row_is_in_frame(WindowAggState *winstate, int64 pos,
+static int row_is_in_frame(WindowAggState *winstate, int64 pos,
TupleTableSlot *slot);
+
static void update_frameheadpos(WindowAggState *winstate);
static void update_frametailpos(WindowAggState *winstate);
static void update_grouptailpos(WindowAggState *winstate);
@@ -195,9 +198,32 @@ static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
static bool are_peers(WindowAggState *winstate, TupleTableSlot *slot1,
TupleTableSlot *slot2);
-static bool window_gettupleslot(WindowObject winobj, int64 pos,
- TupleTableSlot *slot);
+static int WinGetSlotInFrame(WindowObject winobj, TupleTableSlot *slot,
+ int relpos, int seektype, bool set_mark,
+ bool *isnull, bool *isout);
+static bool window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot);
+
+static void attno_map(Node *node);
+static bool attno_map_walker(Node *node, void *context);
+static int row_is_in_reduced_frame(WindowObject winobj, int64 pos);
+static bool rpr_is_defined(WindowAggState *winstate);
+
+static void create_reduced_frame_map(WindowAggState *winstate);
+static int get_reduced_frame_map(WindowAggState *winstate, int64 pos);
+static void register_reduced_frame_map(WindowAggState *winstate, int64 pos, int val);
+static void clear_reduced_frame_map(WindowAggState *winstate);
+static void update_reduced_frame(WindowObject winobj, int64 pos);
+
+static int64 evaluate_pattern(WindowObject winobj, int64 current_pos,
+ char *vname, StringInfo encoded_str, bool *result);
+
+static bool get_slots(WindowObject winobj, int64 current_pos);
+
+static int search_str_set(char *pattern, StringInfo *str_set, int set_size);
+static void search_str_set_recurse(char *pattern, StringInfo *str_set, int set_size, int set_index,
+ char *encoded_str, int *resultlen);
+static char pattern_initial(WindowAggState *winstate, char *vname);
/*
* initialize_windowaggregate
@@ -673,6 +699,7 @@ eval_windowaggregates(WindowAggState *winstate)
WindowObject agg_winobj;
TupleTableSlot *agg_row_slot;
TupleTableSlot *temp_slot;
+ bool agg_result_isnull;
numaggs = winstate->numaggs;
if (numaggs == 0)
@@ -778,6 +805,9 @@ eval_windowaggregates(WindowAggState *winstate)
* Note that we don't strictly need to restart in the last case, but if
* we're going to remove all rows from the aggregation anyway, a restart
* surely is faster.
+ *
+ * - if RPR is enabled and skip mode is SKIP TO NEXT ROW,
+ * we restart aggregation too.
*----------
*/
numaggs_restart = 0;
@@ -788,8 +818,11 @@ eval_windowaggregates(WindowAggState *winstate)
(winstate->aggregatedbase != winstate->frameheadpos &&
!OidIsValid(peraggstate->invtransfn_oid)) ||
(winstate->frameOptions & FRAMEOPTION_EXCLUSION) ||
- winstate->aggregatedupto <= winstate->frameheadpos)
+ winstate->aggregatedupto <= winstate->frameheadpos ||
+ (rpr_is_defined(winstate) &&
+ winstate->rpSkipTo == ST_NEXT_ROW))
{
+ elog(DEBUG1, "peraggstate->restart is set");
peraggstate->restart = true;
numaggs_restart++;
}
@@ -861,8 +894,10 @@ eval_windowaggregates(WindowAggState *winstate)
* If we created a mark pointer for aggregates, keep it pushed up to frame
* head, so that tuplestore can discard unnecessary rows.
*/
+#ifdef NOT_USED
if (agg_winobj->markptr >= 0)
WinSetMarkPosition(agg_winobj, winstate->frameheadpos);
+#endif
/*
* Now restart the aggregates that require it.
@@ -917,6 +952,29 @@ eval_windowaggregates(WindowAggState *winstate)
{
winstate->aggregatedupto = winstate->frameheadpos;
ExecClearTuple(agg_row_slot);
+
+ /*
+ * If RPR is defined, we do not use aggregatedupto_nonrestarted. To
+ * avoid assertion failure below, we reset aggregatedupto_nonrestarted
+ * to frameheadpos.
+ */
+ if (rpr_is_defined(winstate))
+ aggregatedupto_nonrestarted = winstate->frameheadpos;
+ }
+
+ agg_result_isnull = false;
+ /* RPR is defined? */
+ if (rpr_is_defined(winstate))
+ {
+ /*
+ * If the skip mode is SKIP TO PAST LAST ROW and we already know that
+ * current row is a skipped row or an unmatched row, we don't need to
+ * accumulate rows, just return NULL.
+ */
+ if (winstate->rpSkipTo == ST_PAST_LAST_ROW &&
+ (get_reduced_frame_map(winstate, winstate->currentpos) == RF_SKIPPED ||
+ get_reduced_frame_map(winstate, winstate->currentpos) == RF_UNMATCHED))
+ agg_result_isnull = true;
}
/*
@@ -930,6 +988,11 @@ eval_windowaggregates(WindowAggState *winstate)
{
int ret;
+ elog(DEBUG1, "===== loop in frame starts: " INT64_FORMAT, winstate->aggregatedupto);
+
+ if (agg_result_isnull)
+ break;
+
/* Fetch next row if we didn't already */
if (TupIsNull(agg_row_slot))
{
@@ -945,9 +1008,28 @@ eval_windowaggregates(WindowAggState *winstate)
ret = row_is_in_frame(winstate, winstate->aggregatedupto, agg_row_slot);
if (ret < 0)
break;
+
if (ret == 0)
goto next_tuple;
+ if (rpr_is_defined(winstate))
+ {
+ /*
+ * If the row status at currentpos is already decided and current
+ * row status is not decided yet, it means we passed the last
+ * reduced frame. Time to break the loop.
+ */
+ if (get_reduced_frame_map(winstate, winstate->currentpos) != RF_NOT_DETERMINED &&
+ get_reduced_frame_map(winstate, winstate->aggregatedupto) == RF_NOT_DETERMINED)
+ break;
+ /*
+ * Otherwise we need to calculate the reduced frame.
+ */
+ ret = row_is_in_reduced_frame(winstate->agg_winobj, winstate->aggregatedupto);
+ if (ret == -1) /* unmatched row */
+ break;
+ }
+
/* Set tuple context for evaluation of aggregate arguments */
winstate->tmpcontext->ecxt_outertuple = agg_row_slot;
@@ -976,6 +1058,7 @@ next_tuple:
ExecClearTuple(agg_row_slot);
}
+
/* The frame's end is not supposed to move backwards, ever */
Assert(aggregatedupto_nonrestarted <= winstate->aggregatedupto);
@@ -996,6 +1079,16 @@ next_tuple:
peraggstate,
result, isnull);
+ /*
+ * RPR is enabled and we just return NULL. because skip mode is SKIP
+ * TO PAST LAST ROW and current row is skipped row or unmatched row.
+ */
+ if (agg_result_isnull)
+ {
+ *isnull = true;
+ *result = (Datum) 0;
+ }
+
/*
* save the result in case next row shares the same frame.
*
@@ -1090,6 +1183,7 @@ begin_partition(WindowAggState *winstate)
winstate->framehead_valid = false;
winstate->frametail_valid = false;
winstate->grouptail_valid = false;
+ create_reduced_frame_map(winstate);
winstate->spooled_rows = 0;
winstate->currentpos = 0;
winstate->frameheadpos = 0;
@@ -2053,6 +2147,8 @@ ExecWindowAgg(PlanState *pstate)
CHECK_FOR_INTERRUPTS();
+ elog(DEBUG1, "ExecWindowAgg called. pos: " INT64_FORMAT , winstate->currentpos);
+
if (winstate->status == WINDOWAGG_DONE)
return NULL;
@@ -2221,6 +2317,17 @@ ExecWindowAgg(PlanState *pstate)
/* don't evaluate the window functions when we're in pass-through mode */
if (winstate->status == WINDOWAGG_RUN)
{
+ /*
+ * If RPR is defined and skip mode is next row, we need to clear existing
+ * reduced frame info so that we newly calculate the info starting from
+ * current row.
+ */
+ if (rpr_is_defined(winstate))
+ {
+ if (winstate->rpSkipTo == ST_NEXT_ROW)
+ clear_reduced_frame_map(winstate);
+ }
+
/*
* Evaluate true window functions
*/
@@ -2388,6 +2495,9 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
TupleDesc scanDesc;
ListCell *l;
+ TargetEntry *te;
+ Expr *expr;
+
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2483,6 +2593,16 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
winstate->temp_slot_2 = ExecInitExtraTupleSlot(estate, scanDesc,
&TTSOpsMinimalTuple);
+ winstate->prev_slot = ExecInitExtraTupleSlot(estate, scanDesc,
+ &TTSOpsMinimalTuple);
+
+ winstate->next_slot = ExecInitExtraTupleSlot(estate, scanDesc,
+ &TTSOpsMinimalTuple);
+
+ winstate->null_slot = ExecInitExtraTupleSlot(estate, scanDesc,
+ &TTSOpsMinimalTuple);
+ winstate->null_slot = ExecStoreAllNullTuple(winstate->null_slot);
+
/*
* create frame head and tail slots only if needed (must create slots in
* exactly the same cases that update_frameheadpos and update_frametailpos
@@ -2667,6 +2787,39 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
winstate->inRangeAsc = node->inRangeAsc;
winstate->inRangeNullsFirst = node->inRangeNullsFirst;
+ /* Set up SKIP TO type */
+ winstate->rpSkipTo = node->rpSkipTo;
+ /* Set up row pattern recognition PATTERN clause */
+ winstate->patternVariableList = node->patternVariable;
+ winstate->patternRegexpList = node->patternRegexp;
+
+ /* Set up row pattern recognition DEFINE clause */
+ winstate->defineInitial = node->defineInitial;
+ winstate->defineVariableList = NIL;
+ winstate->defineClauseList = NIL;
+ if (node->defineClause != NIL)
+ {
+ /*
+ * Tweak arg var of PREV/NEXT so that it refers to scan/inner slot.
+ */
+ foreach(l, node->defineClause)
+ {
+ char *name;
+ ExprState *exps;
+
+ te = lfirst(l);
+ name = te->resname;
+ expr = te->expr;
+
+ elog(DEBUG1, "defineVariable name: %s", name);
+ winstate->defineVariableList = lappend(winstate->defineVariableList,
+ makeString(pstrdup(name)));
+ attno_map((Node *)expr);
+ exps = ExecInitExpr(expr, (PlanState *) winstate);
+ winstate->defineClauseList = lappend(winstate->defineClauseList, exps);
+ }
+ }
+
winstate->all_first = true;
winstate->partition_spooled = false;
winstate->more_partitions = false;
@@ -2674,6 +2827,57 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
return winstate;
}
+/*
+ * Rewrite varno of Var node that is the argument of PREV/NET so that it sees
+ * scan tuple (PREV) or inner tuple (NEXT).
+ */
+static void
+attno_map(Node *node)
+{
+ (void) expression_tree_walker(node, attno_map_walker, NULL);
+}
+
+static bool
+attno_map_walker(Node *node, void *context)
+{
+ FuncExpr *func;
+ int nargs;
+ Expr *expr;
+ Var *var;
+
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, FuncExpr))
+ {
+ func = (FuncExpr *)node;
+
+ if (func->funcid == F_PREV || func->funcid == F_NEXT)
+ {
+ /* sanity check */
+ nargs = list_length(func->args);
+ if (list_length(func->args) != 1)
+ elog(ERROR, "PREV/NEXT must have 1 argument but function %d has %d args", func->funcid, nargs);
+
+ expr = (Expr *) lfirst(list_head(func->args));
+ if (!IsA(expr, Var))
+ elog(ERROR, "PREV/NEXT's arg is not Var"); /* XXX: is it possible that arg type is Const? */
+ var = (Var *)expr;
+
+ if (func->funcid == F_PREV)
+ /*
+ * Rewrite varno from OUTER_VAR to regular var no so that the
+ * var references scan tuple.
+ */
+ var->varno = var->varnosyn;
+ else
+ var->varno = INNER_VAR;
+ elog(DEBUG1, "PREV/NEXT's varno is rewritten to: %d", var->varno);
+ }
+ }
+ return expression_tree_walker(node, attno_map_walker, NULL);
+}
+
/* -----------------
* ExecEndWindowAgg
* -----------------
@@ -2691,6 +2895,8 @@ ExecEndWindowAgg(WindowAggState *node)
ExecClearTuple(node->agg_row_slot);
ExecClearTuple(node->temp_slot_1);
ExecClearTuple(node->temp_slot_2);
+ ExecClearTuple(node->prev_slot);
+ ExecClearTuple(node->next_slot);
if (node->framehead_slot)
ExecClearTuple(node->framehead_slot);
if (node->frametail_slot)
@@ -2740,6 +2946,8 @@ ExecReScanWindowAgg(WindowAggState *node)
ExecClearTuple(node->agg_row_slot);
ExecClearTuple(node->temp_slot_1);
ExecClearTuple(node->temp_slot_2);
+ ExecClearTuple(node->prev_slot);
+ ExecClearTuple(node->next_slot);
if (node->framehead_slot)
ExecClearTuple(node->framehead_slot);
if (node->frametail_slot)
@@ -3100,7 +3308,7 @@ window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot)
return false;
if (pos < winobj->markpos)
- elog(ERROR, "cannot fetch row before WindowObject's mark position");
+ elog(ERROR, "cannot fetch row: " INT64_FORMAT " before WindowObject's mark position: " INT64_FORMAT, pos, winobj->markpos );
oldcontext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory);
@@ -3420,14 +3628,54 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno,
WindowAggState *winstate;
ExprContext *econtext;
TupleTableSlot *slot;
- int64 abs_pos;
- int64 mark_pos;
Assert(WindowObjectIsValid(winobj));
winstate = winobj->winstate;
econtext = winstate->ss.ps.ps_ExprContext;
slot = winstate->temp_slot_1;
+ if (WinGetSlotInFrame(winobj, slot,
+ relpos, seektype, set_mark,
+ isnull, isout) == 0)
+ {
+ econtext->ecxt_outertuple = slot;
+ return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno),
+ econtext, isnull);
+ }
+
+ if (isout)
+ *isout = true;
+ *isnull = true;
+ return (Datum) 0;
+}
+
+/*
+ * WinGetSlotInFrame
+ * slot: TupleTableSlot to store the result
+ * relpos: signed rowcount offset from the seek position
+ * seektype: WINDOW_SEEK_HEAD or WINDOW_SEEK_TAIL
+ * set_mark: If the row is found/in frame and set_mark is true, the mark is
+ * moved to the row as a side-effect.
+ * isnull: output argument, receives isnull status of result
+ * isout: output argument, set to indicate whether target row position
+ * is out of frame (can pass NULL if caller doesn't care about this)
+ *
+ * Returns 0 if we successfullt got the slot. false if out of frame.
+ * (also isout is set)
+ */
+static int
+WinGetSlotInFrame(WindowObject winobj, TupleTableSlot *slot,
+ int relpos, int seektype, bool set_mark,
+ bool *isnull, bool *isout)
+{
+ WindowAggState *winstate;
+ int64 abs_pos;
+ int64 mark_pos;
+ int num_reduced_frame;
+
+ Assert(WindowObjectIsValid(winobj));
+ winstate = winobj->winstate;
+
switch (seektype)
{
case WINDOW_SEEK_CURRENT:
@@ -3494,11 +3742,21 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno,
winstate->frameOptions);
break;
}
+ num_reduced_frame = row_is_in_reduced_frame(winobj, winstate->frameheadpos);
+ if (num_reduced_frame < 0)
+ goto out_of_frame;
+ else if (num_reduced_frame > 0)
+ if (relpos >= num_reduced_frame)
+ goto out_of_frame;
break;
case WINDOW_SEEK_TAIL:
/* rejecting relpos > 0 is easy and simplifies code below */
if (relpos > 0)
goto out_of_frame;
+
+ /* RPR cares about frame head pos. Need to call update_frameheadpos */
+ update_frameheadpos(winstate);
+
update_frametailpos(winstate);
abs_pos = winstate->frametailpos - 1 + relpos;
@@ -3565,6 +3823,12 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno,
mark_pos = 0; /* keep compiler quiet */
break;
}
+
+ num_reduced_frame = row_is_in_reduced_frame(winobj, winstate->frameheadpos + relpos);
+ if (num_reduced_frame < 0)
+ goto out_of_frame;
+ else if (num_reduced_frame > 0)
+ abs_pos = winstate->frameheadpos + relpos + num_reduced_frame - 1;
break;
default:
elog(ERROR, "unrecognized window seek type: %d", seektype);
@@ -3583,15 +3847,13 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno,
*isout = false;
if (set_mark)
WinSetMarkPosition(winobj, mark_pos);
- econtext->ecxt_outertuple = slot;
- return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno),
- econtext, isnull);
+ return 0;
out_of_frame:
if (isout)
*isout = true;
*isnull = true;
- return (Datum) 0;
+ return -1;
}
/*
@@ -3622,3 +3884,561 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull)
return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno),
econtext, isnull);
}
+
+/*
+ * rpr_is_defined
+ * return true if Row pattern recognition is defined.
+ */
+static
+bool rpr_is_defined(WindowAggState *winstate)
+{
+ return winstate->patternVariableList != NIL;
+}
+
+/*
+ * row_is_in_reduced_frame
+ * Determine whether a row is in the current row's reduced window frame according
+ * to row pattern matching
+ *
+ * The row must has been already determined that it is in a full window frame
+ * and fetched it into slot.
+ *
+ * Returns:
+ * = 0, RPR is not defined.
+ * >0, if the row is the first in the reduced frame. Return the number of rows in the reduced frame.
+ * -1, if the row is unmatched row
+ * -2, if the row is in the reduced frame but needed to be skipped because of
+ * AFTER MATCH SKIP PAST LAST ROW
+ */
+static
+int row_is_in_reduced_frame(WindowObject winobj, int64 pos)
+{
+ WindowAggState *winstate = winobj->winstate;
+ int state;
+ int rtn;
+
+ if (!rpr_is_defined(winstate))
+ {
+ /*
+ * RPR is not defined. Assume that we are always in the the reduced
+ * window frame.
+ */
+ rtn = 0;
+ elog(DEBUG1, "row_is_in_reduced_frame returns %d: pos: " INT64_FORMAT, rtn, pos);
+ return rtn;
+ }
+
+ state = get_reduced_frame_map(winstate, pos);
+
+ if (state == RF_NOT_DETERMINED)
+ {
+ update_frameheadpos(winstate);
+ update_reduced_frame(winobj, pos);
+ }
+
+ state = get_reduced_frame_map(winstate, pos);
+
+ switch (state)
+ {
+ int64 i;
+ int num_reduced_rows;
+
+ case RF_FRAME_HEAD:
+ num_reduced_rows = 1;
+ for (i = pos + 1; get_reduced_frame_map(winstate,i) == RF_SKIPPED; i++)
+ num_reduced_rows++;
+ rtn = num_reduced_rows;
+ break;
+
+ case RF_SKIPPED:
+ rtn = -2;
+ break;
+
+ case RF_UNMATCHED:
+ rtn = -1;
+ break;
+
+ default:
+ elog(ERROR, "Unrecognized state: %d at: " INT64_FORMAT, state, pos);
+ break;
+ }
+
+ elog(DEBUG1, "row_is_in_reduced_frame returns %d: pos: " INT64_FORMAT, rtn, pos);
+ return rtn;
+}
+
+#define REDUCED_FRAME_MAP_INIT_SIZE 1024L
+
+/*
+ * Create reduced frame map
+ */
+static
+void create_reduced_frame_map(WindowAggState *winstate)
+{
+ winstate->reduced_frame_map =
+ MemoryContextAlloc(winstate->partcontext, REDUCED_FRAME_MAP_INIT_SIZE);
+ winstate->alloc_sz = REDUCED_FRAME_MAP_INIT_SIZE;
+ clear_reduced_frame_map(winstate);
+}
+
+/*
+ * Clear reduced frame map
+ */
+static
+void clear_reduced_frame_map(WindowAggState *winstate)
+{
+ Assert(winstate->reduced_frame_map != NULL);
+ MemSet(winstate->reduced_frame_map, RF_NOT_DETERMINED,
+ winstate->alloc_sz);
+}
+
+/*
+ * Get reduced frame map specified by pos
+ */
+static
+int get_reduced_frame_map(WindowAggState *winstate, int64 pos)
+{
+ Assert(winstate->reduced_frame_map != NULL);
+
+ if (pos < 0 || pos >= winstate->alloc_sz)
+ elog(ERROR, "wrong pos: " INT64_FORMAT, pos);
+
+ return winstate->reduced_frame_map[pos];
+}
+
+/*
+ * Add/replace reduced frame map member at pos.
+ * If there's no enough space, expand the map.
+ */
+static
+void register_reduced_frame_map(WindowAggState *winstate, int64 pos, int val)
+{
+ int64 realloc_sz;
+
+ Assert(winstate->reduced_frame_map != NULL);
+
+ if (pos < 0)
+ elog(ERROR, "wrong pos: " INT64_FORMAT, pos);
+
+ if (pos > winstate->alloc_sz - 1)
+ {
+ realloc_sz = winstate->alloc_sz * 2;
+
+ winstate->reduced_frame_map =
+ repalloc(winstate->reduced_frame_map, realloc_sz);
+
+ MemSet(winstate->reduced_frame_map + winstate->alloc_sz,
+ RF_NOT_DETERMINED, realloc_sz - winstate->alloc_sz);
+
+ winstate->alloc_sz = realloc_sz;
+ }
+
+ winstate->reduced_frame_map[pos] = val;
+}
+
+/*
+ * update_reduced_frame
+ * Update reduced frame info.
+ */
+static
+void update_reduced_frame(WindowObject winobj, int64 pos)
+{
+ WindowAggState *winstate = winobj->winstate;
+ ListCell *lc1, *lc2;
+ bool expression_result;
+ int num_matched_rows;
+ int64 original_pos;
+ bool anymatch;
+ StringInfo encoded_str;
+ StringInfo pattern_str = makeStringInfo();
+
+ /*
+ * Array of pattern variables evaluted to true.
+ * Each character corresponds to pattern variable.
+ * Example:
+ * str_set[0] = "AB";
+ * str_set[1] = "AC";
+ * In this case at row 0 A and B are true, and A and C are true in row 1.
+ */
+ #define ENCODED_STR_ARRAY_ALLOC_SIZE 128
+ StringInfo *str_set = NULL;
+ int str_set_index;
+ int str_set_size;
+
+ /* save original pos */
+ original_pos = pos;
+
+ /*
+ * Loop over until none of pattern matches or encounters end of frame.
+ */
+ for (;;)
+ {
+ int64 result_pos = -1;
+
+ /*
+ * Loop over each PATTERN variable.
+ */
+ anymatch = false;
+ encoded_str = makeStringInfo();
+
+ forboth(lc1, winstate->patternVariableList, lc2, winstate->patternRegexpList)
+ {
+ char *vname = strVal(lfirst(lc1));
+ char *quantifier = strVal(lfirst(lc2));
+
+ elog(DEBUG1, "pos: " INT64_FORMAT " pattern vname: %s quantifier: %s", pos, vname, quantifier);
+
+ expression_result = false;
+
+ /* evaluate row pattern against current row */
+ result_pos = evaluate_pattern(winobj, pos, vname, encoded_str, &expression_result);
+ if (expression_result)
+ {
+ elog(DEBUG1, "expression result is true");
+ anymatch = true;
+ }
+
+ /*
+ * If out of frame, we are done.
+ */
+ if (result_pos < 0)
+ break;
+ }
+
+ if (!anymatch)
+ {
+ /* none of patterns matched. */
+ break;
+ }
+
+ /* build encoded string array */
+ if (str_set == NULL)
+ {
+ str_set_index = 0;
+ str_set_size = ENCODED_STR_ARRAY_ALLOC_SIZE * sizeof(StringInfo);
+ str_set = palloc(str_set_size);
+ }
+
+ str_set[str_set_index++] = encoded_str;
+
+ elog(DEBUG1, "pos: " INT64_FORMAT " str_set_index: %d encoded_str: %s", pos, str_set_index, encoded_str->data);
+
+ if (str_set_index >= str_set_size)
+ {
+ str_set_size *= 2;
+ str_set = repalloc(str_set, str_set_size);
+ }
+
+ /* move to next row */
+ pos++;
+
+ if (result_pos < 0)
+ {
+ /* out of frame */
+ break;
+ }
+ }
+
+ if (str_set == NULL)
+ {
+ /* no match found in the first row */
+ register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
+ return;
+ }
+
+ elog(DEBUG2, "pos: " INT64_FORMAT " encoded_str: %s", pos, encoded_str->data);
+
+ /* build regular expression */
+ pattern_str = makeStringInfo();
+ appendStringInfoChar(pattern_str, '^');
+ forboth (lc1, winstate->patternVariableList, lc2, winstate->patternRegexpList)
+ {
+ char *vname = strVal(lfirst(lc1));
+ char *quantifier = strVal(lfirst(lc2));
+ char initial;
+
+ initial = pattern_initial(winstate, vname);
+ Assert(initial != 0);
+ appendStringInfoChar(pattern_str, initial);
+ if (quantifier[0])
+ appendStringInfoChar(pattern_str, quantifier[0]);
+ elog(DEBUG1, "vname: %s initial: %c quantifier: %s", vname, initial, quantifier);
+ }
+
+ elog(DEBUG2, "pos: " INT64_FORMAT " pattern: %s", pos, pattern_str->data);
+
+ /* look for matching pattern variable sequence */
+ num_matched_rows = search_str_set(pattern_str->data, str_set, str_set_index);
+ /*
+ * We are at the first row in the reduced frame. Save the number of
+ * matched rows as the number of rows in the reduced frame.
+ */
+ if (num_matched_rows <= 0)
+ {
+ /* no match */
+ register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
+ }
+ else
+ {
+ int64 i;
+
+ register_reduced_frame_map(winstate, original_pos, RF_FRAME_HEAD);
+
+ for (i = original_pos + 1; i < original_pos + num_matched_rows; i++)
+ {
+ register_reduced_frame_map(winstate, i, RF_SKIPPED);
+ }
+ }
+
+ return;
+}
+
+/*
+ * search set of encode_str.
+ * set_size: size of set_str array.
+ */
+static
+int search_str_set(char *pattern, StringInfo *str_set, int set_size)
+{
+ char *encoded_str = palloc0(set_size+1);
+ int resultlen = 0;
+
+ search_str_set_recurse(pattern, str_set, set_size, 0, encoded_str, &resultlen);
+ elog(DEBUG1, "search_str_set returns %d", resultlen);
+ return resultlen;
+}
+
+/*
+ * Workhorse of search_str_set.
+ */
+static
+void search_str_set_recurse(char *pattern, StringInfo *str_set,
+ int set_size, int set_index, char *encoded_str, int *resultlen)
+{
+ char *p;
+
+ if (set_index >= set_size)
+ {
+ Datum d;
+ text *res;
+ char *substr;
+
+ /*
+ * We first perform pattern matching using regexp_instr, then call
+ * textregexsubstr to get matched substring to know how log the
+ * matched string is. That is the number of rows in the reduced window
+ * frame. The reason why we can't call textregexsubstr is, it error
+ * out if pattern is not match.
+ */
+ if (DatumGetInt32(DirectFunctionCall2Coll(regexp_instr, DEFAULT_COLLATION_OID,
+ PointerGetDatum(cstring_to_text(encoded_str)),
+ PointerGetDatum(cstring_to_text(pattern)))) > 0)
+ {
+ d = DirectFunctionCall2Coll(textregexsubstr,
+ DEFAULT_COLLATION_OID,
+ PointerGetDatum(cstring_to_text(encoded_str)),
+ PointerGetDatum(cstring_to_text(pattern)));
+ if (d != 0)
+ {
+ int len;
+
+ res = DatumGetTextPP(d);
+ substr = text_to_cstring(res);
+ len = strlen(substr);
+ if (len > *resultlen)
+ /* remember the longest match */
+ *resultlen = len;
+ }
+ }
+ return;
+ }
+
+ p = str_set[set_index]->data;
+ while (*p)
+ {
+ encoded_str[set_index] = *p;
+ p++;
+ search_str_set_recurse(pattern, str_set, set_size, set_index + 1, encoded_str, resultlen);
+ }
+}
+
+
+/*
+ * Evaluate expression associated with PATTERN variable vname.
+ * relpos is relative row position in a frame (starting from 0).
+ * "quantifier" is the quatifier part of the PATTERN regular expression.
+ * Currently only '+' is allowed.
+ * result is out paramater representing the expression evaluation result
+ * is true of false.
+ * Return values are:
+ * >=0: the last match absolute row position
+ * other wise out of frame.
+ */
+static
+int64 evaluate_pattern(WindowObject winobj, int64 current_pos,
+ char *vname, StringInfo encoded_str, bool *result)
+{
+ WindowAggState *winstate = winobj->winstate;
+ ExprContext *econtext = winstate->ss.ps.ps_ExprContext;
+ ListCell *lc1, *lc2, *lc3;
+ ExprState *pat;
+ Datum eval_result;
+ bool out_of_frame = false;
+ bool isnull;
+
+ forthree (lc1, winstate->defineVariableList, lc2, winstate->defineClauseList, lc3, winstate->defineInitial)
+ {
+ char initial;
+ char *name = strVal(lfirst(lc1));
+
+ if (strcmp(vname, name))
+ continue;
+
+ initial = *(strVal(lfirst(lc3)));
+
+ /* set expression to evaluate */
+ pat = lfirst(lc2);
+
+ /* get current, previous and next tuples */
+ if (!get_slots(winobj, current_pos))
+ {
+ out_of_frame = true;
+ }
+ else
+ {
+ /* evaluate the expression */
+ eval_result = ExecEvalExpr(pat, econtext, &isnull);
+ if (isnull)
+ {
+ /* expression is NULL */
+ elog(DEBUG1, "expression for %s is NULL at row: " INT64_FORMAT, vname, current_pos);
+ *result = false;
+ }
+ else
+ {
+ if (!DatumGetBool(eval_result))
+ {
+ /* expression is false */
+ elog(DEBUG1, "expression for %s is false at row: " INT64_FORMAT, vname, current_pos);
+ *result = false;
+ }
+ else
+ {
+ /* expression is true */
+ elog(DEBUG1, "expression for %s is true at row: " INT64_FORMAT, vname, current_pos);
+ appendStringInfoChar(encoded_str, initial);
+ *result = true;
+ }
+ }
+ break;
+ }
+
+ if (out_of_frame)
+ {
+ *result = false;
+ return -1;
+ }
+ }
+ return current_pos;
+}
+
+/*
+ * Get current, previous and next tuples.
+ * Returns false if current row is out of partition/full frame.
+ */
+static
+bool get_slots(WindowObject winobj, int64 current_pos)
+{
+ WindowAggState *winstate = winobj->winstate;
+ TupleTableSlot *slot;
+ int ret;
+ ExprContext *econtext;
+
+ econtext = winstate->ss.ps.ps_ExprContext;
+
+ /* set up current row tuple slot */
+ slot = winstate->temp_slot_1;
+ if (!window_gettupleslot(winobj, current_pos, slot))
+ {
+ elog(DEBUG1, "current row is out of partition at:" INT64_FORMAT, current_pos);
+ return false;
+
+ ret = row_is_in_frame(winstate, current_pos, slot);
+ if (ret <= 0)
+ {
+ elog(DEBUG1, "current row is out of frame at: " INT64_FORMAT, current_pos);
+ return false;
+ }
+ }
+ econtext->ecxt_outertuple = slot;
+
+ /* for PREV */
+ if (current_pos > 0)
+ {
+ slot = winstate->prev_slot;
+ if (!window_gettupleslot(winobj, current_pos - 1, slot))
+ {
+ elog(DEBUG1, "previous row is out of partition at: " INT64_FORMAT, current_pos - 1);
+ econtext->ecxt_scantuple = winstate->null_slot;
+ }
+ else
+ {
+ ret = row_is_in_frame(winstate, current_pos - 1, slot);
+ if (ret <= 0)
+ {
+ elog(DEBUG1, "previous row is out of frame at: " INT64_FORMAT, current_pos - 1);
+ econtext->ecxt_scantuple = winstate->null_slot;
+ }
+ else
+ {
+ econtext->ecxt_scantuple = slot;
+ }
+ }
+ }
+ else
+ econtext->ecxt_scantuple = winstate->null_slot;
+
+ /* for NEXT */
+ slot = winstate->next_slot;
+ if (!window_gettupleslot(winobj, current_pos + 1, slot))
+ {
+ elog(DEBUG1, "next row is out of partiton at: " INT64_FORMAT, current_pos + 1);
+ econtext->ecxt_innertuple = winstate->null_slot;
+ }
+ else
+ {
+ ret = row_is_in_frame(winstate, current_pos + 1, slot);
+ if (ret <= 0)
+ {
+ elog(DEBUG1, "next row is out of frame at: " INT64_FORMAT, current_pos + 1);
+ econtext->ecxt_innertuple = winstate->null_slot;
+ }
+ else
+ econtext->ecxt_innertuple = slot;
+ }
+ return true;
+}
+
+/*
+ * Return pattern variable initial character
+ * matching with pattern variable name vname.
+ * If not found, return 0.
+ */
+static
+char pattern_initial(WindowAggState *winstate, char *vname)
+{
+ char initial;
+ char *name;
+ ListCell *lc1, *lc2;
+
+ forboth (lc1, winstate->defineVariableList, lc2, winstate->defineInitial)
+ {
+ name = strVal(lfirst(lc1)); /* DEFINE variable name */
+ initial = *(strVal(lfirst(lc2))); /* DEFINE variable initial */
+
+
+ if (!strcmp(name, vname))
+ return initial; /* found */
+ }
+ return 0;
+}
diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c
index b87a624fb2..9ebcc7b5d2 100644
--- a/src/backend/utils/adt/windowfuncs.c
+++ b/src/backend/utils/adt/windowfuncs.c
@@ -13,6 +13,9 @@
*/
#include "postgres.h"
+#include "catalog/pg_collation_d.h"
+#include "executor/executor.h"
+#include "nodes/execnodes.h"
#include "nodes/supportnodes.h"
#include "utils/builtins.h"
#include "windowapi.h"
@@ -36,11 +39,19 @@ typedef struct
int64 remainder; /* (total rows) % (bucket num) */
} ntile_context;
+/*
+ * rpr process information.
+ * Used for AFTER MATCH SKIP PAST LAST ROW
+ */
+typedef struct SkipContext
+{
+ int64 pos; /* last row absolute position */
+} SkipContext;
+
static bool rank_up(WindowObject winobj);
static Datum leadlag_common(FunctionCallInfo fcinfo,
bool forward, bool withoffset, bool withdefault);
-
/*
* utility routine for *_rank functions.
*/
@@ -673,7 +684,7 @@ window_last_value(PG_FUNCTION_ARGS)
bool isnull;
result = WinGetFuncArgInFrame(winobj, 0,
- 0, WINDOW_SEEK_TAIL, true,
+ 0, WINDOW_SEEK_TAIL, false,
&isnull, NULL);
if (isnull)
PG_RETURN_NULL();
@@ -713,3 +724,25 @@ window_nth_value(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+
+/*
+ * prev
+ * Dummy function to invoke RPR's navigation operator "PREV".
+ * This is *not* a window function.
+ */
+Datum
+window_prev(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+}
+
+/*
+ * next
+ * Dummy function to invoke RPR's navigation operation "NEXT".
+ * This is *not* a window function.
+ */
+Datum
+window_next(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..d20f803cf5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10416,6 +10416,12 @@
{ oid => '3114', descr => 'fetch the Nth row value',
proname => 'nth_value', prokind => 'w', prorettype => 'anyelement',
proargtypes => 'anyelement int4', prosrc => 'window_nth_value' },
+{ oid => '6122', descr => 'previous value',
+ proname => 'prev', provolatile => 's', prorettype => 'anyelement',
+ proargtypes => 'anyelement', prosrc => 'window_prev' },
+{ oid => '6123', descr => 'next value',
+ proname => 'next', provolatile => 's', prorettype => 'anyelement',
+ proargtypes => 'anyelement', prosrc => 'window_next' },
# functions for range types
{ oid => '3832', descr => 'I/O',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..63feb68f60 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2471,6 +2471,11 @@ typedef enum WindowAggStatus
* tuples during spool */
} WindowAggStatus;
+#define RF_NOT_DETERMINED 0
+#define RF_FRAME_HEAD 1
+#define RF_SKIPPED 2
+#define RF_UNMATCHED 3
+
typedef struct WindowAggState
{
ScanState ss; /* its first field is NodeTag */
@@ -2519,6 +2524,15 @@ typedef struct WindowAggState
int64 groupheadpos; /* current row's peer group head position */
int64 grouptailpos; /* " " " " tail position (group end+1) */
+ /* these fields are used in Row pattern recognition: */
+ RPSkipTo rpSkipTo; /* Row Pattern Skip To type */
+ List *patternVariableList; /* list of row pattern variables names (list of String) */
+ List *patternRegexpList; /* list of row pattern regular expressions ('+' or ''. list of String) */
+ List *defineVariableList; /* list of row pattern definition variables (list of String) */
+ List *defineClauseList; /* expression for row pattern definition
+ * search conditions ExprState list */
+ List *defineInitial; /* list of row pattern definition variable initials (list of String) */
+
MemoryContext partcontext; /* context for partition-lifespan data */
MemoryContext aggcontext; /* shared context for aggregate working data */
MemoryContext curaggcontext; /* current aggregate's working data */
@@ -2555,6 +2569,18 @@ typedef struct WindowAggState
TupleTableSlot *agg_row_slot;
TupleTableSlot *temp_slot_1;
TupleTableSlot *temp_slot_2;
+
+ /* temporary slots for RPR */
+ TupleTableSlot *prev_slot; /* PREV row navigation operator */
+ TupleTableSlot *next_slot; /* NEXT row navigation operator */
+ TupleTableSlot *null_slot; /* all NULL slot */
+
+ /*
+ * Each byte corresponds to a row positioned at absolute its pos in
+ * partition. See above definition for RF_*
+ */
+ char *reduced_frame_map;
+ int64 alloc_sz; /* size of the map */
} WindowAggState;
/* ----------------
--
2.25.1
>From 69921d78b024bd860a6f37c238b7b602d98bf9b7 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 5/7] Row pattern recognition patch (docs).
---
doc/src/sgml/advanced.sgml | 52 ++++++++++++++++++++++++++++++++++
doc/src/sgml/func.sgml | 54 ++++++++++++++++++++++++++++++++++++
doc/src/sgml/ref/select.sgml | 38 +++++++++++++++++++++++--
3 files changed, 142 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 755c9f1485..eda3612822 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -537,6 +537,58 @@ WHERE pos < 3;
<literal>rank</literal> less than 3.
</para>
+ <para>
+ Row pattern common syntax can be used with row pattern common syntax to
+ perform row pattern recognition in a query. Row pattern common syntax
+ includes two sub clauses. <literal>DEFINE</literal> defines definition
+ variables along with an expression. The expression must be a logical
+ expression, which means it must
+ return <literal>TRUE</literal>, <literal>FALSE</literal>
+ or <literal>NULL</literal>. Moreover if the expression comprises a column
+ reference, it must be the argument of <function>rpr</function>. An example
+ of <literal>DEFINE</literal> is as follows.
+
+<programlisting>
+DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+</programlisting>
+
+ Note that <function>PREV</function> returns price column in the previous
+ row if it's called in a context of row pattern recognition. So in the
+ second line means the definition variable "UP" is <literal>TRUE</literal>
+ when price column in the current row is greater than the price column in
+ the previous row. Likewise, "DOWN" is <literal>TRUE</literal> when when
+ price column in the current row is lower than the price column in the
+ previous row.
+ </para>
+ <para>
+ Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be
+ used. <literal>PATTERN</literal> defines a sequence of rows that satisfies
+ certain conditions. For example following <literal>PATTERN</literal>
+ defines that a row starts with the condition "LOWPRICE", then one or more
+ rows satisfy "UP" and finally one or more rows satisfy "DOWN". If a
+ sequence of rows found, rpr returns the column at the starting row.
+ Example of a <literal>SELECT</literal> using the <literal>DEFINE</literal>
+ and <literal>PATTERN</literal> clause is as follows.
+
+<programlisting>
+SELECT company, tdate, price, max(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+</programlisting>
+ </para>
+
<para>
When a query involves multiple window functions, it is possible to write
out each one with a separate <literal>OVER</literal> clause, but this is
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 24ad87f910..9c99dda4ae 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21780,6 +21780,7 @@ SELECT count(*) FROM sometable;
returns <literal>NULL</literal> if there is no such row.
</para></entry>
</row>
+
</tbody>
</tgroup>
</table>
@@ -21819,6 +21820,59 @@ SELECT count(*) FROM sometable;
Other frame specifications can be used to obtain other effects.
</para>
+ <para>
+ Row pattern recognition navigation functions are listed in
+ <xref linkend="functions-rpr-navigation-table"/>. These functions
+ can be used to describe DEFINE clause of Row pattern recognition.
+ </para>
+
+ <table id="functions-rpr-navigation-table">
+ <title>Row Pattern Navigation Functions</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ Function
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>prev</primary>
+ </indexterm>
+ <function>prev</function> ( <parameter>value</parameter> <type>anyelement</type> )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns the column value at the previous row;
+ returns NULL if there is no previous row in the window frame.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>next</primary>
+ </indexterm>
+ <function>next</function> ( <parameter>value</parameter> <type>anyelement</type> )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns the column value at the next row;
+ returns NULL if there is no next row in the window frame.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
<note>
<para>
The SQL standard defines a <literal>RESPECT NULLS</literal> or
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 0ee0cc7e64..8d3becd57a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -966,8 +966,8 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl
The <replaceable class="parameter">frame_clause</replaceable> can be one of
<synopsis>
-{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
-{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
+{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
+{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
</synopsis>
where <replaceable>frame_start</replaceable>
@@ -1074,6 +1074,40 @@ EXCLUDE NO OTHERS
a given peer group will be in the frame or excluded from it.
</para>
+ <para>
+ The
+ optional <replaceable class="parameter">row_pattern_common_syntax</replaceable>
+ defines the <firstterm>row pattern recognition condition</firstterm> for
+ this
+ window. <replaceable class="parameter">row_pattern_common_syntax</replaceable>
+ includes following subclauses. <literal>AFTER MATCH SKIP PAST LAST
+ ROW</literal> or <literal>AFTER MATCH SKIP TO NEXT ROW</literal> controls
+ how to proceed to next row position after a match
+ found. With <literal>AFTER MATCH SKIP PAST LAST ROW</literal> (the
+ default) next row position is next to the last row of previous match. On
+ the other hand, with <literal>AFTER MATCH SKIP TO NEXT ROW</literal> next
+ row position is always next to the last row of previous
+ match. <literal>DEFINE</literal> defines definition variables along with a
+ boolean expression. <literal>PATTERN</literal> defines a sequence of rows
+ that satisfies certain conditions using variables defined
+ in <literal>DEFINE</literal> clause. If the variable is not defined in
+ the <literal>DEFINE</literal> clause, it is implicitly assumed
+ following is defined in the <literal>DEFINE</literal> clause.
+
+<synopsis>
+<literal>variable_name</literal> AS TRUE
+</synopsis>
+
+ Note that the maximu number of variables defined
+ in <literal>DEFINE</literal> clause is 26.
+
+<synopsis>
+[ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
+PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
+DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <replaceable class="parameter">expression</replaceable> [, ...]
+</synopsis>
+ </para>
+
<para>
The purpose of a <literal>WINDOW</literal> clause is to specify the
behavior of <firstterm>window functions</firstterm> appearing in the query's
--
2.25.1
>From 3327d2b35be67523eeb0066a1e68bb9d62cba699 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 6/7] Row pattern recognition patch (tests).
---
src/test/regress/expected/rpr.out | 594 +++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/rpr.sql | 292 ++++++++++++++
3 files changed, 887 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/rpr.out
create mode 100644 src/test/regress/sql/rpr.sql
diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out
new file mode 100644
index 0000000000..63bed05f05
--- /dev/null
+++ b/src/test/regress/expected/rpr.out
@@ -0,0 +1,594 @@
+--
+-- Test for row pattern definition clause
+--
+CREATE TEMP TABLE stock (
+ company TEXT,
+ tdate DATE,
+ price INTEGER
+ );
+INSERT INTO stock VALUES ('company1', '2023-07-01', 100);
+INSERT INTO stock VALUES ('company1', '2023-07-02', 200);
+INSERT INTO stock VALUES ('company1', '2023-07-03', 150);
+INSERT INTO stock VALUES ('company1', '2023-07-04', 140);
+INSERT INTO stock VALUES ('company1', '2023-07-05', 150);
+INSERT INTO stock VALUES ('company1', '2023-07-06', 90);
+INSERT INTO stock VALUES ('company1', '2023-07-07', 110);
+INSERT INTO stock VALUES ('company1', '2023-07-08', 130);
+INSERT INTO stock VALUES ('company1', '2023-07-09', 120);
+INSERT INTO stock VALUES ('company1', '2023-07-10', 130);
+INSERT INTO stock VALUES ('company2', '2023-07-01', 50);
+INSERT INTO stock VALUES ('company2', '2023-07-02', 2000);
+INSERT INTO stock VALUES ('company2', '2023-07-03', 1500);
+INSERT INTO stock VALUES ('company2', '2023-07-04', 1400);
+INSERT INTO stock VALUES ('company2', '2023-07-05', 1500);
+INSERT INTO stock VALUES ('company2', '2023-07-06', 60);
+INSERT INTO stock VALUES ('company2', '2023-07-07', 1100);
+INSERT INTO stock VALUES ('company2', '2023-07-08', 1300);
+INSERT INTO stock VALUES ('company2', '2023-07-09', 1200);
+INSERT INTO stock VALUES ('company2', '2023-07-10', 1300);
+SELECT * FROM stock;
+ company | tdate | price
+----------+------------+-------
+ company1 | 07-01-2023 | 100
+ company1 | 07-02-2023 | 200
+ company1 | 07-03-2023 | 150
+ company1 | 07-04-2023 | 140
+ company1 | 07-05-2023 | 150
+ company1 | 07-06-2023 | 90
+ company1 | 07-07-2023 | 110
+ company1 | 07-08-2023 | 130
+ company1 | 07-09-2023 | 120
+ company1 | 07-10-2023 | 130
+ company2 | 07-01-2023 | 50
+ company2 | 07-02-2023 | 2000
+ company2 | 07-03-2023 | 1500
+ company2 | 07-04-2023 | 1400
+ company2 | 07-05-2023 | 1500
+ company2 | 07-06-2023 | 60
+ company2 | 07-07-2023 | 1100
+ company2 | 07-08-2023 | 1300
+ company2 | 07-09-2023 | 1200
+ company2 | 07-10-2023 | 1300
+(20 rows)
+
+-- basic test using PREV
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value | nth_second
+----------+------------+-------+-------------+------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023
+ company1 | 07-02-2023 | 200 | | |
+ company1 | 07-03-2023 | 150 | | |
+ company1 | 07-04-2023 | 140 | | |
+ company1 | 07-05-2023 | 150 | | |
+ company1 | 07-06-2023 | 90 | 90 | 120 | 07-07-2023
+ company1 | 07-07-2023 | 110 | | |
+ company1 | 07-08-2023 | 130 | | |
+ company1 | 07-09-2023 | 120 | | |
+ company1 | 07-10-2023 | 130 | | |
+ company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023
+ company2 | 07-02-2023 | 2000 | | |
+ company2 | 07-03-2023 | 1500 | | |
+ company2 | 07-04-2023 | 1400 | | |
+ company2 | 07-05-2023 | 1500 | | |
+ company2 | 07-06-2023 | 60 | 60 | 1200 | 07-07-2023
+ company2 | 07-07-2023 | 1100 | | |
+ company2 | 07-08-2023 | 1300 | | |
+ company2 | 07-09-2023 | 1200 | | |
+ company2 | 07-10-2023 | 1300 | | |
+(20 rows)
+
+-- last_value() should remain consistent
+SELECT company, tdate, price, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ company | tdate | price | last_value
+----------+------------+-------+------------
+ company1 | 07-01-2023 | 100 | 140
+ company1 | 07-02-2023 | 200 |
+ company1 | 07-03-2023 | 150 |
+ company1 | 07-04-2023 | 140 |
+ company1 | 07-05-2023 | 150 |
+ company1 | 07-06-2023 | 90 | 120
+ company1 | 07-07-2023 | 110 |
+ company1 | 07-08-2023 | 130 |
+ company1 | 07-09-2023 | 120 |
+ company1 | 07-10-2023 | 130 |
+ company2 | 07-01-2023 | 50 | 1400
+ company2 | 07-02-2023 | 2000 |
+ company2 | 07-03-2023 | 1500 |
+ company2 | 07-04-2023 | 1400 |
+ company2 | 07-05-2023 | 1500 |
+ company2 | 07-06-2023 | 60 | 1200
+ company2 | 07-07-2023 | 1100 |
+ company2 | 07-08-2023 | 1300 |
+ company2 | 07-09-2023 | 1200 |
+ company2 | 07-10-2023 | 1300 |
+(20 rows)
+
+-- omit "START" in DEFINE but it is ok because "START AS TRUE" is
+-- implicitly defined. per spec.
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value | nth_second
+----------+------------+-------+-------------+------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023
+ company1 | 07-02-2023 | 200 | | |
+ company1 | 07-03-2023 | 150 | | |
+ company1 | 07-04-2023 | 140 | | |
+ company1 | 07-05-2023 | 150 | | |
+ company1 | 07-06-2023 | 90 | 90 | 120 | 07-07-2023
+ company1 | 07-07-2023 | 110 | | |
+ company1 | 07-08-2023 | 130 | | |
+ company1 | 07-09-2023 | 120 | | |
+ company1 | 07-10-2023 | 130 | | |
+ company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023
+ company2 | 07-02-2023 | 2000 | | |
+ company2 | 07-03-2023 | 1500 | | |
+ company2 | 07-04-2023 | 1400 | | |
+ company2 | 07-05-2023 | 1500 | | |
+ company2 | 07-06-2023 | 60 | 60 | 1200 | 07-07-2023
+ company2 | 07-07-2023 | 1100 | | |
+ company2 | 07-08-2023 | 1300 | | |
+ company2 | 07-09-2023 | 1200 | | |
+ company2 | 07-10-2023 | 1300 | | |
+(20 rows)
+
+-- the first row start with less than or equal to 100
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 140
+ company1 | 07-02-2023 | 200 | |
+ company1 | 07-03-2023 | 150 | |
+ company1 | 07-04-2023 | 140 | |
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | 90 | 120
+ company1 | 07-07-2023 | 110 | |
+ company1 | 07-08-2023 | 130 | |
+ company1 | 07-09-2023 | 120 | |
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | 50 | 1400
+ company2 | 07-02-2023 | 2000 | |
+ company2 | 07-03-2023 | 1500 | |
+ company2 | 07-04-2023 | 1400 | |
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | 60 | 1200
+ company2 | 07-07-2023 | 1100 | |
+ company2 | 07-08-2023 | 1300 | |
+ company2 | 07-09-2023 | 1200 | |
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+-- second row raises 120%
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price) * 1.2,
+ DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 140
+ company1 | 07-02-2023 | 200 | |
+ company1 | 07-03-2023 | 150 | |
+ company1 | 07-04-2023 | 140 | |
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | |
+ company1 | 07-07-2023 | 110 | |
+ company1 | 07-08-2023 | 130 | |
+ company1 | 07-09-2023 | 120 | |
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | 50 | 1400
+ company2 | 07-02-2023 | 2000 | |
+ company2 | 07-03-2023 | 1500 | |
+ company2 | 07-04-2023 | 1400 | |
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | |
+ company2 | 07-07-2023 | 1100 | |
+ company2 | 07-08-2023 | 1300 | |
+ company2 | 07-09-2023 | 1200 | |
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+-- using NEXT
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UPDOWN)
+ DEFINE
+ START AS TRUE,
+ UPDOWN AS price > PREV(price) AND price > NEXT(price)
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 200
+ company1 | 07-02-2023 | 200 | |
+ company1 | 07-03-2023 | 150 | |
+ company1 | 07-04-2023 | 140 | 140 | 150
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | |
+ company1 | 07-07-2023 | 110 | 110 | 130
+ company1 | 07-08-2023 | 130 | |
+ company1 | 07-09-2023 | 120 | |
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | 50 | 2000
+ company2 | 07-02-2023 | 2000 | |
+ company2 | 07-03-2023 | 1500 | |
+ company2 | 07-04-2023 | 1400 | 1400 | 1500
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | |
+ company2 | 07-07-2023 | 1100 | 1100 | 1300
+ company2 | 07-08-2023 | 1300 | |
+ company2 | 07-09-2023 | 1200 | |
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+-- using AFTER MATCH SKIP TO NEXT ROW
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UPDOWN)
+ DEFINE
+ START AS TRUE,
+ UPDOWN AS price > PREV(price) AND price > NEXT(price)
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 200
+ company1 | 07-02-2023 | 200 | |
+ company1 | 07-03-2023 | 150 | |
+ company1 | 07-04-2023 | 140 | 140 | 150
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | |
+ company1 | 07-07-2023 | 110 | 110 | 130
+ company1 | 07-08-2023 | 130 | |
+ company1 | 07-09-2023 | 120 | |
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | 50 | 2000
+ company2 | 07-02-2023 | 2000 | |
+ company2 | 07-03-2023 | 1500 | |
+ company2 | 07-04-2023 | 1400 | 1400 | 1500
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | |
+ company2 | 07-07-2023 | 1100 | 1100 | 1300
+ company2 | 07-08-2023 | 1300 | |
+ company2 | 07-09-2023 | 1200 | |
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+-- match everything
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (A+)
+ DEFINE
+ A AS TRUE
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 130
+ company1 | 07-02-2023 | 200 | |
+ company1 | 07-03-2023 | 150 | |
+ company1 | 07-04-2023 | 140 | |
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | |
+ company1 | 07-07-2023 | 110 | |
+ company1 | 07-08-2023 | 130 | |
+ company1 | 07-09-2023 | 120 | |
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | 50 | 1300
+ company2 | 07-02-2023 | 2000 | |
+ company2 | 07-03-2023 | 1500 | |
+ company2 | 07-04-2023 | 1400 | |
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | |
+ company2 | 07-07-2023 | 1100 | |
+ company2 | 07-08-2023 | 1300 | |
+ company2 | 07-09-2023 | 1200 | |
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+-- backtracking with reclassification of rows
+-- using AFTER MATCH SKIP PAST LAST ROW
+SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+ A AS price > 100,
+ B AS price > 100
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | |
+ company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023
+ company1 | 07-03-2023 | 150 | |
+ company1 | 07-04-2023 | 140 | |
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | |
+ company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023
+ company1 | 07-08-2023 | 130 | |
+ company1 | 07-09-2023 | 120 | |
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | |
+ company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023
+ company2 | 07-03-2023 | 1500 | |
+ company2 | 07-04-2023 | 1400 | |
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | |
+ company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023
+ company2 | 07-08-2023 | 1300 | |
+ company2 | 07-09-2023 | 1200 | |
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+-- backtracking with reclassification of rows
+-- using AFTER MATCH SKIP TO NEXT ROW
+SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+ A AS price > 100,
+ B AS price > 100
+);
+ company | tdate | price | first_value | last_value
+----------+------------+-------+-------------+------------
+ company1 | 07-01-2023 | 100 | |
+ company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023
+ company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023
+ company1 | 07-04-2023 | 140 | 07-04-2023 | 07-05-2023
+ company1 | 07-05-2023 | 150 | |
+ company1 | 07-06-2023 | 90 | |
+ company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023
+ company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023
+ company1 | 07-09-2023 | 120 | 07-09-2023 | 07-10-2023
+ company1 | 07-10-2023 | 130 | |
+ company2 | 07-01-2023 | 50 | |
+ company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023
+ company2 | 07-03-2023 | 1500 | 07-03-2023 | 07-05-2023
+ company2 | 07-04-2023 | 1400 | 07-04-2023 | 07-05-2023
+ company2 | 07-05-2023 | 1500 | |
+ company2 | 07-06-2023 | 60 | |
+ company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023
+ company2 | 07-08-2023 | 1300 | 07-08-2023 | 07-10-2023
+ company2 | 07-09-2023 | 1200 | 07-09-2023 | 07-10-2023
+ company2 | 07-10-2023 | 1300 | |
+(20 rows)
+
+--
+-- Aggregates
+--
+-- using AFTER MATCH SKIP PAST LAST ROW
+SELECT company, tdate, price,
+ first_value(price) OVER w,
+ last_value(price) OVER w,
+ max(price) OVER w,
+ min(price) OVER w,
+ sum(price) OVER w,
+ avg(price) OVER w,
+ count(price) OVER w
+FROM stock
+WINDOW w AS (
+PARTITION BY company
+ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+AFTER MATCH SKIP PAST LAST ROW
+INITIAL
+PATTERN (START UP+ DOWN+)
+DEFINE
+START AS TRUE,
+UP AS price > PREV(price),
+DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value | max | min | sum | avg | count
+----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
+ company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 | 147.5000000000000000 | 4
+ company1 | 07-02-2023 | 200 | | | | | | |
+ company1 | 07-03-2023 | 150 | | | | | | |
+ company1 | 07-04-2023 | 140 | | | | | | |
+ company1 | 07-05-2023 | 150 | | | | | | |
+ company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 | 112.5000000000000000 | 4
+ company1 | 07-07-2023 | 110 | | | | | | |
+ company1 | 07-08-2023 | 130 | | | | | | |
+ company1 | 07-09-2023 | 120 | | | | | | |
+ company1 | 07-10-2023 | 130 | | | | | | |
+ company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 | 1237.5000000000000000 | 4
+ company2 | 07-02-2023 | 2000 | | | | | | |
+ company2 | 07-03-2023 | 1500 | | | | | | |
+ company2 | 07-04-2023 | 1400 | | | | | | |
+ company2 | 07-05-2023 | 1500 | | | | | | |
+ company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 | 915.0000000000000000 | 4
+ company2 | 07-07-2023 | 1100 | | | | | | |
+ company2 | 07-08-2023 | 1300 | | | | | | |
+ company2 | 07-09-2023 | 1200 | | | | | | |
+ company2 | 07-10-2023 | 1300 | | | | | | |
+(20 rows)
+
+-- using AFTER MATCH SKIP TO NEXT ROW
+SELECT company, tdate, price,
+ first_value(price) OVER w,
+ last_value(price) OVER w,
+ max(price) OVER w,
+ min(price) OVER w,
+ sum(price) OVER w,
+ avg(price) OVER w,
+ count(price) OVER w
+FROM stock
+WINDOW w AS (
+PARTITION BY company
+ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+AFTER MATCH SKIP TO NEXT ROW
+INITIAL
+PATTERN (START UP+ DOWN+)
+DEFINE
+START AS TRUE,
+UP AS price > PREV(price),
+DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value | max | min | sum | avg | count
+----------+------------+-------+-------------+------------+------+------+------+-----------------------+-------
+ company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 | 147.5000000000000000 | 4
+ company1 | 07-02-2023 | 200 | | | | | | | 0
+ company1 | 07-03-2023 | 150 | | | | | | | 0
+ company1 | 07-04-2023 | 140 | 140 | 90 | 150 | 90 | 380 | 126.6666666666666667 | 3
+ company1 | 07-05-2023 | 150 | | | | | | | 0
+ company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 | 112.5000000000000000 | 4
+ company1 | 07-07-2023 | 110 | 110 | 120 | 130 | 110 | 360 | 120.0000000000000000 | 3
+ company1 | 07-08-2023 | 130 | | | | | | | 0
+ company1 | 07-09-2023 | 120 | | | | | | | 0
+ company1 | 07-10-2023 | 130 | | | | | | | 0
+ company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 | 1237.5000000000000000 | 4
+ company2 | 07-02-2023 | 2000 | | | | | | | 0
+ company2 | 07-03-2023 | 1500 | | | | | | | 0
+ company2 | 07-04-2023 | 1400 | 1400 | 60 | 1500 | 60 | 2960 | 986.6666666666666667 | 3
+ company2 | 07-05-2023 | 1500 | | | | | | | 0
+ company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 | 915.0000000000000000 | 4
+ company2 | 07-07-2023 | 1100 | 1100 | 1200 | 1300 | 1100 | 3600 | 1200.0000000000000000 | 3
+ company2 | 07-08-2023 | 1300 | | | | | | | 0
+ company2 | 07-09-2023 | 1200 | | | | | | | 0
+ company2 | 07-10-2023 | 1300 | | | | | | | 0
+(20 rows)
+
+--
+-- Error cases
+--
+-- row pattern definition variable name must not appear more than once
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ ORDER BY tdate
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price),
+ UP AS price > PREV(price)
+);
+ERROR: syntax error at or near "ORDER"
+LINE 6: ORDER BY tdate
+ ^
+-- pattern variable name must appear in DEFINE
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+ END)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ERROR: syntax error at or near "END"
+LINE 8: PATTERN (START UP+ DOWN+ END)
+ ^
+-- FRAME must start at current row when row patttern recognition is used
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ERROR: FRAME must start at current row when row patttern recognition is used
+-- SEEK is not supported
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ SEEK
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ERROR: SEEK is not supported
+LINE 8: SEEK
+ ^
+HINT: Use INITIAL.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4df9d8503b..896531002b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -98,7 +98,7 @@ test: publication subscription
# Another group of parallel tests
# select_views depends on create_view
# ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock indirect_toast equivclass
+test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock indirect_toast equivclass rpr
# ----------
# Another group of parallel tests (JSON related)
diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
new file mode 100644
index 0000000000..bdefd20647
--- /dev/null
+++ b/src/test/regress/sql/rpr.sql
@@ -0,0 +1,292 @@
+--
+-- Test for row pattern definition clause
+--
+
+CREATE TEMP TABLE stock (
+ company TEXT,
+ tdate DATE,
+ price INTEGER
+ );
+INSERT INTO stock VALUES ('company1', '2023-07-01', 100);
+INSERT INTO stock VALUES ('company1', '2023-07-02', 200);
+INSERT INTO stock VALUES ('company1', '2023-07-03', 150);
+INSERT INTO stock VALUES ('company1', '2023-07-04', 140);
+INSERT INTO stock VALUES ('company1', '2023-07-05', 150);
+INSERT INTO stock VALUES ('company1', '2023-07-06', 90);
+INSERT INTO stock VALUES ('company1', '2023-07-07', 110);
+INSERT INTO stock VALUES ('company1', '2023-07-08', 130);
+INSERT INTO stock VALUES ('company1', '2023-07-09', 120);
+INSERT INTO stock VALUES ('company1', '2023-07-10', 130);
+INSERT INTO stock VALUES ('company2', '2023-07-01', 50);
+INSERT INTO stock VALUES ('company2', '2023-07-02', 2000);
+INSERT INTO stock VALUES ('company2', '2023-07-03', 1500);
+INSERT INTO stock VALUES ('company2', '2023-07-04', 1400);
+INSERT INTO stock VALUES ('company2', '2023-07-05', 1500);
+INSERT INTO stock VALUES ('company2', '2023-07-06', 60);
+INSERT INTO stock VALUES ('company2', '2023-07-07', 1100);
+INSERT INTO stock VALUES ('company2', '2023-07-08', 1300);
+INSERT INTO stock VALUES ('company2', '2023-07-09', 1200);
+INSERT INTO stock VALUES ('company2', '2023-07-10', 1300);
+
+SELECT * FROM stock;
+
+-- basic test using PREV
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- last_value() should remain consistent
+SELECT company, tdate, price, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- omit "START" in DEFINE but it is ok because "START AS TRUE" is
+-- implicitly defined. per spec.
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- the first row start with less than or equal to 100
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- second row raises 120%
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price) * 1.2,
+ DOWN AS price < PREV(price)
+);
+
+-- using NEXT
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UPDOWN)
+ DEFINE
+ START AS TRUE,
+ UPDOWN AS price > PREV(price) AND price > NEXT(price)
+);
+
+-- using AFTER MATCH SKIP TO NEXT ROW
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UPDOWN)
+ DEFINE
+ START AS TRUE,
+ UPDOWN AS price > PREV(price) AND price > NEXT(price)
+);
+
+-- match everything
+
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (A+)
+ DEFINE
+ A AS TRUE
+);
+
+-- backtracking with reclassification of rows
+-- using AFTER MATCH SKIP PAST LAST ROW
+SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+ A AS price > 100,
+ B AS price > 100
+);
+
+-- backtracking with reclassification of rows
+-- using AFTER MATCH SKIP TO NEXT ROW
+SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+ A AS price > 100,
+ B AS price > 100
+);
+
+--
+-- Aggregates
+--
+
+-- using AFTER MATCH SKIP PAST LAST ROW
+SELECT company, tdate, price,
+ first_value(price) OVER w,
+ last_value(price) OVER w,
+ max(price) OVER w,
+ min(price) OVER w,
+ sum(price) OVER w,
+ avg(price) OVER w,
+ count(price) OVER w
+FROM stock
+WINDOW w AS (
+PARTITION BY company
+ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+AFTER MATCH SKIP PAST LAST ROW
+INITIAL
+PATTERN (START UP+ DOWN+)
+DEFINE
+START AS TRUE,
+UP AS price > PREV(price),
+DOWN AS price < PREV(price)
+);
+
+-- using AFTER MATCH SKIP TO NEXT ROW
+SELECT company, tdate, price,
+ first_value(price) OVER w,
+ last_value(price) OVER w,
+ max(price) OVER w,
+ min(price) OVER w,
+ sum(price) OVER w,
+ avg(price) OVER w,
+ count(price) OVER w
+FROM stock
+WINDOW w AS (
+PARTITION BY company
+ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+AFTER MATCH SKIP TO NEXT ROW
+INITIAL
+PATTERN (START UP+ DOWN+)
+DEFINE
+START AS TRUE,
+UP AS price > PREV(price),
+DOWN AS price < PREV(price)
+);
+
+--
+-- Error cases
+--
+
+-- row pattern definition variable name must not appear more than once
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ ORDER BY tdate
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price),
+ UP AS price > PREV(price)
+);
+
+-- pattern variable name must appear in DEFINE
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+ END)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- FRAME must start at current row when row patttern recognition is used
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- SEEK is not supported
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ SEEK
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
--
2.25.1
>From b048a98309a3c482118e676e6e950ca405bc14ad Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <[email protected]>
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 7/7] Allow to print raw parse tree.
---
src/backend/tcop/postgres.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 21b9763183..3e3653816e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -651,6 +651,10 @@ pg_parse_query(const char *query_string)
}
#endif
+ if (Debug_print_parse)
+ elog_node_display(LOG, "raw parse tree", raw_parsetree_list,
+ Debug_pretty_print);
+
TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
return raw_parsetree_list;
--
2.25.1