On 7/26/23 14:21, Tatsuo Ishii wrote:
Attached is the v3 patch. In this patch following changes are made.

Excellent.  Thanks!

A few quick comments:

- PERMUTE is still misspelled as PREMUTE

- PATTERN variables do not have to exist in the DEFINE clause. They are considered TRUE if not present.

(1) I completely changed the pattern matching engine so that it
performs backtracking. Now the engine evaluates all pattern elements
defined in PATTER against each row, saving matched pattern variables
in a string per row. For example if the pattern element A and B
evaluated to true, a string "AB" is created for current row.

This continues until all pattern matching fails or encounters the end
of full window frame/partition. After that, the pattern matching
engine creates all possible "pattern strings" and apply the regular
expression matching to each. For example if we have row 0 = "AB" row 1
= "C", possible pattern strings are: "AC" and "BC".

If it matches, the length of matching substring is saved. After all
possible trials are done, the longest matching substring is chosen and
it becomes the width (number of rows) in the reduced window frame.

See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
in nodeWindowAggs.c for more details. For now I use a naive depth
first search and probably there is a lot of rooms for optimization
(for example rewriting it without using
recursion). Suggestions/patches are welcome.

My own reviews will only focus on correctness for now. Once we get a good set of regression tests all passing, I will focus more on optimization. Of course, others might want to review the performance now.

Vik Fearing wrote:
I strongly disagree with this.  Window function do not need to know
how the frame is defined, and indeed they should not.
WinGetFuncArgInFrame should answer yes or no and the window function
just works on that. Otherwise we will get extension (and possibly even
core) functions that don't handle the frame properly.

So I moved row_is_in_reduce_frame into WinGetFuncArgInFrame so that
those window functions are not needed to be changed.

(3) Window function rpr was removed. We can use first_value instead.

Excellent.

(4) Remaining tasks/issues.

- For now I disable WinSetMarkPosition because RPR often needs to
   access a row before the mark is set. We need to fix this in the
   future.

Noted, and agreed.

- I am working on making window aggregates RPR aware now. The
   implementation is in progress and far from completeness. An example
   is below. I think row 2, 3, 4 of "count" column should be NULL
   instead of 3, 2, 0, 0. Same thing can be said to other
   rows. Probably this is an effect of moving aggregate but I still
   studying the window aggregation code.

This tells me again that RPR is not being run in the right place. All windowed aggregates and frame-level window functions should Just Work with no modification.

SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock
  WINDOW w AS (
  PARTITION BY company
  ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  INITIAL
  PATTERN (START UP+ DOWN+)
  DEFINE
   START AS TRUE,
   UP AS price > PREV(price),
   DOWN AS price < PREV(price)
);
  company  |   tdate    | first_value | count
----------+------------+-------------+-------
  company1 | 2023-07-01 |         100 |     4
  company1 | 2023-07-02 |             |     3
  company1 | 2023-07-03 |             |     2
  company1 | 2023-07-04 |             |     0
  company1 | 2023-07-05 |             |     0
  company1 | 2023-07-06 |          90 |     4
  company1 | 2023-07-07 |             |     3
  company1 | 2023-07-08 |             |     2
  company1 | 2023-07-09 |             |     0
  company1 | 2023-07-10 |             |     0
  company2 | 2023-07-01 |          50 |     4
  company2 | 2023-07-02 |             |     3
  company2 | 2023-07-03 |             |     2
  company2 | 2023-07-04 |             |     0
  company2 | 2023-07-05 |             |     0
  company2 | 2023-07-06 |          60 |     4
  company2 | 2023-07-07 |             |     3
  company2 | 2023-07-08 |             |     2
  company2 | 2023-07-09 |             |     0
  company2 | 2023-07-10 |             |     0

In this scenario, row 1's frame is the first 5 rows and specified SKIP PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are skipped) and the result of the outer count should be 0 for all of them because there are no rows in the frame.

When we get to adding count in the MEASURES clause, there will be a difference between no match and empty match, but that does not apply here.

I am going to add this thread to CommitFest and plan to add both of
you as reviewers. Thanks in advance.

My pleasure.  Thank you for working on this difficult feature.
--
Vik Fearing



Reply via email to