[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user yestinchen commented on the issue: https://github.com/apache/flink/pull/4331 @dawidwys Thanks for the reviews and comments, please change the documentation during merge. And @kl0u , thanks for the reviews! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user kl0u commented on the issue: https://github.com/apache/flink/pull/4331 I think you can merge it @dawidwys . I was following the evolution of this PR and I think it looks good ;) . Thanks for the work both @yestinchen and @dawidwys ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user yestinchen commented on the issue: https://github.com/apache/flink/pull/4331 Hi @dawidwys , sorry for the late response. Thanks for your reviews, I have updated the test and the document. Please take a look if you have time. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user dawidwys commented on the issue: https://github.com/apache/flink/pull/4331 Still missing some tests: - ensuring the `NFA#extractCurrentMatches` returns patterns in order. - skip to first/last with `oneOrMore` Docs missing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user yestinchen commented on the issue: https://github.com/apache/flink/pull/4331 Thanks for your reviews @dawidwys ! I'll update the doc in the following commits. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user yestinchen commented on the issue: https://github.com/apache/flink/pull/4331 @dawidwys @dianfu I've updated the approach according to the document. Feel free to comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user yestinchen commented on the issue: https://github.com/apache/flink/pull/4331 @dianfu Thanks for your reviewing. I found @dawidwys wrote a draft about the JIRA's implementation. I'll go through that first and address those issues in this PR latter. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user litrain commented on the issue: https://github.com/apache/flink/pull/4331 @dawidwys @yestinchen Thanks for your discussion, I am also working on the empty match issue now. Please have a look at https://issues.apache.org/jira/browse/FLINK-7292. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user dawidwys commented on the issue: https://github.com/apache/flink/pull/4331 Hi @yestinchen Thanks for your feedback. Right now, I just want to address the empty match issue. It is not easy to apply that definition to unbounded data. In the SQL specification at the end of partition we can deterministically decide if a partial match is empty or not. It is not the case in unbounded data, as future arriving events may make the partial matches empty. Those are the nuances I think should be addressed first, but I agree there are many similarities. I will try to address rest later during the day, but I think we should continue in the JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user dianfu commented on the issue: https://github.com/apache/flink/pull/4331 Sorry for late response. I think this feature is very useful and agree that we should have a clear thought on what things should be for each skip strategy. I noticed that there are already some discussions in FLINK-3703 which we can refer. I will take a look at this PR and also FLINK-3703 these two days and will post my thought. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user yestinchen commented on the issue: https://github.com/apache/flink/pull/4331 @dawidwys Thanks for the reviewing. Problem 1 is easy to fix, we can just start a new match process if the only left computation state reaches stopState. Problem 2 can not be avoided with current approach. It's impossible to know whether there are potential matches. I think the best wary to implement this correctly is try to start a new match process after processing each event, and discard unfinished match process after a successful match according to the skip strategy. In order to do that, we need to keep the logical order of the events, which is the original idea I proposed. As for your general notes, I have some ideas: 1. I agree that the Oracle's specification is designed for bounded data. But match recoginize in unbounded data is very similar to bounded data, since all data are being processed one by one, and there's no need for bound information. As for **_empty match_** , I think we can just use Oracle's definition. > Some patterns permit empty matches. For example: PATTERN (A*) can be matched by zero or more rows that are mapped to A. An empty match does not map any rows to primary row pattern variables; nevertheless, an empty match has a starting row. For example, there can be an empty match at the first row of a row pattern partition, an empty match at the second row of a row pattern partition, etc. An empty match is assigned a sequential match number, based on the ordinal position of its starting row, the same as any other match. 2. I feel uncomfortable with the RuntimeExceptions too. But these exceptions are very important to keep the skip semantics right. I understand your main concern is that Exceptions will stop the matching process, which is unacceptable to online streaming service. To address this, I think we can introduce a default strategy(SKIP_TO_NEXT_EVENT, for example). If these exceptions happens, we can use default strategy to continue the match process, and change the strategy back after a successful match. We can also add a switch to let user decide whether to enable this feature. 3. I still think it's useful to support these skip strategies. Don't know why Esper does not support them. 4. Thanks for the related information. I took a brief look at the PR, which is very similar to this PR. I wonder why it is closed without merging into the master code? Looking forward to your feedbacks. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Github user dawidwys commented on the issue: https://github.com/apache/flink/pull/4331 Hi @yestinchen , Thanks for the update. After second round of review. I found many problems with current approach. It returns only the first match in a stream in most cases. 1. Let's analyze a pattern `A B C` with `SKIP_TO_FIRST C` and a sequence `a1 b1 c1 a2 b2 c2`. It will return only `a1 b1 c1` and will left the NFA without any valid `ComputationalStates` which results in stopping processing. 2. Another problem is we do not handle a matches that can potentially finish before previously started. E.g. for Pattern ``` Patternpattern = Pattern.begin("ab").where(new SimpleCondition() { @Override public boolean filter(Event value) throws Exception { return value.getName().equals("a") || value.getName().equals("b"); } }).followedBy("c").where(new IterativeCondition() { @Override public boolean filter(Event value, Context ctx) throws Exception { return value.getName().equals("c") && ctx.getEventsForPattern("ab").iterator().next().getPrice() == value.getPrice(); } }).setAfterMatchSkipStrategy(new AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_PAST_LAST_EVENT)); ``` and a sequence `a(price = 1) b(price = 2) c(price = 2)`. I think a desired behaviour would be to start new matching after `c` event, but it won't as the matching started at `a` and will not start at `b`. --- Some general notes: 1. I think the SQL's specification does not suits well into CEP's library as we do not operate on a partition/bounded collection of events. The specification on the other hand assumes such bounded data. I think we would benefit from some additional documentation how the AFTER_MATCH clause works in case of unbounded data. E.g. what does **_empty match_** mean: > Note that the AFTER MATCH SKIP syntax only determines the point to resume scanning for a match after a non-empty match. When an empty match is found, one row is skipped (as if SKIP TO NEXT ROW had been speci ed). Thus an empty match never causes one of these exceptions. etc. 2. I really don't like the idea of so many cases when `RuntimeException` can be thrown. I feel the reason for using CEP is a constantly running jobs that search for patterns in a stream rather than ad-hoc queries. E.g in case of a Pattern like `A B? C` with `SKIP_TO_LAST B` a sequence like `a c` results in an exception and the job being killed. In my opinion it does not suits well into constantly running job. From operational side running such Patterns would be at least interesting ;), as they depend so much on the arriving data. 3. I don't know the reasoning, but Esper, that was mentioned as the other(besides Oracle) library that supports `MATCH_RECOGNIZE` clause does not support `AFTER MATCH` at all. 4. I found out there was already an ongoing work to introduce part of the `AFTER MATCH` (the `SKIP_PAST_LAST`). The corresponding jira: https://issues.apache.org/jira/browse/FLINK-3703 and closed PR: #2367 . To sum up thanks @yestinchen for the work. Unfortunately I think the clause needs a little bit more conceptual discussion before we can introduce this change. I think the `SKIP_PAST_LAST` behaviour would be very helpful (in fact there were alread requests for it in the mailing list) and the most straight forward to implement. I would love to here your opinions @yestinchen as well as @kl0u and @dianfu. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---