[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-08-30 Thread yestinchen
Github user yestinchen commented on the issue:

https://github.com/apache/flink/pull/4331
  
@dawidwys Thanks for the reviews and comments, please change the 
documentation during merge.
And @kl0u , thanks for the reviews!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-08-30 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4331
  
I think you can merge it @dawidwys . 
I was following the evolution of this PR and I think it looks good ;) . 
Thanks for the work both @yestinchen and @dawidwys !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-08-29 Thread yestinchen
Github user yestinchen commented on the issue:

https://github.com/apache/flink/pull/4331
  
Hi @dawidwys , sorry for the late response. 
Thanks for your reviews, I have updated the test and the document. Please 
take a look if you have time. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-08-21 Thread dawidwys
Github user dawidwys commented on the issue:

https://github.com/apache/flink/pull/4331
  
Still missing some tests:

- ensuring the `NFA#extractCurrentMatches` returns patterns in order.
- skip to first/last with `oneOrMore`

Docs missing


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-08-12 Thread yestinchen
Github user yestinchen commented on the issue:

https://github.com/apache/flink/pull/4331
  
Thanks for your reviews @dawidwys ! I'll update the doc in the following 
commits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-08-04 Thread yestinchen
Github user yestinchen commented on the issue:

https://github.com/apache/flink/pull/4331
  
@dawidwys @dianfu I've updated the approach according to the document. Feel 
free to comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-07-31 Thread yestinchen
Github user yestinchen commented on the issue:

https://github.com/apache/flink/pull/4331
  
@dianfu Thanks for your reviewing. 
I found @dawidwys wrote a draft about the JIRA's implementation. I'll go 
through that first and address those issues in this PR latter. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-07-28 Thread litrain
Github user litrain commented on the issue:

https://github.com/apache/flink/pull/4331
  
@dawidwys  @yestinchen Thanks for your discussion, I am also working on the 
empty match issue now.  Please have a look at 
https://issues.apache.org/jira/browse/FLINK-7292. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-07-28 Thread dawidwys
Github user dawidwys commented on the issue:

https://github.com/apache/flink/pull/4331
  
Hi @yestinchen 

Thanks for your feedback. 

Right now, I just want to address the empty match issue. It is not easy to 
apply that definition to unbounded data. In the SQL specification at the end of 
partition we can deterministically decide if a partial match is empty or not. 
It is not the case in unbounded data, as future arriving events may make the 
partial matches empty. 

Those are the nuances I think should be addressed first, but I agree there 
are many similarities.

I will try to address rest later during the day, but I think we should 
continue in the JIRA.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-07-27 Thread dianfu
Github user dianfu commented on the issue:

https://github.com/apache/flink/pull/4331
  
Sorry for late response. I think this feature is very useful and agree that 
we should have a clear thought on what things should be for each skip strategy. 
I noticed that there are already some discussions in FLINK-3703 which we can 
refer. I will take a look at this PR and also FLINK-3703 these two days and 
will post my thought.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-07-27 Thread yestinchen
Github user yestinchen commented on the issue:

https://github.com/apache/flink/pull/4331
  
@dawidwys Thanks for the reviewing. 

Problem 1 is easy to fix, we can just start a new match process if the only 
left computation state reaches stopState.
Problem 2 can not be avoided with current approach.  It's impossible to 
know whether there are potential matches. 

I think the best wary to implement this correctly is try to start a new 
match process after processing each event, and discard unfinished match process 
after a successful match according to the skip strategy. In order to do that, 
we need to keep the logical order of the events, which is the original idea I 
proposed. 

As for your general notes, I have some ideas:

1. I agree that the Oracle's specification is designed for bounded data. 
But match recoginize in unbounded data is very similar to bounded data, since 
all data are being processed one by one, and there's no need for bound 
information. As for **_empty match_** , I think we can just use Oracle's 
definition.
> Some patterns permit empty matches. For example:
PATTERN (A*)
can be matched by zero or more rows that are mapped to A.
An empty match does not map any rows to primary row pattern variables; 
nevertheless, an empty match has a starting row. For example, there can be an 
empty match at the first row of a row pattern partition, an empty match at the 
second row of a row pattern partition, etc. An empty match is assigned a 
sequential match number, based on the ordinal position of its starting row, the 
same as any other match.

2. I feel uncomfortable with the RuntimeExceptions too. But these 
exceptions are very important to keep the skip semantics right. I understand 
your main concern is that Exceptions will stop the matching process, which is 
unacceptable to online streaming service. To address this, I think we can 
introduce a default strategy(SKIP_TO_NEXT_EVENT, for example). If these 
exceptions happens, we can use default strategy to continue the match process, 
and change the strategy back after a successful match. We can also add a switch 
to let user decide whether to enable this feature.

3. I still think it's useful to support these skip strategies. Don't know 
why Esper does not support them.

4. Thanks for the related information. I took a brief look at the PR, which 
is very similar to this PR. I  wonder why it is closed without merging into the 
master code?

Looking forward to your feedbacks. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

2017-07-27 Thread dawidwys
Github user dawidwys commented on the issue:

https://github.com/apache/flink/pull/4331
  
Hi @yestinchen ,
Thanks for the update.

After second round of review. I found many problems with current approach. 
It returns only the first match in a stream in most cases.

1. Let's analyze a pattern `A B C` with `SKIP_TO_FIRST C` and a sequence 
`a1 b1 c1 a2 b2 c2`. It will return only `a1 b1 c1` and will left the NFA 
without any valid `ComputationalStates` which results in stopping processing.

2. Another problem is we do not handle a matches that can potentially 
finish before previously started. E.g. for Pattern 

```
Pattern pattern = Pattern.begin("ab").where(new 
SimpleCondition() {
@Override
public boolean filter(Event value) throws Exception {
return value.getName().equals("a") || 
value.getName().equals("b");
}
}).followedBy("c").where(new IterativeCondition() {
@Override
public boolean filter(Event value, Context ctx) throws Exception 
{
return value.getName().equals("c") && 
ctx.getEventsForPattern("ab").iterator().next().getPrice() == value.getPrice();
}
}).setAfterMatchSkipStrategy(new 
AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_PAST_LAST_EVENT));
```

and a sequence `a(price = 1) b(price = 2) c(price = 2)`. I think a desired 
behaviour would be to start new matching after `c` event, but it won't as the 
matching started at `a` and will not start at `b`.

---

Some general notes:

1. I think the SQL's specification does not suits well into CEP's library 
as we do not operate on a partition/bounded collection of events. The 
specification on the other hand assumes such bounded data. I think we would 
benefit from some additional documentation how the AFTER_MATCH clause works in 
case of unbounded data. E.g. what does **_empty match_** mean:

> Note that the AFTER MATCH SKIP syntax only determines the point to resume 
scanning for a match after a non-empty match. When an empty match is found, one 
row is skipped (as if SKIP TO NEXT ROW had been speci ed). Thus an empty match 
never causes one of these exceptions.

etc.

2. I really don't like the idea of so many cases when `RuntimeException` 
can be thrown. I feel the reason for using CEP is a constantly running jobs 
that search for patterns in a stream rather than ad-hoc queries. 
E.g in case of a Pattern like `A B? C` with `SKIP_TO_LAST B` a sequence 
like `a c` results in an exception and the job being killed. In my opinion it 
does not suits well into constantly running job. From operational side running 
such Patterns would be at least interesting ;), as they depend so much on the 
arriving data.

3. I don't know the reasoning, but Esper, that was mentioned as the 
other(besides Oracle) library that supports `MATCH_RECOGNIZE` clause does not 
support `AFTER MATCH` at all.

4. I found out there was already an ongoing work to introduce part of the 
`AFTER MATCH` (the `SKIP_PAST_LAST`). The corresponding jira: 
https://issues.apache.org/jira/browse/FLINK-3703 and closed PR: #2367 .

To sum up thanks @yestinchen for the work. Unfortunately I think the clause 
needs a little bit more conceptual discussion before we can introduce this 
change. I think the `SKIP_PAST_LAST` behaviour would be very helpful (in fact 
there were alread requests for it in the mailing list) and the most straight 
forward to implement. I would love to here your opinions @yestinchen as well as 
@kl0u and @dianfu.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---