[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106909#comment-16106909 ]

Dian Fu edited comment on FLINK-7293 at 7/31/17 7:35 AM:

I agree that we can sort by both the time and the custom order by in the upstream operator and then forward the results to CEP. However, I still think that having custom sort logic in CEP itself is very important. It would give users of the CEP API more flexibility to control the order of the matching events. For CEP, the order of the events is very important. Without this feature, the matched results will be non-deterministic for many use cases.

was (Author: dian.fu):
Agree that we can sort both by the time and the custom order by in the upstream and then forward the results to CEP. While I still think that having a custom sort logic in CEP is very important. This would give users who use CEP API more flexibility to control the order of the matching events. As for CEP, the order of the events is very important, without this feature, the matched results will be non-deterministic for many use case.

> Support custom order by in PatternStream
>
> Key: FLINK-7293
> URL: https://issues.apache.org/jira/browse/FLINK-7293
> Project: Flink
> Issue Type: Sub-task
> Components: CEP
> Reporter: Dian Fu
> Assignee: Dian Fu
>
> Currently, when {{ProcessingTime}} is configured, events are fed to the NFA
> in order of arrival time, and when {{EventTime}} is configured, they are fed
> to the NFA in order of event time. PatternStream should also support a
> custom {{order by}} so that users can define the order of events in addition
> to the above factors.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
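To illustrate the non-determinism concern raised in this comment, here is a minimal, self-contained sketch. It is not Flink code: `TickerEvent`, `OrderDeterminism` and both comparators are hypothetical names made up for this example, and the only assumption is that events with equal timestamps may arrive in any order. Sorting by time alone leaves such ties in arrival order, whereas adding a custom tie-breaker yields the same sequence regardless of arrival order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical event type used only for this illustration. */
final class TickerEvent {
    final long tstamp;
    final int price;

    TickerEvent(long tstamp, int price) {
        this.tstamp = tstamp;
        this.price = price;
    }

    @Override
    public String toString() {
        return "(" + tstamp + "," + price + ")";
    }
}

final class OrderDeterminism {
    /** Orders by event time only; events with equal timestamps keep their arrival order. */
    static final Comparator<TickerEvent> TIME_ONLY =
            Comparator.comparingLong((TickerEvent e) -> e.tstamp);

    /** Orders by event time first, then by price as the custom tie-breaker. */
    static final Comparator<TickerEvent> TIME_THEN_PRICE =
            TIME_ONLY.thenComparingInt(e -> e.price);

    /** Returns the string form of the events after a stable sort with the given comparator. */
    static String sorted(List<TickerEvent> arrivals, Comparator<TickerEvent> order) {
        List<TickerEvent> copy = new ArrayList<>(arrivals);
        copy.sort(order); // List.sort is a stable sort
        return copy.toString();
    }
}
```

With two events that share a timestamp, `TIME_ONLY` produces a different sequence for each arrival order, while `TIME_THEN_PRICE` produces the same sequence for both.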
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106882#comment-16106882 ]

Dawid Wysakowicz edited comment on FLINK-7293 at 7/31/17 7:04 AM:

I agree there is a sort by time, but if the operator before CEP sorted by both the time and the custom order, the sorting in CEP would not have any impact. You could even reuse the code from DataStreamSort.

was (Author: dawidwys):
I agree there is sort by time, but if the previous operator before CEP would sort both by the time and custom order, the sorting in CEP would not have any impact.
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106882#comment-16106882 ]

Dawid Wysakowicz edited comment on FLINK-7293 at 7/31/17 7:04 AM:

I agree there is a sort by time, but if the operator before CEP sorted by both the time and the custom order, the sorting in CEP would not have any impact.

was (Author: dawidwys):
I agree there is sort by time, but if the previous operator before CEP would sort both by the time and custom order, the sorting in CEP would not have any impact. You could even reuse the code from DataStreamSort.
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106877#comment-16106877 ]

Dian Fu edited comment on FLINK-7293 at 7/31/17 7:00 AM:

Because {{event-time}}/{{process-time}} takes priority over the custom {{order by}}, we cannot first apply the custom sort and then pass the result to the CEP library.

{quote}
This is the same case as in DataStream, which does not have sort function.
{quote}

Actually there are some differences. For example, there is no sort logic in DataStream at all, so all the sort logic can be implemented in the Table API. In contrast, the CEP library already has sort logic (event time), which makes it impossible to implement the sort in the Table API without making the changes in this JIRA. Thoughts?

was (Author: dian.fu):
As the {{event-time}}/{{process-time}} has higher priority over custom {{order by}}, so we can not first apply the custom sort and then pass it to the CEP library.

{quote}
This is the same case as in DataStream, which does not have sort function.
{quote}

Actually there are some differences. For example, there is no sort logic in DataStream at all, so all the sort logic can be implemented in Table API. While there is already sort logic in CEP library (event time) which makes us can not implement the sort in Table API alone. Thoughts?
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106877#comment-16106877 ]

Dian Fu edited comment on FLINK-7293 at 7/31/17 6:59 AM:

Because {{event-time}}/{{process-time}} takes priority over the custom {{order by}}, we cannot first apply the custom sort and then pass the result to the CEP library.

{quote}
This is the same case as in DataStream, which does not have sort function.
{quote}

Actually there are some differences. For example, there is no sort logic in DataStream at all, so all the sort logic can be implemented in the Table API. In contrast, the CEP library already has sort logic (event time), which means we cannot implement the sort in the Table API alone. Thoughts?

was (Author: dian.fu):
As the {{event-time}}/{{process-time}} has higher priority over custom {{order by}}, so we can not first apply the custom sort and then pass it to the CEP library.

{quote}
This is the same case as in DataStream, which does not have sort function.
{quote}

Actually there are some differences. For example, there is no sort function in DataStream at all, so all the sort logic can be implemented in Table API. While there is already sort logic in CEP library (event time) which makes us can not implement the sort in Table API alone. Thoughts?
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106877#comment-16106877 ]

Dian Fu edited comment on FLINK-7293 at 7/31/17 6:59 AM:

Because {{event-time}}/{{process-time}} takes priority over the custom {{order by}}, we cannot first apply the custom sort and then pass the result to the CEP library.

{quote}
This is the same case as in DataStream, which does not have sort function.
{quote}

Actually there are some differences. For example, there is no sort function in DataStream at all, so all the sort logic can be implemented in the Table API. In contrast, the CEP library already has sort logic (event time), which means we cannot implement the sort in the Table API alone. Thoughts?

was (Author: dian.fu):
As the {{event-time}}/{{process-time}} has higher priority than custom {{order by}}, so we can not first apply the custom sort and then pass it to the CEP library.

{quote}
This is the same case as in DataStream, which does not have sort function.
{quote}

Actually there are some differences. For example, there is no sort function in DataStream at all, so all the sort logic can be implemented in Table API. While there is already sort logic in CEP library (event time) which makes us can not implement the sort in Table API alone. Thoughts?
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105963#comment-16105963 ]

Dian Fu edited comment on FLINK-7293 at 7/29/17 1:59 AM:

{quote}
Could you explain a bit why this is needed?
{quote}

As we need to support clauses such as

{code}
SELECT *
FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp, price
    MEASURES STRT.tstamp AS start_tstamp,
             LAST(DOWN.tstamp) AS bottom_tstamp,
             LAST(UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST UP
    PATTERN (STRT DOWN+ UP+)
    DEFINE
        DOWN AS DOWN.price < PREV(DOWN.price),
        UP AS UP.price > PREV(UP.price)
) MR
{code}

There may be multiple columns to {{order by}}, here {{tstamp}} and {{price}}.

{quote}
I can't see a way to sort an unbounded stream of data Could you elaborate a bit how do you see it working? how this is going to play well with the Time semantics. When both event-time and a custom order-by is used, who is going to win?
{quote}

This works in the same way as the implementation of {{sort by}} in the Table API. That is to say, both the event time and the custom order-by will be used: the event time has higher priority and the custom order-by has lower priority. When both are used, incoming events are first ordered by event time; when a watermark arrives, the events before the watermark that share the same event time are ordered by the custom order-by before being emitted (please refer to [DataStreamSort.scala|https://github.com/apache/flink/blob/b8c8f204de718e6d5b7c3df837deafaed7c375f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala] for more details).

Thoughts?

was (Author: dian.fu):
{quote}
Could you explain a bit why this is needed?
{quote}

As we need to support clauses such as

{code}
SELECT *
FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp, price
    MEASURES STRT.tstamp AS start_tstamp,
             LAST(DOWN.tstamp) AS bottom_tstamp,
             LAST(UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST UP
    PATTERN (STRT DOWN+ UP+)
    DEFINE
        DOWN AS DOWN.price < PREV(DOWN.price),
        UP AS UP.price > PREV(UP.price)
) MR
{code}

There may be multiple columns to order by.

{quote}
I can't see a way to sort an unbounded stream of data Could you elaborate a bit how do you see it working? how this is going to play well with the Time semantics. When both event-time and a custom order-by is used, who is going to win?
{quote}

This is working in the same way as the implementation of {{sort by}} in table API. That's to say, both the event-time and the custom order-by will be used and the event-time should be considered with higher priority and the custom order-by will be considered with lower priorities. With both event-time and a custom order-by are used, when events come, they will be firstly ordered by the event time and when watermark come, the events before watermark with the same event time will firstly ordered by the custom order-by before emitted (Please refer to [DataStreamSort.scala|https://github.com/apache/flink/blob/b8c8f204de718e6d5b7c3df837deafaed7c375f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala] for more details)

Thoughts?
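The ordering described in the comment above (event time as the primary key, with the custom order-by applied among events of equal event time once a watermark arrives) can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code in DataStreamSort.scala or the CEP operator: `WatermarkSortBuffer` and its methods are hypothetical names, and the sketch ignores state backends, late events and keyed partitioning.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative buffer (not Flink code): holds events keyed by event time and,
 * on each watermark, releases them ordered by time and then by a custom comparator.
 */
final class WatermarkSortBuffer<T> {
    private final TreeMap<Long, List<T>> pending = new TreeMap<>();
    private final Comparator<T> customOrder;

    WatermarkSortBuffer(Comparator<T> customOrder) {
        this.customOrder = customOrder;
    }

    /** Buffers an event under its event timestamp; the TreeMap keeps timestamps sorted. */
    void add(long timestamp, T event) {
        pending.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(event);
    }

    /** Emits every buffered event whose timestamp is less than or equal to the watermark. */
    List<T> onWatermark(long watermark) {
        List<T> out = new ArrayList<>();
        // headMap(..., true) is a live view of all entries with timestamp <= watermark,
        // iterated in ascending timestamp order.
        Map<Long, List<T>> ready = pending.headMap(watermark, true);
        for (List<T> sameTime : ready.values()) {
            sameTime.sort(customOrder); // tie-break events that share one timestamp
            out.addAll(sameTime);
        }
        ready.clear(); // clearing the view removes the emitted entries from the buffer
        return out;
    }
}
```

In this sketch the custom comparator never overrides event time: it is only consulted inside a group of events with the same timestamp, which matches the priority described in the comment.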
[jira] [Comment Edited] (FLINK-7293) Support custom order by in PatternStream
[ https://issues.apache.org/jira/browse/FLINK-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105963#comment-16105963 ]

Dian Fu edited comment on FLINK-7293 at 7/29/17 1:30 AM:

{quote}
Could you explain a bit why this is needed?
{quote}

As we need to support clauses such as

{code}
SELECT *
FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp, price
    MEASURES STRT.tstamp AS start_tstamp,
             LAST(DOWN.tstamp) AS bottom_tstamp,
             LAST(UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST UP
    PATTERN (STRT DOWN+ UP+)
    DEFINE
        DOWN AS DOWN.price < PREV(DOWN.price),
        UP AS UP.price > PREV(UP.price)
) MR
{code}

There may be multiple columns to order by.

{quote}
I can't see a way to sort an unbounded stream of data Could you elaborate a bit how do you see it working? how this is going to play well with the Time semantics. When both event-time and a custom order-by is used, who is going to win?
{quote}

This works in the same way as the implementation of {{sort by}} in the Table API. That is to say, both the event time and the custom order-by will be used: the event time should have higher priority and the custom order-by lower priority. When both are used, incoming events are first ordered by event time; when a watermark arrives, the events before the watermark that share the same event time are ordered by the custom order-by before being emitted (please refer to [DataStreamSort.scala|https://github.com/apache/flink/blob/b8c8f204de718e6d5b7c3df837deafaed7c375f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala] for more details).

Thoughts?

was (Author: dian.fu):
{quote}
Could you explain a bit why this is needed?
{quote}

As we need to support clauses such as

{code}
SELECT *
FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp, price
    MEASURES STRT.tstamp AS start_tstamp,
             LAST(DOWN.tstamp) AS bottom_tstamp,
             LAST(UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST UP
    PATTERN (STRT DOWN+ UP+)
    DEFINE
        DOWN AS DOWN.price < PREV(DOWN.price),
        UP AS UP.price > PREV(UP.price)
) MR
{code}

There may be multiple columns to order by.

{quote}
I can't see a way to sort an unbounded stream of data Could you elaborate a bit how do you see it working? how this is going to play well with the Time semantics. When both event-time and a custom order-by is used, who is going to win?
{quote}

This is working in the same way as the implementation of {{sort by}} in table API. That's to say, both the event-time and the custom order-by will be used and the event-time should be considered with higher priority and the custom order-by will be considered with lower priorities. (Please refer to [DataStreamSort.scala|https://github.com/apache/flink/blob/b8c8f204de718e6d5b7c3df837deafaed7c375f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamSort.scala] for more details)

Thoughts?