[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function
[ https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103283#comment-17103283 ]

Feng Zhu edited comment on CALCITE-3737 at 5/9/20, 12:26 PM:
-------------------------------------------------------------

Fixed via [https://github.com/apache/calcite/commit/40e588de5f999034e5030b12cdbc90f4073808fe], thanks for your PR [~amaliujia]!

was (Author: donnyzone):
Fixed via [https://github.com/apache/calcite/commit/890eb61ef486e2192110cefe4cac5aa6f150], thanks for your PR [~amaliujia]!

> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.23.0
>
>          Time Spent: 20h
>  Remaining Estimate: 0h
>
> Hopping windows place intervals of a fixed size evenly spaced across event
> time. Most importantly, in the most common use a given event time timestamp
> will generally fall into more than one window.
> The table-valued function Hop may produce zero, one, or multiple rows
> corresponding to each row of input. Hop takes four required parameters and
> one optional parameter. All parameters are analogous to those for Tumble
> except for hopsize, which specifies the duration between the starting points
> (and endpoints) of the hopping windows, allowing for overlapping windows
> (hopsize < dur, common) or gaps in the data (hopsize > dur, rarely useful).
> {code:java}
> Hop (data , timecol , dur, hopsize)
> {code}
> The return value of Hop is a relation that includes all columns of data as
> well as additional event time columns wstart and wend.
> Here is an example (from https://s.apache.org/streaming-beam-sql ):
> {code:sql}
> SELECT *
> FROM Hop (
>   data    => TABLE Bids ,
>   timecol => DESCRIPTOR ( bidtime ) ,
>   dur     => INTERVAL '10' MINUTES ,
>   hopsize => INTERVAL '5' MINUTES );
> ------------------------------------------
> | wstart | wend | bidtime | price | item |
> ------------------------------------------
> | 8:00   | 8:10 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:11    | $3    | B    |
> | 8:10   | 8:20 | 8:11    | $3    | B    |
> | 8:00   | 8:10 | 8:05    | $4    | C    |
> | 8:05   | 8:15 | 8:05    | $4    | C    |
> | 8:00   | 8:10 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:17    | $6    | F    |
> | 8:15   | 8:25 | 8:17    | $6    | F    |
> ------------------------------------------
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
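
The window assignment described in the issue text above (each event-time timestamp lands in every hopping window that covers it) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code merged into Calcite: the class name `HopWindows` and method `windowsFor` are hypothetical, times are plain minute counts, windows are treated as half-open intervals [wstart, wend), and window starts are assumed to be aligned to multiples of hopsize.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not Calcite's HOP implementation. */
public class HopWindows {

  /**
   * Returns every [wstart, wend) window that contains ts, earliest first.
   * All arguments share one unit (minutes since midnight in main below).
   * Assumption: window starts are aligned to multiples of hopSize.
   */
  public static List<long[]> windowsFor(long ts, long dur, long hopSize) {
    List<long[]> windows = new ArrayList<>();
    // Latest aligned window start at or before ts.
    long lastStart = Math.floorDiv(ts, hopSize) * hopSize;
    // Step back one hop at a time while the window still covers ts,
    // i.e. while start + dur > ts.
    for (long start = lastStart; start > ts - dur; start -= hopSize) {
      windows.add(0, new long[] {start, start + dur});
    }
    return windows;
  }

  public static void main(String[] args) {
    // bidtime 8:07 with dur = 10 minutes and hopsize = 5 minutes.
    for (long[] w : windowsFor(8 * 60 + 7, 10, 5)) {
      System.out.printf("%d:%02d - %d:%02d%n",
          w[0] / 60, w[0] % 60, w[1] / 60, w[1] % 60);
    }
  }
}
```

Running `main` prints `8:00 - 8:10` and `8:05 - 8:15`, the two windows listed for bidtime 8:07 in the table. With hopsize greater than dur the loop can produce no window at all for a timestamp that falls in a gap, which corresponds to the "zero rows" case in the description.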
[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function
[ https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040664#comment-17040664 ]

Rui Wang edited comment on CALCITE-3737 at 2/20/20 6:29 AM:
------------------------------------------------------------

[~danny0405] It's unlikely to be in 1.22.0. Removed 1.22 from the fix version.

was (Author: amaliujia):
[~danny0405] It's less likely been in 1.22.0. Remove 1.22 from the fix version.

> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function
[ https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033135#comment-17033135 ]

Rui Wang edited comment on CALCITE-3737 at 2/9/20 6:38 AM:
-----------------------------------------------------------

[~julianhyde] To illustrate the differences in implementation among TUMBLE, HOP, and SESSION, I added an implementation of the SESSION table function to #1761. Hopefully this PR will show us a better way to unify the implementations of the windowing table functions.

was (Author: amaliujia):
[~julianhyde] To illustrate the difference of implementation among TUMBLE, HOP, SESSION, I add SESSION table function to #1761. Hopefully from this PR we can tell what's the better way to unify implementations of windowing table functions.

> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function
[ https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022551#comment-17022551 ]

Rui Wang edited comment on CALCITE-3737 at 1/23/20 10:19 PM:
-------------------------------------------------------------

Addressed your comments. I have responses to two of them:

> Can HOP and TUMBLE share implementation?

TUMBLE, HOP, and SESSION compute window_start and window_end under different models: TUMBLE is one-to-one, HOP is one-to-many, and SESSION is many-to-many. It is therefore hard to fit them into a single "computeWindow(current_row)".

I tried to share most of the code and implement only the windowing part (computing window_start and window_end) separately. I later gave that up because hopping needs to call a function that returns a list of window_start and window_end pairs, and we won't know the size of that list, so we cannot really write a for loop in Java. (Note that I need to build a list of linq4j expressions; see the discussion here: [link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E].)

Also, I plan to add per-key sessionization and bucket_gap_filling table functions later. They will need even more complicated code, which will be even less shareable. For example, per-key sessionization needs to see all the data first and then sort it to find each window start and window end. I would therefore prefer to implement those the same way hopping is implemented (e.g. by providing an AbstractEnumerable implementation).

As I build more table functions and add more support for streaming SQL, if I find a better way to unify the table function implementations, I will send patches for that.

> Changes to reference.md need some copy-editing.

I checked the changes in reference.md and made some edits. However, I am not a native English speaker, so I may not have fixed what you had in mind.
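
The remark in the comment above that sessionization has to see all the data and sort it before any window boundary is known can be illustrated with a small plain-Java sketch. Everything here is an assumption made for illustration: the class `SessionSketch`, the method `sessionize`, and the `gap` parameter are hypothetical, keys are ignored (one key only), a session is taken to end `gap` after its last event, and two events are merged when they are strictly less than `gap` apart. It does not use linq4j or AbstractEnumerable and is not the code in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not Calcite's SESSION implementation. */
public class SessionSketch {

  /**
   * Returns one {ts, wstart, wend} triple per input timestamp.
   * Assumptions: single key, events closer than gap share a session,
   * and a session ends gap after its last event.
   */
  public static List<long[]> sessionize(long[] timestamps, long gap) {
    // All input must be available up front: boundaries depend on neighbours.
    long[] sorted = timestamps.clone();
    Arrays.sort(sorted);

    List<long[]> rows = new ArrayList<>();
    int i = 0;
    while (i < sorted.length) {
      int j = i;
      // Extend the session while the next event is within gap of the previous one.
      while (j + 1 < sorted.length && sorted[j + 1] - sorted[j] < gap) {
        j++;
      }
      long wstart = sorted[i];
      long wend = sorted[j] + gap;
      // Every event in the session gets the same window bounds.
      for (int k = i; k <= j; k++) {
        rows.add(new long[] {sorted[k], wstart, wend});
      }
      i = j + 1;
    }
    return rows;
  }
}
```

Unlike a tumbling or hopping window, no row's wstart or wend can be produced from that row alone, which is consistent with the comment's point that a per-row "computeWindow(current_row)" does not fit SESSION.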

was (Author: amaliujia):
Addressed your comments and have two responses to two of the comments:

> Can HOP and TUMBLE share implementation?

I tried to share most of the code and just implemented the windowing part (computing window_start and window_end). Later I gave it up cause hopping need call one function to return a list of hopping's window_start and window_end, and we won't know the size of the list so we cannot really write a for loop in Java. (note that I need to build a list of linq4j expressions and you can check discussion here: [link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also considering later I will add per-key sessionization and bucket_gap_filling table functions, they will have even more complicated code to write and is also less sharable. For example, per-key sessionization will need know all data first and then apply sorting to find window start and window end. Thus I will prefer implement those by the way that implements hopping (e.g. provide a AbstractEnumerable implementation).

As I am building more table functions and add support for streaming sql, if I want better way to unified table functions implementation, I will add patches for that.

> Changes to reference.md need some copy-editing.

I tried to check the changes in reference.md and made some changes. However I am not a native English speaker so I might not really fix what in your mind before.

> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function
[ https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022551#comment-17022551 ]

Rui Wang edited comment on CALCITE-3737 at 1/23/20 10:15 PM:
-------------------------------------------------------------

Addressed your comments. I have responses to two of them:

> Can HOP and TUMBLE share implementation?

I tried to share most of the code and implement only the windowing part (computing window_start and window_end) separately. I later gave that up because hopping needs to call a function that returns a list of window_start and window_end pairs, and we won't know the size of that list, so we cannot really write a for loop in Java. (Note that I need to build a list of linq4j expressions; see the discussion here: [link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E].)

Also, I plan to add per-key sessionization and bucket_gap_filling table functions later. They will need even more complicated code, which will be even less shareable. For example, per-key sessionization needs to see all the data first and then sort it to find each window start and window end. I would therefore prefer to implement those the same way hopping is implemented (e.g. by providing an AbstractEnumerable implementation).

As I build more table functions and add support for streaming SQL, if I find a better way to unify the table function implementations, I will add patches for that.

> Changes to reference.md need some copy-editing.

I checked the changes in reference.md and made some edits. However, I am not a native English speaker, so I may not have fixed what you had in mind.

was (Author: amaliujia):
Addressed your comments and have two responses to two of the comments:

> Can HOP and TUMBLE share implementation?

I tried to share most of the code and just implemented the windowing part (computing window_start and window_end). Later I gave it up cause hopping need call one function to return a list of hopping's window_start and window_end, and we won't know the size of the list so we cannot really write a for loop in Java. (note that I need to build a list of linq4j expressions and you can check discussion here: [link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also considering later I will add per-key sessionization and bucket_gap_filling table functions, they will have even more complicated code to write thus I will prefer implement those by the way that implements hopping (e.g. provide a AbstractEnumerable implementation).

> Changes to reference.md need some copy-editing.

I tried to check the changes in reference.md and made some changes. However I am not a native English speaker so I might not really fix what in your mind before.

> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h