[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

2020-05-09 Thread Feng Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103283#comment-17103283
 ] 

Feng Zhu edited comment on CALCITE-3737 at 5/9/20, 12:26 PM:
-

Fixed via 
[https://github.com/apache/calcite/commit/40e588de5f999034e5030b12cdbc90f4073808fe],
 thanks for your PR [~amaliujia]!


was (Author: donnyzone):
Fixed via 
[https://github.com/apache/calcite/commit/890eb61ef486e2192110cefe4cac5aa6f150],
 thanks for your PR [~amaliujia]!

> HOP Table-valued Function
> -
>
> Key: CALCITE-3737
> URL: https://issues.apache.org/jira/browse/CALCITE-3737
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.23.0
>
> Time Spent: 20h
> Remaining Estimate: 0h
>
> Hopping windows place intervals of a fixed size evenly spaced across event
> time. Most importantly, in the most common use, a given event-time timestamp
> will generally fall into more than one window.
> The table-valued function Hop may produce zero, one, or multiple rows
> corresponding to each row of input. Hop takes four required parameters and
> one optional parameter. All parameters are analogous to those for Tumble
> except for hopsize, which specifies the duration between the starting points
> (and endpoints) of the hopping windows, allowing for overlapping windows
> (hopsize < dur, common) or gaps in the data (hopsize > dur, rarely useful).
> {code:java}
> Hop (data , timecol , dur, hopsize)
> {code}
> The return value of Hop is a relation that includes all columns of data as 
> well as additional event time columns wstart and wend. Here is an example 
> (from https://s.apache.org/streaming-beam-sql ):
> {code:sql}
> SELECT *
>   FROM Hop (
>     data    => TABLE Bids ,
>     timecol => DESCRIPTOR ( bidtime ) ,
>     dur     => INTERVAL '10' MINUTES ,
>     hopsize => INTERVAL '5' MINUTES );
> ------------------------------------------
> | wstart | wend | bidtime | price | item |
> ------------------------------------------
> | 8:00   | 8:10 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:11    | $3    | B    |
> | 8:10   | 8:20 | 8:11    | $3    | B    |
> | 8:00   | 8:10 | 8:05    | $4    | C    |
> | 8:05   | 8:15 | 8:05    | $4    | C    |
> | 8:00   | 8:10 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:17    | $6    | F    |
> | 8:15   | 8:25 | 8:17    | $6    | F    |
> ------------------------------------------
> {code}
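
The window assignment described in the issue can be sketched in a few lines of
Java. This is only an illustration, not the code from the Calcite commit: the
class and method names (HopWindowSketch, hopWindows, fmt) are made up for this
example, times are plain minutes since midnight, window starts are assumed to
be aligned to multiples of hopsize, and wend is assumed to be exclusive, which
is consistent with the example rows above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; this is not Calcite's implementation. */
final class HopWindowSketch {

  /**
   * Returns every window [start, end) that contains ts, earliest first.
   * All values are minutes since midnight. Assumes hopsize > 0, that window
   * starts are aligned to multiples of hopsize, and that wend is exclusive.
   */
  static List<long[]> hopWindows(long ts, long dur, long hopsize) {
    List<long[]> windows = new ArrayList<>();
    // Latest aligned window start that is not after ts.
    long lastStart = Math.floorDiv(ts, hopsize) * hopsize;
    // Step back one hop at a time while [start, start + dur) still covers ts.
    for (long start = lastStart; start > ts - dur; start -= hopsize) {
      windows.add(0, new long[] {start, start + dur});
    }
    return windows;
  }

  /** Formats minutes since midnight as H:MM. */
  static String fmt(long minutes) {
    return String.format("%d:%02d", minutes / 60, minutes % 60);
  }

  public static void main(String[] args) {
    // bidtime 8:07 with dur = 10 minutes and hopsize = 5 minutes.
    for (long[] w : hopWindows(8 * 60 + 7, 10, 5)) {
      System.out.println(fmt(w[0]) + " | " + fmt(w[1]));
    }
    // prints:
    // 8:00 | 8:10
    // 8:05 | 8:15
  }
}
```

With hopsize > dur the same loop can return an empty list (for example
ts = 8:07, dur = 5, hopsize = 10), which corresponds to the "zero rows" case
mentioned in the description.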



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

2020-02-19 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040664#comment-17040664
 ] 

Rui Wang edited comment on CALCITE-3737 at 2/20/20 6:29 AM:


[~danny0405]

It's unlikely to make it into 1.22.0, so I removed 1.22 from the fix version.


was (Author: amaliujia):
[~danny0405]

It's less likely been in 1.22.0. Remove 1.22 from the fix version.

> HOP Table-valued Function
> -
>
> Key: CALCITE-3737
> URL: https://issues.apache.org/jira/browse/CALCITE-3737
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h





[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

2020-02-08 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033135#comment-17033135
 ] 

Rui Wang edited comment on CALCITE-3737 at 2/9/20 6:38 AM:
---

[~julianhyde]

To illustrate how the implementations of TUMBLE, HOP, and SESSION differ, I
added an implementation of the SESSION table function to #1761. Hopefully this
PR will help us work out a better way to unify the implementations of the
windowing table functions.


was (Author: amaliujia):
[~julianhyde]

To illustrate the difference of implementation among TUMBLE, HOP, SESSION, I 
add SESSION table function to #1761. Hopefully from this PR we can tell what's 
the better way to unify implementations of windowing table functions. 

> HOP Table-valued Function
> -
>
> Key: CALCITE-3737
> URL: https://issues.apache.org/jira/browse/CALCITE-3737
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.22.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h





[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

2020-01-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022551#comment-17022551
 ] 

Rui Wang edited comment on CALCITE-3737 at 1/23/20 10:19 PM:
-

I addressed your comments and have responses to two of them:

> Can HOP and TUMBLE share implementation?
TUMBLE, HOP, and SESSION compute window_start and window_end under different
models: TUMBLE is one-to-one, HOP is one-to-many, and SESSION is many-to-many.
It is therefore hard to fit them into a single "computeWindow(current_row)".

I tried to share most of the code and implement only the windowing part
(computing window_start and window_end) separately. I later gave that up
because hopping needs to call a function that returns a list of window_start
and window_end pairs, and since we do not know the size of that list, we
cannot really write a for loop in Java. (Note that I need to build a list of
linq4j expressions; you can check the discussion here:
[link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also, I plan to add per-key sessionization and bucket_gap_filling table
functions later; their code will be even more complicated and even less
sharable. For example, per-key sessionization needs to see all of the data
first and then sort it to find each window start and window end. I would
therefore prefer to implement them the same way hopping is implemented (e.g.
by providing an AbstractEnumerable implementation).
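
A rough sketch of why sessionization needs the whole input before it can emit
any window is shown below. It is illustrative only and not the code in the PR:
the class and method names (SessionSketch, sessions) and the gap parameter are
assumptions made for this example, it handles a single key, and it assumes
that a session ends gap time units after its last timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; this is not Calcite's implementation. */
final class SessionSketch {

  /**
   * Groups the timestamps of one key into sessions and returns one
   * [start, end) pair per session. A new session starts when the distance
   * to the previous timestamp is at least gap; end is assumed to be the
   * last timestamp of the session plus gap.
   */
  static List<long[]> sessions(long[] timestamps, long gap) {
    // Every row must be available before sorting, unlike TUMBLE or HOP,
    // where each row's windows can be computed from that row alone.
    long[] sorted = timestamps.clone();
    Arrays.sort(sorted);

    List<long[]> result = new ArrayList<>();
    if (sorted.length == 0) {
      return result;
    }
    long start = sorted[0];
    long last = sorted[0];
    for (int i = 1; i < sorted.length; i++) {
      if (sorted[i] - last >= gap) {
        // The gap is large enough: close the current session.
        result.add(new long[] {start, last + gap});
        start = sorted[i];
      }
      last = sorted[i];
    }
    // Close the final session.
    result.add(new long[] {start, last + gap});
    return result;
  }

  public static void main(String[] args) {
    for (long[] s : sessions(new long[] {20, 1, 3, 22, 4}, 5)) {
      System.out.println(s[0] + " .. " + s[1]);
    }
  }
}
```

Because a late row can merge or extend sessions, several input rows map to
each window and no window can be finalized row by row, which is the
"many-to-many" shape mentioned above.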

As I build more table functions and add more support for streaming SQL, I
will send patches if I find a better way to unify the table function
implementations.

> Changes to reference.md need some copy-editing.
I went through the changes in reference.md and made some edits. However, I am
not a native English speaker, so I may not have fixed what you had in mind.




was (Author: amaliujia):
Addressed your comments and have two responses to two of the comments:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and just implemented the windowing part 
(computing window_start and window_end). Later I gave it up cause hopping need 
call one function to return a list of hopping's window_start and window_end, 
and we won't know the size of the list so we cannot really write a for loop in 
Java. (note that I need to build a list of lin4j expressions and you can check 
discussion here: 
[link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also considering later I will add per-key sessionazation and bucket_gap_filling 
table functions, they will have even more complicated code to write and is also 
less sharable. For example, per-key sessionazation will need know all data 
first and then apply sorting to find window start and window end. Thus I will 
prefer implement those by the way that  implements hopping (e.g. provide a 
AbstractEnumerable implementation).

As I am building more table functions and add support for streaming sql, if I 
want better way to unified table functions implementation, I will add patches 
for that.

>Changes to reference.md need some copy-editing.
I tried to check the changes in reference.md and made some changes. However I 
am not a native English speaker so I might not really fix what in your mind 
before. 



> HOP Table-valued Function
> -
>
> Key: CALCITE-3737
> URL: https://issues.apache.org/jira/browse/CALCITE-3737
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h

[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

2020-01-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022551#comment-17022551
 ] 

Rui Wang edited comment on CALCITE-3737 at 1/23/20 10:15 PM:
-

I addressed your comments and have responses to two of them:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and implement only the windowing part
(computing window_start and window_end) separately. I later gave that up
because hopping needs to call a function that returns a list of window_start
and window_end pairs, and since we do not know the size of that list, we
cannot really write a for loop in Java. (Note that I need to build a list of
linq4j expressions; you can check the discussion here:
[link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also, I plan to add per-key sessionization and bucket_gap_filling table
functions later; their code will be even more complicated and even less
sharable. For example, per-key sessionization needs to see all of the data
first and then sort it to find each window start and window end. I would
therefore prefer to implement them the same way hopping is implemented (e.g.
by providing an AbstractEnumerable implementation).

As I build more table functions and add support for streaming SQL, I will add
patches if I find a better way to unify the table function implementations.

> Changes to reference.md need some copy-editing.
I went through the changes in reference.md and made some edits. However, I am
not a native English speaker, so I may not have fixed what you had in mind.




was (Author: amaliujia):
Addressed your comments and have two responses to two of the comments:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and just implemented the windowing part 
(computing window_start and window_end). Later I gave it up cause hopping need 
call one function to return a list of hopping's window_start and window_end, 
and we won't know the size of the list so we cannot really write a for loop in 
Java. (note that I need to build a list of lin4j expressions and you can check 
discussion here: 
[link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also considering later I will add per-key sessionazation and bucket_gap_filling 
table functions, they will have even more complicated code to write thus I will 
prefer implement those by the way that  implements hopping (e.g. provide a 
AbstractEnumerable implementation).


>Changes to reference.md need some copy-editing.
I tried to check the changes in reference.md and made some changes. However I 
am not a native English speaker so I might not really fix what in your mind 
before. 



> HOP Table-valued Function
> -
>
> Key: CALCITE-3737
> URL: https://issues.apache.org/jira/browse/CALCITE-3737
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h