[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336786#comment-17336786
 ] 

Flink Jira Bot commented on FLINK-6082:
---

This issue was labeled "stale-major" 7 ago and has not received any updates so 
it is being deprioritized. If this ticket is actually Major, please raise the 
priority and ask a committer to assign you the issue or revive the public 
discussion.


> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: radu
>Priority: Major
>  Labels: stale-major
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = {
>   val calc: LogicalCalc = rel.asInstanceOf\[LogicalCalc\]
>   val traitSet: RelTraitSet = 
> rel.getTraitSet.replace(DataStreamConvention.INSTANCE)
>   val convInput: RelNode = RelOptRule.convert(calc.getInput, 
> 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17328812#comment-17328812
 ] 

Flink Jira Bot commented on FLINK-6082:
---

This major issue is unassigned and itself and all of its Sub-Tasks have not 
been updated for 30 days. So, it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update. Afterwards, 
please remove the label. In 7 days the issue will be deprioritized.

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: radu
>Priority: Major
>  Labels: stale-major
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = {
>   val calc: LogicalCalc = rel.asInstanceOf\[LogicalCalc\]
>   val traitSet: RelTraitSet = 
> rel.getTraitSet.replace(DataStreamConvention.INSTANCE)
>   val convInput: 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-30 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949878#comment-15949878
 ] 

radu commented on FLINK-6082:
-

[~fhueske] [~shijinkui] [~Yuhong_kyo] [~sunjincheng121] [~twalthr] 
[~stefano.bortoli]

The logic of the WHERE condition based on time would need to be implemented 
based on proccessingFunction (even if the same DataStreamCalcRule) would be 
used to identify this case. 

The key points would be that:
1) the accumulation of the data to be outputted would need to be treated just 
like any other aggregation. Therefore to support it a dedicated aggregation 
would be implemented to support retraction and accumulation of data. This would 
be used within the processingFunction to keep the data under focus
2) As retraction would also imply a signalling upstream, the key target is to 
consider this within the retraction branch of flink.

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-30 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948874#comment-15948874
 ] 

Fabian Hueske commented on FLINK-6082:
--

Hi [~Yuhong_kyo], 

yes, join predicates (where both sides of the condition relate to tables) are a 
different story than local predicates.
A query like the one you defined above should be able to be executed.
The semantics would be to do a join between {{Orders}} and {{Shipments}} where 
the join conditions evaluate to true. The time predicate would join each order 
row with all shipment rows that arrive at most 1 hour later.

I think this can be implemented once we have support for the time indicators 
(FLINK-5884).

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = {
>   val calc: LogicalCalc = 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-28 Thread hongyuhong (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946417#comment-15946417
 ] 

hongyuhong commented on FLINK-6082:
---

Hi [~fhueske], i think what radu means about using time-condition to define 
windows is mainly the situation like:
{code}
SELECT STREAM o.rowtime, o.productId, o.orderId, s.proctime AS shipTime 
FROM Orders AS o 
JOIN Shipments AS s 
ON o.orderId = s.orderId 
AND o.proctime BETWEEN s.proctime AND s.proctime + INTERVAL '1' HOUR; 
{code}
Since we can not hold all records of Orders and shipments, we may need an 
window(just logical concept) to hold the latest records.

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = {
>   val calc: LogicalCalc = rel.asInstanceOf\[LogicalCalc\]
>   val traitSet: RelTraitSet = 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-28 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945151#comment-15945151
 ] 

radu commented on FLINK-6082:
-

[~fhueske] - Fine for me for the time condition to select the subset of the 
stream. But for me this is mapped to a window  :)...as if the condition is 
between current time and 1 hour before...that is a window. Nevertheless. Coming 
back to finding the right query semantic, what do you think about supporting 
the one you proposed:

SELECT data FROM stream1 WHERE proctime BETWEEN current_ timestamp - INTERVAL 
'1' MINUTE AND current_timestamp
..with the behavior you showed before. I am fine with that.

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = {
>   val calc: LogicalCalc = rel.asInstanceOf\[LogicalCalc\]
>   val 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-28 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945136#comment-15945136
 ] 

radu commented on FLINK-6082:
-

I add also [~Yuhong_kyo] as she works on the join design

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = {
>   val calc: LogicalCalc = rel.asInstanceOf\[LogicalCalc\]
>   val traitSet: RelTraitSet = 
> rel.getTraitSet.replace(DataStreamConvention.INSTANCE)
>   val convInput: RelNode = RelOptRule.convert(calc.getInput, 
> DataStreamConvention.INSTANCE)
>   
>   IF(WHERE contains TIME limits)
>   
>   {
>   
>  >   IF(bounded)
> >   
> >   new DataStreamProcTimeTimeAggregate
> >   
> >   ELSE
> >   
> >   new DataStreamSlideEventTimeRowAgg
> >   
> >   }
> >  
>   

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-28 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945036#comment-15945036
 ] 

Fabian Hueske commented on FLINK-6082:
--

I think time-based local predicates (i.e., checking against a literal or 
function) cannot be used to define windows. Instead they select a subset of the 
stream (usually the tail).

In order to support windowed joins, we should start by specifying two streams 
and an expected result.
Based on these tables (input and output), we should specify a batch query that 
produces the output table given the input tables.
The resulting query will be based on time-based join predicates (both sides of 
the condition refer to the time attribute of one of both input stream).

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-28 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944957#comment-15944957
 ] 

radu commented on FLINK-6082:
-

Hi [~fhueske] thanks for the review. I agree with the modification proposal to 
take a out of the example query not to run in issues (or to go into query 
parsing modifications ). 
The key goal of supporting this is that once we would have the possibility to 
define the window also as time condition we can reuse this also in other cases 
- such as JOIN conditions (i.e extracting the window boundaries for joins).
Hence, with the modification you mentioned 1) considering the elements 
non-grouped, and supporting retraction for the elements/aggregates - is this ok 
from your side?

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the implementation should follow the same
> implementation as for the OVER clause. Considering that the WHERE
> condition are typically related to timing, this means that in case of
> one unbound boundary the
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658) should be
> used, while for bounded time windows the
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654) design
> should be used.
> The window boundaries will be extracted from the WHERE condition.
> The rule will not be mapped anymore to a LogicalWindow, which means that
> the conversion to this would need to happen from the current
> DataStreamCalc rule. In this sense, a dedicated condition will be added
> such that in case the WHERE clause has time conditions, the operator
> implementation of the Over clause (used in the previous issues) should
> be used.
> ```
> class DataStreamCalcRule
>   
> ---
>   {
>   --- 
> ---
>   
>   def convert(rel: RelNode): RelNode = 

[jira] [Commented] (FLINK-6082) Support window definition for SQL Queries based on WHERE clause with time condition

2017-03-27 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943257#comment-15943257
 ] 

Fabian Hueske commented on FLINK-6082:
--

I think the semantics of the query in the example are not expected.

If you would try to execute the query 

{{SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp - 
INTERVAL '1' HOUR AND current_timestamp}}

on a regular database, you would get an error because, {{a}} is not part of a 
{{GROUP BY}} clause.

IMO, a time-based predicate is not related to aggregations and should not be a 
shortcut for OVER windows. 
Instead, the predicate {{proctime BETWEEN current_ timestamp - INTERVAL '1' 
HOUR AND current_timestamp}} will ensure that the result of the query only 
contains results of the last hour. If you run the following query:

{{SELECT data FROM stream1 WHERE proctime BETWEEN current_ timestamp - INTERVAL 
'1' MINUTE AND current_timestamp}}

on this table 

|| time || data ||
| 12:00:00 | data1 |
| 12:00:20 | data2 |
| 12:00:30 | data3 |
| 12:01:10 | data4 |
| 12:01:40 | data5 |

The result at time 12:00:50 should be {{data1, data2, data3}}, at time 12:01:20 
{{data2, data3, data4}}, and at time 12:02:00 {{data4, data5}}. So the result 
is a tail of the stream of 1-minute length. All data that is moving out of the 
tail needs to be removed (retracted) from the queries result. Of course you can 
apply an aggregation on the table but with the same semantics as a batch query. 
For example you could compute the number of records with a {{COUNT(*)}} 
aggregate as in the following query:

{{SELECT COUNT( *) FROM stream1 WHERE proctime BETWEEN current_ timestamp - 
INTERVAL '1' MINUTE AND current_timestamp}}

The result of this query would be a single row which is constantly updated. The 
query simply counts how many rows have been received in the last minute. 

> Support window definition for SQL Queries based on WHERE clause with time 
> condition
> ---
>
> Key: FLINK-6082
> URL: https://issues.apache.org/jira/browse/FLINK-6082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: radu
>
> Time target: Proc Time
> Calcite documentation refers to query examples where the (time)
> boundaries are defined as condition within the WHERE clause. As Flink
> community targets compatibility with Calcite, it makes sense to support
> the definition of windows via this method as well as corresponding
> aggregation on top of them.
> SQL targeted query examples:
> 
> ```SELECT productId, count(\*) FROM stream1 WHERE proctime BETWEEN current\_ 
> timestamp - INTERVAL '1' HOUR AND current\_timestamp```
> General comment:
> 1)  window boundaries are defined as conditions in WHERE clause.
> 2)  For indicating the usage of different stream times, rowtime and
> proctime can be used
> 3)  The boundaries are defined based on special construct provided by
> calcite: current\_timestamp and time operations
> Description:
> 
> The logic of this operator is strictly related to supporting aggregates
> over sliding windows defined with OVER
> ([FLINK-5653](https://issues.apache.org/jira/browse/FLINK-5653),
> [FLINK-5654](https://issues.apache.org/jira/browse/FLINK-5654),
> [FLINK-5655](https://issues.apache.org/jira/browse/FLINK-5655),
> [FLINK-5658](https://issues.apache.org/jira/browse/FLINK-5658),
> [FLINK-5656](https://issues.apache.org/jira/browse/FLINK-5656)). In this
> issue the design considered queries where the window is defined with the
> syntax of OVER clause and aggregates are applied over this period. This
> is similar in behavior with the only exception that the window
> boundaries are defined with respect to the WHERE conditions. Besides
> this the logic and the types of aggregates to be supported should be the
> same (sum, count, avg, min, max). Supporting these types of query is
> related to the pie chart problem tackled by calcite.
> Similar as for the OVER windows, the construct should build rolling
> windows (i.e., windows that are triggered and move with every incoming
> event).
> Functionality example
> -
> We exemplify below the functionality of the IN/Exists when working with
> streams.
> `SELECT a, count( * ) FROM stream1 WHERE proctime BETWEEN current_ timestamp 
> - INTERVAL '1' HOUR AND current_timestamp;`
> ||IngestionTime(Event)||  Stream1||   Output||
> |10:00:01 |Id1,10 |Id1,1|
> |10:02:00 |Id2,2  |Id2,2|
> |11:25:00 |Id3,2  |Id3,1|
> |12:03:00 |Id4,15 |Id4,2|
> |12:05:00 |Id5,11 |Id5,3|
> |12:56:00 |Id6,20 |Id6,3|
> |...|
> Implementation option
> -
> Considering that the query follows the same functionality as for the
> aggregates over window, the