[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-20 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske ,@wuchong I am closing this PR as i opened a new one for 
offset/fetch without retraction 
Check #4380


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-20 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@wuchong sounds good to me. Please ping me when you have the PR if you want 
me to take a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-19 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
@rtudoran I have exactly the same idea with you. Calcite doesn't push 
partition fields to LogicalSort. Actually, LogicalSort doesn't have a partition 
field. So I make a custom rule to transform LogicalWindow into a LogicalRank (I 
customize it) when the query pattern matched. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-19 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske ,@wuchong  thanks for the feedback. I will modify all these today 
- tomorrow and ping you when they are pushed.

@wuchong - TopN for each group should not be hard to implement. In fact if 
you look at the logic of how things are implemented we are in a similar setup 
as for OVER windows. We have the input stream and we apply a keyBy and the 
processfunction (for order by / fetch/offset). Thus we would only need the 
Calcite syntax to make sure the partition fields are pushed in the LogicalSort 
object. Did you try to see if you have directly a group by if this is pushed in 
the sort object?
Otherwise we can make a rule to combine a Group object with the sort object


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-18 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
Hi @fhueske ,  I left this comment under the JIRA. To make sure you know 
our situation, I post it here:

>I think the most useful feature is PR3 "ORDER BY  OFFSET FETCH" 
(i.e. TopN). It is an important requirement. And there is something more 
complex such as TopN for each group (e.g. 
https://stackoverflow.com/questions/176964/select-top-10-records-for-each-category).
 We have an idea (prototype) to support it, and planning to make a whole design 
for the TopN (any-attribute), and hope that to meet soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-18 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4263
  
Thanks @rtudoran. 

Regarding PR it is up to you whether you'd like to update the current or 
open a new one. I think the new version will have less changes (LOCs and files) 
because we only need to add a few counters and conditions to 
`ProcTimeSortProcessFunction` and `RowTimeSortProcessFunction`. 

Regarding the JIRA, I would change the title and description of FLINK-6075 
describe the `ORDER BY *time ASC` case (i.e., what is already implemented and 
merged) and open new JIRAs for 
- `ORDER BY *time ASC OFFSET FETCH`
- `ORDER BY *time DESC OFFSET FETCH`
- `ORDER BY * OFFSET FETCH`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-18 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske @wuchong
Thanks for the feedback. Fine for me to eliminate the retraction for the 
*time ASC cases. I will modify the PR (to support *time ASC cases) over the 
next days and re-submit it. I am not sure if you prefer it to be a new commit 
here or if i should open a new PR?

@fhueske
Regarding the other cases you mentioned - to support time DESC ... do you 
want to modify the JIRA as well? can you modify it or do you want me to add the 
sub-jiras for the cases you mentioned?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-17 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4263
  
Hi @rtudoran, you are right, I said we will need retraction for OFFSET and 
FETCH. However, it is only required for the general case and the special case 
of `ORDER BY *time ASC` without updates should be implemented without. 

IMO, the semantics of a query are fixed and not really up for discussion. 
Just imagine a regular database would execute the same query on a table that 
holds the same data as the stream. This defines the result of the streaming 
query that our operators must produce.

Moving forward, I would modify this PR to support `ORDER BY` with `OFFSET` 
and `FETCH` for append-only input by extending the existing operators with 
counters and open JIRAs for the other two cases (`ORDER BY *time DESC` and 
`ORDER BY *`).

Best, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-17 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
@rtudoran  "SELECT x FROM stream ORDER BY *time FETCH 2" do not need 
retraction. Because it is order by ascending time. The query will only emit the 
first 2 rows and after that drop all rows. We need retraction if the query is 
order by time desc. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-17 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske You were the one  that argue that retraction support is needed for 
offset and fetch. When we were discussing this i was not into having the 
retraction. I think finally it depends on what behavior we want to give for the 
function. If we have a SELECT x FROM stream ORDER BY *time FETCH 2 when we emit 
the new latest data - do we still want to invalidate what we have emitted 
previously (in this case we would emit the retraction), otherwise if we 
consider that that was the valid input at that point in time - we do not need 
the retraction. I am fine either way. Let me know. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-16 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/4263
  
Hi @rtudoran, I had a look at the PR and left a comment on 
[FLINK-6075](https://issues.apache.org/jira/browse/FLINK-6075).

I think we do not need retraction for `OFFSET` and `FETCH` on `ORDER BY 
*time ASC` and can implement this by extending the existing `ProcessFunction` 
for sorting. We only need to buffer and retract rows if we support `OFFSET` and 
`FETCH` for arbitrary sort orders.

Let me know what you think,
Fabian



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-06 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@wuchong @fhueske 
I closed that old PR (i forgot about it).
Thanks for looking into this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4263
  
Thank your for the work , I will also look into this in the next days. BTW, 
can you please close #3714 @rtudoran ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

2017-07-05 Thread rtudoran
Github user rtudoran commented on the issue:

https://github.com/apache/flink/pull/4263
  
@fhueske
I have implemented the support for offset and fetch for both rowtime and 
proctime. 
This PR will close the JIRA issue. Please have a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---