[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank

2019-03-31 Thread GitBox
godfreyhe commented on issue #8051:  [FLINK-12018] [table-planner-blink] Add 
support for generating optimized logical plan for Sort and Rank
URL: https://github.com/apache/flink/pull/8051#issuecomment-478419996
 
 
   > > > > > Thanks for the refactoring and simplification. But it seems the 
name "FirstLastRow" causes some misunderstandings. Not sure which of us has 
misunderstood the purpose to introduce "FirstLastRow". From my understanding, 
this operator is more like "RowDuplicate" operator (maybe we should rename it 
if i was right), it will duplicate rows by some fields, and thus we have an 
option whether to keep the first row with the same key or the last row in 
streaming case. I don't think the "order by limit x" clause should fall into 
this kind of operator. Maybe i was wrong. What do you think?
   > > > > 
   > > > > 
   > > > > Yes, I also thing it more clear if we rename this operator to 
`RowDeduplicate` and keep `first_row` and `last_row` as aggregate call
   > > > 
   > > > 
   > > > Isn't a boolean flag `keep_first` or `keep_last` enough?
   > > 
   > > 
   > > a boolean flag is enough, or a enum type: FIRST_ROW and LAST_ROW
   > 
   > I would prefer `keep_xxx`, it's more explicit to show what exactly we did.
   
   It makes sense to me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank

2019-03-31 Thread GitBox
godfreyhe commented on issue #8051:  [FLINK-12018] [table-planner-blink] Add 
support for generating optimized logical plan for Sort and Rank
URL: https://github.com/apache/flink/pull/8051#issuecomment-478418449
 
 
   > > > Thanks for the refactoring and simplification. But it seems the name 
"FirstLastRow" causes some misunderstandings. Not sure which of us has 
misunderstood the purpose to introduce "FirstLastRow". From my understanding, 
this operator is more like "RowDuplicate" operator (maybe we should rename it 
if i was right), it will duplicate rows by some fields, and thus we have an 
option whether to keep the first row with the same key or the last row in 
streaming case. I don't think the "order by limit x" clause should fall into 
this kind of operator. Maybe i was wrong. What do you think?
   > > 
   > > 
   > > Yes, I also thing it more clear if we rename this operator to 
`RowDeduplicate` and keep `first_row` and `last_row` as aggregate call
   > 
   > Isn't a boolean flag `keep_first` or `keep_last` enough?
   
   a boolean flag is enough, or a enum type: FIRST_ROW and LAST_ROW


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank

2019-03-31 Thread GitBox
godfreyhe commented on issue #8051:  [FLINK-12018] [table-planner-blink] Add 
support for generating optimized logical plan for Sort and Rank
URL: https://github.com/apache/flink/pull/8051#issuecomment-478416766
 
 
   > Thanks for the refactoring and simplification. But it seems the name 
"FirstLastRow" causes some misunderstandings. Not sure which of us has 
misunderstood the purpose to introduce "FirstLastRow". From my understanding, 
this operator is more like "RowDuplicate" operator (maybe we should rename it 
if i was right), it will duplicate rows by some fields, and thus we have an 
option whether to keep the first row with the same key or the last row in 
streaming case. I don't think the "order by limit x" clause should fall into 
this kind of operator. Maybe i was wrong. What do you think?
   
   Yes, I also thing it more clear if we rename this operator to 
`RowDeduplicate` and keep `first_row` and `last_row` as aggregate call 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank

2019-03-29 Thread GitBox
godfreyhe commented on issue #8051:  [FLINK-12018] [table-planner-blink] Add 
support for generating optimized logical plan for Sort and Rank
URL: https://github.com/apache/flink/pull/8051#issuecomment-477894159
 
 
   > I'm a bit confused once i discovered `Sort`, `Limit` and `Rank` can 
somehow convert between them?
   
   For Batch, only OVER query with filter can be converted to Rank now, because 
`BatchExecRank` supports two-stage rank to reduce data-shuffling.
   For Stream, `StreamExecLimit` does not exist. Queries with LIMIT will be 
converted to `StreamExecRank`, and the only meaning of `StreamExecSort`'s 
existence is for testing with bounded source now, which supports the queries 
that `StreamExecRank` cannot support. 
   So they could not be converted between each other.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services