[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank
godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank URL: https://github.com/apache/flink/pull/8051#issuecomment-478419996 > > > > > Thanks for the refactoring and simplification. But it seems the name "FirstLastRow" causes some misunderstandings. Not sure which of us has misunderstood the purpose to introduce "FirstLastRow". From my understanding, this operator is more like "RowDuplicate" operator (maybe we should rename it if i was right), it will duplicate rows by some fields, and thus we have an option whether to keep the first row with the same key or the last row in streaming case. I don't think the "order by limit x" clause should fall into this kind of operator. Maybe i was wrong. What do you think? > > > > > > > > > > > > Yes, I also thing it more clear if we rename this operator to `RowDeduplicate` and keep `first_row` and `last_row` as aggregate call > > > > > > > > > Isn't a boolean flag `keep_first` or `keep_last` enough? > > > > > > a boolean flag is enough, or a enum type: FIRST_ROW and LAST_ROW > > I would prefer `keep_xxx`, it's more explicit to show what exactly we did. It makes sense to me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank
godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank URL: https://github.com/apache/flink/pull/8051#issuecomment-478418449 > > > Thanks for the refactoring and simplification. But it seems the name "FirstLastRow" causes some misunderstandings. Not sure which of us has misunderstood the purpose to introduce "FirstLastRow". From my understanding, this operator is more like "RowDuplicate" operator (maybe we should rename it if i was right), it will duplicate rows by some fields, and thus we have an option whether to keep the first row with the same key or the last row in streaming case. I don't think the "order by limit x" clause should fall into this kind of operator. Maybe i was wrong. What do you think? > > > > > > Yes, I also thing it more clear if we rename this operator to `RowDeduplicate` and keep `first_row` and `last_row` as aggregate call > > Isn't a boolean flag `keep_first` or `keep_last` enough? a boolean flag is enough, or a enum type: FIRST_ROW and LAST_ROW This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank
godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank URL: https://github.com/apache/flink/pull/8051#issuecomment-478416766 > Thanks for the refactoring and simplification. But it seems the name "FirstLastRow" causes some misunderstandings. Not sure which of us has misunderstood the purpose to introduce "FirstLastRow". From my understanding, this operator is more like "RowDuplicate" operator (maybe we should rename it if i was right), it will duplicate rows by some fields, and thus we have an option whether to keep the first row with the same key or the last row in streaming case. I don't think the "order by limit x" clause should fall into this kind of operator. Maybe i was wrong. What do you think? Yes, I also thing it more clear if we rename this operator to `RowDeduplicate` and keep `first_row` and `last_row` as aggregate call This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank
godfreyhe commented on issue #8051: [FLINK-12018] [table-planner-blink] Add support for generating optimized logical plan for Sort and Rank URL: https://github.com/apache/flink/pull/8051#issuecomment-477894159 > I'm a bit confused once i discovered `Sort`, `Limit` and `Rank` can somehow convert between them? For Batch, only OVER query with filter can be converted to Rank now, because `BatchExecRank` supports two-stage rank to reduce data-shuffling. For Stream, `StreamExecLimit` does not exist. Queries with LIMIT will be converted to `StreamExecRank`, and the only meaning of `StreamExecSort`'s existence is for testing with bounded source now, which supports the queries that `StreamExecRank` cannot support. So they could not be converted between each other. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services