zhengruifeng commented on pull request #34504: URL: https://github.com/apache/spark/pull/34504#issuecomment-983419947
> > @zhengruifeng can you highlight the differences between your PR and this one?
>
> IMHO, there are two main differences:
>
> 1. A new node `RankLimit` is introduced, and it supports both the empty partitionSpec case and the non-empty partitionSpec case. It could support `rank` and `dense_rank` as the rank function in the future;
>
> 2. Normally, `TakeOrderedAndProjectExec` performs the top-K filtering in both the mappers and the reducers, while `RankLimitExec` only filters rows in the mappers.

Update on https://github.com/apache/spark/pull/34367/commits/877558e439663d1028028e9a332a5e4e6a18ad6c:

1. `RankLimit` now supports `row_number`, `rank`, and `dense_rank`, with either an empty or a non-empty partitionSpec;

2. Two `RankLimitExec` nodes are now inserted, one on the map side and one on the reduce side. If there is no shuffle between the two `RankLimitExec` nodes, the filtering in the second node is skipped.
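
To illustrate the two-sided filtering described in the update, here is a minimal plain-Python sketch. It is not the Spark implementation: `rank_limit` and `two_phase_rank_limit` are hypothetical names, rows are modelled as `(partition_key, order_value)` tuples, and the shuffle is approximated by concatenating the map-side outputs. It only shows that filtering each input split first and then filtering again after the shuffle gives the same rows as a single global filter.

```python
from collections import defaultdict

def rank_limit(rows, limit, rank_fn="row_number"):
    """Keep rows whose rank within their partition key is <= limit.

    rows: iterable of (key, value) tuples, ranked by value ascending.
    rank_fn: "row_number", "rank", or "dense_rank".
    Hypothetical helper for illustration; not a Spark API.
    """
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append(value)

    kept = []
    for key in sorted(by_key):
        row_number = rank = dense_rank = 0
        prev = object()  # sentinel that never equals a real value
        for value in sorted(by_key[key]):
            row_number += 1
            if value != prev:
                rank = row_number   # ties share the rank of the first tied row
                dense_rank += 1     # dense_rank only advances on a new value
                prev = value
            current = {"row_number": row_number,
                       "rank": rank,
                       "dense_rank": dense_rank}[rank_fn]
            if current > limit:
                break  # ranks never decrease in sorted order, so stop early
            kept.append((key, value))
    return kept

def two_phase_rank_limit(splits, limit, rank_fn="row_number"):
    """Apply the filter per input split (map side), then again on the
    combined survivors (reduce side)."""
    map_side = [rank_limit(split, limit, rank_fn) for split in splits]
    shuffled = [row for part in map_side for row in part]  # stand-in for a shuffle
    return rank_limit(shuffled, limit, rank_fn)

if __name__ == "__main__":
    splits = [
        [("a", 3), ("a", 1), ("b", 5)],
        [("a", 2), ("a", 1), ("b", 4), ("b", 6)],
    ]
    all_rows = [row for split in splits for row in split]
    for fn in ("row_number", "rank", "dense_rank"):
        assert two_phase_rank_limit(splits, 2, fn) == rank_limit(all_rows, 2, fn)
```

The map-side pass can only drop rows whose local rank already exceeds the limit, and a row's local rank never exceeds its global rank, so no row that belongs in the final result is lost before the shuffle.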