zhengruifeng commented on pull request #34504:
URL: https://github.com/apache/spark/pull/34504#issuecomment-983419947


   > > @zhengruifeng can you highlight the differences between your PR and this one?
   > 
   > IMHO, there are two main differences:
   > 
   > 1, a new node `RankLimit` is introduced, and it supports both the empty partitionSpec cases and non-empty partitionSpec cases. It could support `rank` and `dense_rank` as the rank function in the future;
   > 
   > 2, Normally, `TakeOrderedAndProjectExec` performs the top-K filtering in both mappers and the reducers, while `RankLimitExec` only filters rows in mappers.
   
   update on 
https://github.com/apache/spark/pull/34367/commits/877558e439663d1028028e9a332a5e4e6a18ad6c
   
   1, `RankLimit` now supports `row_number`/`rank`/`dense_rank`, with both empty and non-empty partitionSpec;
   2, two `RankLimitExec` nodes are now inserted, one on the map side and one on the reduce side; if there is no shuffle between the two `RankLimitExec` nodes, the filtering in the second node is skipped.
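
   To illustrate the two-stage idea, here is a minimal plain-Python sketch (not Spark code, and not the implementation in this PR). The names `rank_limit` and `two_stage_rank_limit` are hypothetical, rows are assumed to be `(partition_key, order_value)` tuples, and the shuffle is simulated by concatenating the map-side outputs into a single reducer input. It shows why pre-filtering on the map side does not change the final result for `row_number`, `rank`, or `dense_rank`: a row's rank within a subset of the data never exceeds its rank within the full data.

```python
from collections import defaultdict


def rank_limit(rows, k, rank_fn="row_number"):
    """Keep rows whose rank within their partition key is <= k.

    rows: iterable of (partition_key, order_value) tuples (assumed shape).
    rank_fn: one of "row_number", "rank", "dense_rank".
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    kept = []
    for key in sorted(groups):
        row_number = rank = dense_rank = 0
        prev = object()  # sentinel that compares unequal to every value
        for value in sorted(groups[key]):
            row_number += 1
            if value != prev:
                rank = row_number  # rank jumps past ties
                dense_rank += 1    # dense_rank advances by one per distinct value
                prev = value
            current = {
                "row_number": row_number,
                "rank": rank,
                "dense_rank": dense_rank,
            }[rank_fn]
            if current > k:
                break  # ranks are non-decreasing in sorted order
            kept.append((key, value))
    return kept


def two_stage_rank_limit(map_partitions, k, rank_fn="row_number"):
    """Filter on the map side, simulate a shuffle, then filter on the reduce side."""
    # Map side: each input partition is filtered independently.
    pre_filtered = [rank_limit(part, k, rank_fn) for part in map_partitions]
    # Simulated shuffle: all surviving rows go to one reducer in this sketch.
    shuffled = [row for part in pre_filtered for row in part]
    # Reduce side: the final filter over the co-located rows.
    return rank_limit(shuffled, k, rank_fn)


partitions = [
    [("a", 1), ("a", 2), ("a", 2), ("b", 5)],
    [("a", 1), ("a", 3), ("b", 4), ("b", 6)],
]
for fn in ("row_number", "rank", "dense_rank"):
    print(fn, two_stage_rank_limit(partitions, 2, fn))
```

   In this sketch the reduce-side pass is always executed. When the input is already partitioned so that no shuffle is needed, the map-side pass has already seen all rows for each key, which is why the second filter can be skipped in that case.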


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


