Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/16677
I am not sure if I am missing something - the counts obtained are the map-side
output counts per (map-side) partition, while the limit is being computed on the
reduce side, after some arbitrary partitioning/shuffle has been applied. The
number of records per partition obtained on the map side need not match what
ends up in each reduce-side partition anymore.
Of course, I am not very familiar with Spark SQL, so I could be wrong in my
understanding.
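
To make the concern concrete, here is a minimal, hypothetical sketch (not taken
from the PR; the object and app names are made up) showing that per-partition
record counts observed on the map side need not line up with the per-partition
counts seen after a shuffle:

```scala
import org.apache.spark.sql.SparkSession

object MapVsReduceCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-vs-reduce-partition-counts")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Map side: 4 partitions with known, evenly distributed record counts.
    val mapSide = sc.parallelize(1 to 100, numSlices = 4)
    val mapCounts = mapSide
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .collect()
    println(s"map-side counts:    ${mapCounts.mkString(", ")}")

    // Reduce side: after a key-based shuffle the distribution is driven by
    // the partitioner over the keys, so per-partition counts can differ
    // completely from the map-side counts (here one partition ends up empty).
    val shuffled = mapSide.map(x => (x % 3, x)).groupByKey(numPartitions = 4)
    val reduceCounts = shuffled
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.map(_._2.size).sum)))
      .collect()
    println(s"reduce-side counts: ${reduceCounts.mkString(", ")}")

    spark.stop()
  }
}
```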
---