Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/16677
  
    I am not sure if I am missing something - the counts are obtained from the
map-side output, per (map-side) partition, while the limit is computed on the
reduce side (after some arbitrary partitioning/shuffle has been applied). The
number of records per partition observed on the map side need not match the
per-partition counts on the reduce side anymore.
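
    To make the concern concrete, here is a minimal spark-shell sketch (the
data, key function, and HashPartitioner below are illustrative assumptions,
not code from this PR) showing that map-side per-partition counts generally
differ from the post-shuffle layout:

    ```scala
    // Map-side view: records per input (map-side) partition.
    val rdd = sc.parallelize(1 to 100, 4)
    rdd.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size))).collect()
    // e.g. Array((0,25), (1,25), (2,25), (3,25))

    // Reduce-side view: after an arbitrary shuffle (here a hash repartition on
    // a derived key), the per-partition counts are generally different, so the
    // map-side counts no longer describe what each reducer will see.
    import org.apache.spark.HashPartitioner
    val shuffled = rdd.map(x => (x % 3, x)).partitionBy(new HashPartitioner(4))
    shuffled.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size))).collect()
    // e.g. Array((0,33), (1,34), (2,33), (3,0))
    ```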
    
    Of course, I am not very familiar with Spark SQL, so I could be wrong in my
understanding.


---
