Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/16677
I am not sure if I am missing something - the counts obtained are the map-side
output counts per (map-side) partition, while the limit is being computed on the
reduce side, after some arbitrary partitioning/shuffle has been applied. The
number of records per partition obtained on the map side need not match what
ends up in each reduce-side partition anymore.
Of course, I am not very familiar with Spark SQL, so I could be wrong in my
understanding.
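
To make the concern concrete, here is a minimal, hypothetical sketch (not taken
from the PR; the object and app names are made up) showing that per-partition
record counts observed on the map side need not line up with the per-partition
counts seen after a shuffle:

```scala
import org.apache.spark.sql.SparkSession

object MapVsReduceCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-vs-reduce-partition-counts")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Map side: 4 partitions with known, evenly distributed record counts.
    val mapSide = sc.parallelize(1 to 100, numSlices = 4)
    val mapCounts = mapSide
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .collect()
    println(s"map-side counts:    ${mapCounts.mkString(", ")}")

    // Reduce side: after a key-based shuffle the distribution is driven by
    // the partitioner over the keys, so per-partition counts can differ
    // completely from the map-side counts (here one partition ends up empty).
    val shuffled = mapSide.map(x => (x % 3, x)).groupByKey(numPartitions = 4)
    val reduceCounts = shuffled
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.map(_._2.size).sum)))
      .collect()
    println(s"reduce-side counts: ${reduceCounts.mkString(", ")}")

    spark.stop()
  }
}
```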
---