GitHub user holdenk commented on the issue:

    https://github.com/apache/spark/pull/22010
  
    Running `sc.parallelize(1.to(1000)).map(x => (x % 10, x)).sortByKey().distinct().count()` in 2.3.0 and with my PR shows the difference:
    ![240_proposed_distinct_screenshot from 2018-09-26 11-41-13](https://user-images.githubusercontent.com/59893/46101578-317cbb00-c181-11e8-8fa0-6f6b90383aa5.png)
    ![230_distinct_screenshot from 2018-09-26 11-40-51](https://user-images.githubusercontent.com/59893/46101583-33df1500-c181-11e8-9142-a83e8be65ee4.png)
    The run with my PR has one fewer shuffle.
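
    For anyone without the UI handy, the same comparison can probably be made from the RDD lineage. This is a rough sketch for `spark-shell`, where `sc` is the predefined SparkContext; it assumes that every shuffle in this job shows up as a `ShuffledRDD` line in `toDebugString`. The names `rdd`, `lineage` and `shuffles` are only illustrative.

    ```scala
    // Run inside spark-shell; `sc` is provided by the shell.
    // Same pipeline as above, without the final count() so the lineage can be inspected.
    val rdd = sc.parallelize(1.to(1000)).map(x => (x % 10, x)).sortByKey().distinct()

    // toDebugString returns a textual description of the RDD and its dependencies.
    val lineage = rdd.toDebugString
    println(lineage)

    // Assumption: each shuffle in this lineage appears as a ShuffledRDD entry,
    // so counting those lines approximates the number of shuffles.
    val shuffles = lineage.split("\n").count(_.contains("ShuffledRDD"))
    println(s"ShuffledRDD entries in lineage: $shuffles")
    ```

    Running this on both builds and comparing the printed counts should match what the screenshots show.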

