[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

viirya Wed, 06 Jun 2018 06:24:28 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21498
  
    > In aggregation we are replacing a needed shuffle with gathering only the 
needed rows from the other partitions.
    
    I don't actually know what this means. If we have decided that we don't 
need a shuffle because the partitioning already satisfies the requirement, I'm 
not sure why we would still need to gather rows from other partitions. I think 
it is simple: if we need rows from other partitions, we shuffle; if not, we 
avoid the shuffle.
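    
    To make that rule concrete, here is a toy sketch in plain Python. It is 
not Spark's planner code; the helper names (`is_clustered_by_key`, 
`shuffle_by_key`, `sum_by_key`) are made up for illustration, and partitions 
are modelled as plain lists of (key, value) pairs.

```python
from collections import defaultdict


def is_clustered_by_key(partitions):
    """Return True if every key appears in exactly one partition."""
    owner = {}
    for idx, part in enumerate(partitions):
        for key, _ in part:
            # setdefault records the first partition that holds this key.
            if owner.setdefault(key, idx) != idx:
                return False
    return True


def shuffle_by_key(partitions, num_partitions):
    """Hash-repartition rows so that equal keys land in the same partition."""
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[hash(key) % num_partitions].append((key, value))
    return out


def sum_by_key(partitions):
    """Sum values per key, shuffling only when the layout requires it."""
    shuffled = not is_clustered_by_key(partitions)
    if shuffled:
        partitions = shuffle_by_key(partitions, len(partitions))
    totals = {}
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        # Keys are disjoint across partitions here, so a plain merge is safe.
        totals.update(local)
    return totals, shuffled
```

    In this model, input that is already clustered by key is aggregated 
partition by partition with no data movement, and anything else goes through 
a full shuffle first. There is no third path that gathers only some rows.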
    
    But I think this is the point where our understanding differs. So, as you 
said, it is better to hear others' opinions.
    
    > Probably we can wait for others' opinion, but it would be also great to 
have some performance tests on both cases and different scenarios in order to 
better evaluate this change. What do you think?
    
    Yeah, I think so. I will run some tests.


