[GitHub] [spark] bmarcott edited a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

GitBox Tue, 14 Jan 2020 22:10:31 -0800

bmarcott edited a comment on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-574475407
 
 
   I started to try including #26946, but it feels a little messy.
   
   Notice that the numPartitions of the previous shuffle is maintained, but 
the distribution/partitioning is changed:
   `case (ShuffleExchangeExec(partitioning, child, _), distribution: OrderedDistribution) =>
     ShuffleExchangeExec(distribution.createPartitioning(partitioning.numPartitions), child)`
   
   If we remove this code from EnsureRequirements, it'll create a new 
shuffle node with `defaultNumPreShufflePartitions` instead. I'm not sure it 
makes sense to have a general rule that, when a shuffle with range partitioning 
has another shuffle as its child, we eliminate the child but reuse its 
numPartitions.
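
To make the question concrete, here is a minimal sketch of what such a general rule might look like. It uses simplified stand-in classes (`Plan`, `Shuffle`, `Scan`, and toy `HashPartitioning`/`RangePartitioning`), which are assumptions for illustration and not the real Spark `ShuffleExchangeExec` or `Partitioning` types; `CollapseShuffleSketch` is a hypothetical name as well.

```scala
// Simplified stand-ins for physical plan nodes. These are NOT Spark's classes.
sealed trait Partitioning { def numPartitions: Int }
case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
case class RangePartitioning(ordering: Seq[String], numPartitions: Int) extends Partitioning

sealed trait Plan
case class Scan(table: String) extends Plan
case class Shuffle(partitioning: Partitioning, child: Plan) extends Plan

object CollapseShuffleSketch {
  // Hypothetical rule: when a range-partitioned shuffle sits directly on top of
  // another shuffle, drop the child shuffle but keep the child's partition count.
  def collapse(plan: Plan): Plan = plan match {
    case Shuffle(RangePartitioning(ordering, _), Shuffle(childPartitioning, grandchild)) =>
      Shuffle(RangePartitioning(ordering, childPartitioning.numPartitions), collapse(grandchild))
    // Any other shuffle is kept as-is; only its subtree is rewritten.
    case Shuffle(partitioning, child) =>
      Shuffle(partitioning, collapse(child))
    // Leaves are returned unchanged.
    case leaf => leaf
  }
}
```

With this sketch, collapsing `Shuffle(RangePartitioning(Seq("a"), 200), Shuffle(HashPartitioning(Seq("a"), 10), Scan("t")))` yields `Shuffle(RangePartitioning(Seq("a"), 10), Scan("t"))`: the inner shuffle is gone, but its 10 partitions survive in place of the outer shuffle's 200 (200 here just stands in for a default partition count). Whether silently inheriting the removed child's partition count is the right behavior in general is exactly the open question.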


