[GitHub] [spark] cloud-fan commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

GitBox Sun, 21 Jun 2020 21:35:26 -0700


cloud-fan commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-647269450



   > If the first stage uses all resources, I think the later stage still needs 
to be held off?
   
   That's true, but only under the assumption that the first stage takes all 
the resources. It's also possible that these 2 jobs really do run at the same time.
   
   > the speed-up of AQE is gained by triggering all stages (not holding off 
other stage as you said) together, or optimizing join from SMJ to BHJ (if we 
only consider join case)
   
   In the benchmark, the default parallelism takes all the CPU cores, so I think 
most of the perf gain should come from shuffle partition coalescing and from 
converting sort-merge joins to broadcast hash joins (SMJ -> BHJ). 
cc @JkSelf 
   
   That said, by design AQE triggers all independent stages at the same time 
to maximize parallelism. This helps when resources are sufficient (or when 
auto-scaling is available). I don't think we should change this design.
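
To make the trade-off above concrete, here is a toy model of the two submission strategies. It is only a sketch, not Spark's scheduler: it assumes every task takes one time unit and that any free core can run any pending task, and the function names are made up for illustration.

```python
import math


def sequential_makespan(stage_tasks, cores):
    """Time units needed when each stage waits for the previous one to finish."""
    return sum(math.ceil(tasks / cores) for tasks in stage_tasks)


def concurrent_makespan(stage_tasks, cores):
    """Time units needed when all independent stages are submitted at once,
    so an idle core can pick up a pending task from any stage."""
    return math.ceil(sum(stage_tasks) / cores)


# Two independent stages with 8 unit-length tasks each.
stages = [8, 8]

# The first stage alone saturates 8 cores: submitting together gains nothing.
print(sequential_makespan(stages, cores=8), concurrent_makespan(stages, cores=8))    # 2 2

# With 16 cores there is spare capacity: submitting together halves the time.
print(sequential_makespan(stages, cores=16), concurrent_makespan(stages, cores=16))  # 2 1
```

Under these assumptions, submitting the stages together is never slower than running them one after another, and it is faster whenever a single stage leaves cores idle.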


