[GitHub] [spark] c21 edited a comment on pull request #32210: [SPARK-32634][SQL] Introduce sort-based fallback for shuffled hash join (non-code-gen path)

GitBox Sun, 25 Apr 2021 22:00:47 -0700


c21 edited a comment on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-826493892



   After skew join handling, the output partitioning is destroyed, whereas this 
approach preserves it. [Skew join handling is currently not enabled if 
it introduces an extra shuffle in the 
plan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L279).
 I agree that changing AQE's skew join handling is more incremental and 
less intrusive, but I don't see any major intrusive API change in this PR 
either. I am just brainstorming the pros and cons, and I think we 
should pick the direction that leads toward the eventual goal: enabling 
shuffled hash join by default.
   
   @cloud-fan - as you mentioned earlier, I agree that (1) run-time 
sort-based fallback in shuffled hash join itself and (2) AQE skew join handling 
/ hybrid join features are orthogonal to each other. AQE is great for covering 
a lot of cases, but as we all know it has some limitations (some of which are 
listed above and here).


