ulysses-you commented on pull request #32328: URL: https://github.com/apache/spark/pull/32328#issuecomment-826626797
@c21 thanks for the input. Yes, AQE cannot optimize skew on the build side for every join type. But at least, for inner-like joins, we can currently handle skew on both the stream side and the build side.

> If that's true, it sounds to me that it may potentially introduce more OOM on build side, as tasks are sharing executor's off-heap memory to build hash maps

Yes, that is a side effect of `OptimizeSkewedJoin`, whereas SMJ's advantage is that it can spill. IMO, if users explicitly specify shuffled hash join, that means they know its benefits and risks. On the other hand, it is easy to increase memory but hard to make a skewed join fast.
