c21 commented on pull request #32210: URL: https://github.com/apache/spark/pull/32210#issuecomment-826251579
btw if we worry about if too many sort merge join converted to shuffled hash join if enabling shuffled hash join by default. Please note we will only enable shuffled hash join if the estimated/run-time size of one side is less than `"spark.sql.autoBroadcastJoinThreshold" * "spark.sql.shuffle.partitions"`, and this side is 3x smaller than the other side (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L258-L263). This is somewhat restricted rule, and from out side in practice, enabling shuffled hash join by default only incurs 25% of sort merge join getting converted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
