ulysses-you commented on a change in pull request #32550:
URL: https://github.com/apache/spark/pull/32550#discussion_r634114080
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala
##########
@@ -172,6 +190,14 @@ case object NO_BROADCAST_HASH extends JoinStrategyHint {
override def hintAliases: Set[String] = Set.empty
}
+/**
+ * An internal hint to encourage shuffle hash join, used by adaptive query execution.
+ */
+case object PREFER_SHUFFLE_HASH extends JoinStrategyHint {
Review comment:
If we want AQE to add a `SHUFFLE_HASH` hint instead, we need to do more checks.
The join strategy priority in `JoinSelection` is `BHJ` > `SMJ` > `SHJ`, but an
explicit `SHUFFLE_HASH` hint bypasses that order, so AQE would need a check to make
sure we cannot build a `BHJ` before adding it. The idea of `PREFER_SHUFFLE_HASH` is
to skip this extra check on the AQE side and let `JoinSelection` decide the final
join strategy.
Logically, we could mix the `JoinSelection` logic into AQE and use redundant code to
choose the join strategy directly with a specific hint, but the issue is code
maintenance. @cloud-fan
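To make the trade-off concrete, here is a minimal, self-contained sketch. It is a toy model, not Spark's real `JoinSelection`: `JoinStats`, `select`, `aqeHintHard`, `aqeHintSoft`, `ForceShuffleHash` and `PreferShuffleHash` are hypothetical names chosen for illustration, and "can broadcast" is reduced to a single size threshold. It shows that a hard hint forces AQE to duplicate the broadcast check, while a soft preference leaves that decision in one place.

```scala
// Hypothetical, simplified model of join-strategy selection; NOT Spark's JoinSelection.
object JoinSelectionSketch {
  sealed trait Strategy
  case object BroadcastHash extends Strategy
  case object SortMerge extends Strategy
  case object ShuffleHash extends Strategy

  sealed trait Hint
  case object NoHint extends Hint
  case object ForceShuffleHash extends Hint  // hard hint, behaves like an explicit SHUFFLE_HASH
  case object PreferShuffleHash extends Hint // soft preference, like PREFER_SHUFFLE_HASH

  // Hypothetical per-join statistics the planner consults.
  final case class JoinStats(buildSideBytes: Long, broadcastThreshold: Long, canBuildHashMap: Boolean)

  def canBroadcast(s: JoinStats): Boolean = s.buildSideBytes <= s.broadcastThreshold

  // The one place that encodes the priority BHJ > SMJ > SHJ.
  def select(hint: Hint, s: JoinStats): Strategy = hint match {
    case ForceShuffleHash if s.canBuildHashMap => ShuffleHash   // hard hint skips the BHJ check
    case _ if canBroadcast(s)                  => BroadcastHash // BHJ still wins for a soft hint
    case PreferShuffleHash if s.canBuildHashMap => ShuffleHash
    case _                                     => SortMerge
  }

  // With a hard hint, AQE must repeat the broadcast check itself (duplicated logic).
  def aqeHintHard(s: JoinStats): Hint =
    if (canBroadcast(s)) NoHint else ForceShuffleHash

  // With a soft hint, AQE only states a preference; select() makes the final call.
  def aqeHintSoft: Hint = PreferShuffleHash
}
```

In this model, `select(aqeHintSoft, stats)` and `select(aqeHintHard(stats), stats)` agree, but only the soft version avoids copying `canBroadcast` into the AQE side, which is the maintenance concern raised above.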
--
This is an automated message from the Apache Git Service.