ulysses-you commented on a change in pull request #32550:
URL: https://github.com/apache/spark/pull/32550#discussion_r634114080



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala
##########
@@ -172,6 +190,14 @@ case object NO_BROADCAST_HASH extends JoinStrategyHint {
   override def hintAliases: Set[String] = Set.empty
 }
 
+/**
+ * An internal hint to encourage shuffle hash join, used by adaptive query execution.
+ */
+case object PREFER_SHUFFLE_HASH extends JoinStrategyHint {

Review comment:
       If we want to use a `SHUFFLE_HASH` hint, we need to do more checks.
   
   The join strategy priority in `JoinSelection` is `BHJ` > `SMJ` > `SHJ`, so 
`SHUFFLE_HASH` needs a check to make sure we cannot build a `BHJ`. The idea of 
`PREFER_SHUFFLE_HASH` is to skip this extra check on the AQE side and let 
`JoinSelection` decide the final join strategy.
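   
   A rough sketch of the difference between the two hints. This is a simplified model, not the actual `JoinSelection` code: `Strategy`, `Hint`, `JoinChoice.select`, and the single `canBroadcast` flag are all hypothetical names used only for illustration.

```scala
// Simplified model of the hint semantics discussed above.
// NOT Spark's JoinSelection; every name here is illustrative.
sealed trait Strategy
case object BHJ extends Strategy // broadcast hash join
case object SMJ extends Strategy // sort merge join
case object SHJ extends Strategy // shuffle hash join

sealed trait Hint
case object NoHint extends Hint
case object ShuffleHash extends Hint       // stands in for a strict SHUFFLE_HASH hint
case object PreferShuffleHash extends Hint // stands in for PREFER_SHUFFLE_HASH

object JoinChoice {
  // canBroadcast is an assumed stand-in for "one side is small enough for BHJ".
  def select(hint: Hint, canBroadcast: Boolean): Strategy = hint match {
    // Strict hint: SHJ is chosen unconditionally, so the caller (AQE) would
    // have to rule out BHJ itself before setting it.
    case ShuffleHash => SHJ
    // Soft hint: the usual priority still applies, so BHJ wins when possible;
    // only the SMJ-over-SHJ default is flipped.
    case PreferShuffleHash => if (canBroadcast) BHJ else SHJ
    // No hint: BHJ first, otherwise fall back to SMJ.
    case NoHint => if (canBroadcast) BHJ else SMJ
  }
}
```

   With the strict hint, `JoinChoice.select(ShuffleHash, canBroadcast = true)` still yields `SHJ`, which is the case the extra check would have to prevent. With the soft hint the same call yields `BHJ`.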
   
   Logically, we could mix `JoinSelection` into AQE and use redundant code to 
choose the join strategy directly with a specific hint. But the issue is code 
maintenance. @cloud-fan 






