GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19714#discussion_r153679188
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -91,10 +91,10 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
* predicates can be evaluated by matching join keys. If found, Join implementations are chosen
* with the following precedence:
*
- * - Broadcast: if one side of the join has an estimated physical size that is smaller than the
- * user-configurable [[SQLConf.AUTO_BROADCASTJOIN_THRESHOLD]] threshold
- * or if that side has an explicit broadcast hint (e.g. the user applied the
- * [[org.apache.spark.sql.functions.broadcast()]] function to a DataFrame), then that side
+ * - Broadcast: if one side of the join has an explicit broadcast hint (e.g. the user applied the
--- End diff --
```
Broadcast: We prefer to broadcast the join side with an explicit broadcast
hint (e.g. the user applied the [[org.apache.spark.sql.functions.broadcast()]]
function to a DataFrame). If both sides have the broadcast hint, we prefer to
broadcast the side with the smaller estimated physical size. If neither side
has the broadcast hint, we broadcast a join side only if its estimated
physical size is smaller than the user-configurable
[[SQLConf.AUTO_BROADCASTJOIN_THRESHOLD]] threshold.
```
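
To make the intended precedence concrete, here is a rough sketch of the decision described in the suggested comment. It is not the planner's actual code: `JoinSide`, `BroadcastChoice`, and `chooseBroadcastSide` are hypothetical names, join-type restrictions are ignored, and picking the smaller side when both are under the threshold without hints is an assumption the comment does not spell out.

```scala
// Hypothetical model of one join side; these are not Spark's planner types.
final case class JoinSide(
    name: String,
    hasBroadcastHint: Boolean,
    estimatedSizeInBytes: BigInt)

object BroadcastChoice {

  /** Returns the side to broadcast, if any, following the precedence in the comment. */
  def chooseBroadcastSide(
      left: JoinSide,
      right: JoinSide,
      autoBroadcastThreshold: BigInt): Option[JoinSide] = {

    // Ties go to the left side; this tie-break is an assumption.
    def smaller(a: JoinSide, b: JoinSide): JoinSide =
      if (a.estimatedSizeInBytes <= b.estimatedSizeInBytes) a else b

    (left.hasBroadcastHint, right.hasBroadcastHint) match {
      // Both sides hinted: prefer the smaller estimated size.
      case (true, true) => Some(smaller(left, right))
      // Exactly one side hinted: the hint wins, regardless of size.
      case (true, false) => Some(left)
      case (false, true) => Some(right)
      // No hints: only a side below the threshold qualifies.
      // Assumption: if both qualify, take the smaller one.
      case (false, false) =>
        Seq(left, right)
          .filter(_.estimatedSizeInBytes < autoBroadcastThreshold)
          .sortBy(_.estimatedSizeInBytes)
          .headOption
    }
  }
}
```

The point of the ordering is that a hint overrides the size check entirely, so a hinted side is chosen even when it exceeds the threshold, while the threshold only matters when no hint is present.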
---