nchammas commented on code in PR #45374:
URL: https://github.com/apache/spark/pull/45374#discussion_r1512030136
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -582,11 +582,7 @@ object SQLConf {
   val AUTO_BROADCASTJOIN_THRESHOLD =
     buildConf("spark.sql.autoBroadcastJoinThreshold")
       .doc("Configures the maximum size in bytes for a table that will be broadcast to all worker " +
-        "nodes when performing a join. By setting this value to -1 broadcasting can be disabled. " +
-        "Note that currently statistics are only supported for Hive Metastore tables where the " +
Review Comment:
Fair question. I removed it because I don't think it explains anything.
Across all of Spark, statistics come from one of the three sources I
described in this PR: data source, catalog, and runtime. And this applies to
all cost-based optimizations, not just to auto-broadcast. Isn't that so?
So I thought it would be better to remove this sentence, since it indirectly
suggests that there is something special about auto-broadcast and statistics,
when that isn't the case.
But I confess I am concluding this based on a high-level understanding of
the optimizer. I didn't dig into the details of this particular optimization
to see if there is anything really special about it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]