Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/1238#discussion_r15492240
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -39,29 +43,34 @@ trait SQLConf {
      /**
       * Upper bound on the sizes (in bytes) of the tables qualified for the auto conversion to
  -    * a broadcast value during the physical executions of join operations. Setting this to 0
  +    * a broadcast value during the physical executions of join operations. Setting this to -1
       * effectively disables auto conversion.
  -    * Hive setting: hive.auto.convert.join.noconditionaltask.size.
  +    *
  +    * Hive setting: hive.auto.convert.join.noconditionaltask.size, whose default value is also 10000.
       */
      private[spark] def autoConvertJoinSize: Int =
        get("spark.sql.auto.convert.join.size", "10000").toInt

  -    /** A comma-separated list of table names marked to be broadcasted during joins. */
  -    private[spark] def joinBroadcastTables: String =
  -      get("spark.sql.join.broadcastTables", "")
  +    /**
  +     * The default size in bytes to assign to a logical operator's estimation statistics. By default,
  +     * it is set to a larger value than `autoConvertJoinSize`, hence any logical operator without a
  +     * properly implemented estimation of this statistic will not be incorrectly broadcasted in joins.
  +     */
  +    private[spark] def statsDefaultSizeInBytes: Long =
  +      getOption("spark.sql.catalyst.stats.sizeInBytes").map(_.toLong)
--- End diff --
`defaultSizeInBytes`?
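
The doc comment in the diff explains why the fallback statistic must exceed the broadcast threshold: an operator that never implements its own size estimate falls back to the default, and the default is deliberately too large to qualify for broadcast. A minimal sketch of that interaction (hypothetical object and method names mirroring the diff, not the actual Spark planner code):

```scala
// Hypothetical sketch of the interplay between the two settings in the diff.
// Names and defaults are illustrative, not Spark's real planner internals.
object BroadcastDecisionSketch {
  // Broadcast threshold, mirroring spark.sql.auto.convert.join.size.
  val autoConvertJoinSize: Long = 10000L

  // Fallback statistic: intentionally larger than the threshold, so an
  // operator without a real estimate can never be chosen for broadcast.
  val statsDefaultSizeInBytes: Long = autoConvertJoinSize + 1

  // Returns the operator's estimated size, or the safe default when the
  // operator has no properly implemented estimation.
  def sizeInBytes(estimate: Option[Long]): Long =
    estimate.getOrElse(statsDefaultSizeInBytes)

  // A table qualifies for auto conversion only if its size is within bound.
  def canBroadcast(estimate: Option[Long]): Boolean =
    sizeInBytes(estimate) <= autoConvertJoinSize
}
```

With these defaults, `canBroadcast(Some(512L))` is true, while `canBroadcast(None)` is false, which is exactly the "will not be incorrectly broadcasted" guarantee the comment describes.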