Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/1238#discussion_r15492240
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -39,29 +43,34 @@ trait SQLConf {
      /**
       * Upper bound on the sizes (in bytes) of the tables qualified for the auto conversion to
  -    * a broadcast value during the physical executions of join operations. Setting this to 0
  +    * a broadcast value during the physical executions of join operations. Setting this to -1
       * effectively disables auto conversion.
  -    * Hive setting: hive.auto.convert.join.noconditionaltask.size.
  +    *
  +    * Hive setting: hive.auto.convert.join.noconditionaltask.size, whose default value is also 10000.
       */
      private[spark] def autoConvertJoinSize: Int =
        get("spark.sql.auto.convert.join.size", "10000").toInt

  -    /** A comma-separated list of table names marked to be broadcasted during joins. */
  -    private[spark] def joinBroadcastTables: String =
  -      get("spark.sql.join.broadcastTables", "")
  +    /**
  +     * The default size in bytes to assign to a logical operator's estimation statistics. By default,
  +     * it is set to a larger value than `autoConvertJoinSize`, hence any logical operator without a
  +     * properly implemented estimation of this statistic will not be incorrectly broadcasted in joins.
  +     */
  +    private[spark] def statsDefaultSizeInBytes: Long =
  +      getOption("spark.sql.catalyst.stats.sizeInBytes").map(_.toLong)
--- End diff --
`defaultSizeInBytes`?
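
The doc comment in the diff explains why the fallback statistic must exceed the broadcast threshold: an operator that never implements its own size estimate falls back to the default, and the default is deliberately too large to qualify for broadcast. A minimal sketch of that interaction (hypothetical object and method names mirroring the diff, not the actual Spark planner code):

```scala
// Hypothetical sketch of the interplay between the two settings in the diff.
// Names and defaults are illustrative, not Spark's real planner internals.
object BroadcastDecisionSketch {
  // Broadcast threshold, mirroring spark.sql.auto.convert.join.size.
  val autoConvertJoinSize: Long = 10000L

  // Fallback statistic: intentionally larger than the threshold, so an
  // operator without a real estimate can never be chosen for broadcast.
  val statsDefaultSizeInBytes: Long = autoConvertJoinSize + 1

  // Returns the operator's estimated size, or the safe default when the
  // operator has no properly implemented estimation.
  def sizeInBytes(estimate: Option[Long]): Long =
    estimate.getOrElse(statsDefaultSizeInBytes)

  // A table qualifies for auto conversion only if its size is within bound.
  def canBroadcast(estimate: Option[Long]): Boolean =
    sizeInBytes(estimate) <= autoConvertJoinSize
}
```

With these defaults, `canBroadcast(Some(512L))` is true, while `canBroadcast(None)` is false, which is exactly the "will not be incorrectly broadcasted" guarantee the comment describes.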