[GitHub] spark pull request #20407: [SPARK-23124][SQL] Allow to disable BroadcastNest...

gatorsmile Sat, 10 Feb 2018 00:29:31 -0800

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20407#discussion_r167392901
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -156,6 +156,15 @@ object SQLConf {
         .booleanConf
         .createWithDefault(true)
     
    +  val ALLOW_NESTEDJOIN_FALLBACK = buildConf("spark.sql.join.broadcastJoinFallback.enabled")
    +    .internal()
    +    .doc("When true (default), if the other options are not available, fallback to try and use " +
    +      "BroadcastNestedLoopJoin as join strategy. This can cause OOM which can be a problem " +
    +      "in some scenarios, eg. when running the thriftserver. Turn to false to disable it: an " +
    +      "AnalysisException will be thrown.")
    --- End diff --
    
    `AUTO_BROADCASTJOIN_THRESHOLD` is not a threshold for avoiding OOM. It is
normally used for performance tuning.
    
    `BroadcastNestedLoopJoin` is used in most cases where the join has no
equi-join keys. Disabling this join algorithm would therefore leave Spark SQL
unable to handle many join cases, so I do not think it makes sense to add this
conf.
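
To illustrate the point about equi-join keys, here is a minimal sketch in plain
Scala collections. It is not Spark's implementation: `JoinSketch`, `Order`,
`Tier`, `hashJoin`, and `nestedLoopJoin` are hypothetical names chosen for this
example. It only shows why a hash-based strategy needs an equality key on both
sides, while a nested loop accepts any predicate.

```scala
// Illustration only: plain Scala collections, not Spark's physical operators.
object JoinSketch {
  final case class Order(id: Int, amount: Int)
  final case class Tier(name: String, low: Int, high: Int)

  // Hash join: needs a key on each side that can be compared for equality,
  // so it only applies to equi-join conditions.
  def hashJoin[A, B, K](left: Seq[A], right: Seq[B])(lk: A => K, rk: B => K): Seq[(A, B)] = {
    val built: Map[K, Seq[B]] = right.groupBy(rk)
    for {
      a <- left
      b <- built.getOrElse(lk(a), Seq.empty[B])
    } yield (a, b)
  }

  // Nested loop join: accepts any predicate, at the cost of comparing every pair.
  def nestedLoopJoin[A, B](left: Seq[A], right: Seq[B])(cond: (A, B) => Boolean): Seq[(A, B)] =
    for { a <- left; b <- right if cond(a, b) } yield (a, b)

  val orders: Seq[Order] = Seq(Order(1, 5), Order(2, 50), Order(3, 500))
  val tiers: Seq[Tier] = Seq(Tier("small", 0, 9), Tier("medium", 10, 99))

  // A range condition has no equality key, so only the nested loop applies here.
  val tiered: Seq[(Order, Tier)] =
    nestedLoopJoin(orders, tiers)((o, t) => o.amount >= t.low && o.amount <= t.high)
}
```

The range condition in `tiered` cannot be expressed through `hashJoin`, because
there is no single key whose equality decides a match. Joins of that shape are
the ones that would stop working if the nested-loop strategy were turned off.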


