liupc commented on a change in pull request #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJoin fallback
URL: https://github.com/apache/spark/pull/20407#discussion_r245213408
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -156,6 +156,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val ALLOW_NESTEDJOIN_FALLBACK = buildConf("spark.sql.join.broadcastJoinFallback.enabled")
+    .internal()
+    .doc("When true (default), if the other options are not available, fallback to try and use " +
+      "BroadcastNestedLoopJoin as join strategy. This can cause OOM which can be a problem " +
+      "in some scenarios, eg. when running the thriftserver. Turn to false to disable it: an " +
+      "AnalysisException will be thrown.")
 
 Review comment:
   @gatorsmile I don't think this solves only a specific case; it is a general problem in our production environment. Users run Spark SQL queries and may not know how large their tables are. I think it makes sense to provide an option to disable BroadcastNestedLoopJoin for large datasets rather than letting it run, which may cause an OOM and abort other users' queries.
   
   I think this PR should also consider the sizeInBytes of the LogicalPlan.
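
   To illustrate the sizeInBytes suggestion, here is a minimal, self-contained sketch of one way a size-aware check could work. It is not Spark code: `PlanStats`, `FallbackPolicy`, `RejectJoin`, and the `maxFallbackBytes` threshold are hypothetical names made up for this example, and the rule that the smaller side's estimated size is the one compared against the threshold is an assumption, not something this PR implements.

```scala
// Hypothetical model for illustration only; none of these names are Spark APIs.
final case class PlanStats(sizeInBytes: BigInt)

sealed trait FallbackDecision
case object UseBroadcastNestedLoopJoin extends FallbackDecision
final case class RejectJoin(reason: String) extends FallbackDecision

object FallbackPolicy {
  /**
   * Decide whether to fall back to a broadcast nested loop join.
   *
   * @param fallbackEnabled  mirrors the proposed flag; true keeps the unconditional fallback
   * @param maxFallbackBytes hypothetical size limit applied when the flag is false
   */
  def decide(
      fallbackEnabled: Boolean,
      maxFallbackBytes: BigInt,
      left: PlanStats,
      right: PlanStats): FallbackDecision = {
    // Assumption: the smaller side is the one that would be broadcast.
    val smaller = left.sizeInBytes.min(right.sizeInBytes)
    if (fallbackEnabled) {
      UseBroadcastNestedLoopJoin
    } else if (smaller <= maxFallbackBytes) {
      // Small enough that broadcasting is unlikely to exhaust memory.
      UseBroadcastNestedLoopJoin
    } else {
      RejectJoin(
        s"estimated size $smaller bytes exceeds the limit of $maxFallbackBytes bytes")
    }
  }
}
```

   With the flag set to false, a join whose smaller side is estimated at 5 bytes against a 10-byte limit would still fall back, while one estimated at 100 bytes would be rejected. In the real planner the rejection would presumably surface as the AnalysisException described in the doc string above.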
   
   

