Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20407#discussion_r167392674
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -156,6 +156,15 @@ object SQLConf {
         .booleanConf
         .createWithDefault(true)
     
    +  val ALLOW_NESTEDJOIN_FALLBACK = buildConf("spark.sql.join.broadcastJoinFallback.enabled")
    +    .internal()
    +    .doc("When true (default), if no other join strategy is applicable, fall back to " +
    +      "BroadcastNestedLoopJoin as the join strategy. This can cause OOMs, which can be a " +
    +      "problem in some scenarios, e.g. when running the thriftserver. Set it to false to " +
    +      "disable the fallback: an AnalysisException will be thrown instead.")
    --- End diff --
    
    I see, and in general I agree with you, but in this case `BroadcastNestedLoopJoin` is used as a fallback even though we are over the threshold for broadcast joins, so it is very likely to cause an OOM.
    The rationale of that choice seems to be: there is no other applicable join implementation, so go with this one even though an OOM is likely; if the OOM does not occur we got lucky and the query ran, otherwise it fails with an OOM instead of an `AnalysisException` (so the query cannot run in that scenario anyway). That is fine for a normal Spark application, but for long-running ones like the STS an OOM is much worse than an `AnalysisException`.
    Therefore, I know this is not the solution to every OOM problem, but the reason for this PR is that when this rule was introduced, the choice was made without considering use cases like the STS. Do you agree?

