Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21899#discussion_r210664024
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
 ---
    @@ -118,12 +119,19 @@ case class BroadcastExchangeExec(
               // SparkFatalException, which is a subclass of Exception. 
ThreadUtils.awaitResult
               // will catch this exception and re-throw the wrapped fatal 
throwable.
               case oe: OutOfMemoryError =>
    -            throw new SparkFatalException(
    +            val sizeMessage = if (dataSize != -1) {
    +              s"; Size of table is $dataSize"
    +            } else {
    +              ""
    +            }
    +            val oome =
                   new OutOfMemoryError(s"Not enough memory to build and 
broadcast the table to " +
                   s"all worker nodes. As a workaround, you can either disable 
broadcast by setting " +
                   s"${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1 or 
increase the spark driver " +
    -              s"memory by setting ${SparkLauncher.DRIVER_MEMORY} to a 
higher value")
    -              .initCause(oe.getCause))
    +              s"memory by setting ${SparkLauncher.DRIVER_MEMORY} to a 
higher value$sizeMessage")
    --- End diff --
    
    My concern is mostly about collect - it fully materialize collected rows in 
the memory: 
https://github.com/apache/spark/blob/e0b63832181464453f753649623a24cb567a73d4/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L304


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to