Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21899#discussion_r210664024
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
---
@@ -118,12 +119,19 @@ case class BroadcastExchangeExec(
// SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
// will catch this exception and re-throw the wrapped fatal throwable.
case oe: OutOfMemoryError =>
- throw new SparkFatalException(
+ val sizeMessage = if (dataSize != -1) {
+ s"; Size of table is $dataSize"
+ } else {
+ ""
+ }
+ val oome =
new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
s"all worker nodes. As a workaround, you can either disable broadcast by setting " +
s"${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1 or increase the spark driver " +
- s"memory by setting ${SparkLauncher.DRIVER_MEMORY} to a higher value")
- .initCause(oe.getCause))
+ s"memory by setting ${SparkLauncher.DRIVER_MEMORY} to a higher value$sizeMessage")
--- End diff --
My concern is mostly about collect, since it fully materializes the collected
rows in memory:
https://github.com/apache/spark/blob/e0b63832181464453f753649623a24cb567a73d4/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L304
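
To illustrate the difference I have in mind, here is a minimal sketch in plain Scala. It does not use Spark's API: `CollectSketch`, `rows`, `streamedSize` and `collectedSize` are hypothetical names made up for this example, and byte arrays stand in for rows. The point is only that a collect-style call keeps every row alive at once, while a streaming pass holds roughly one row at a time.

```scala
// Hypothetical stand-in for a row source; this is NOT Spark's API.
object CollectSketch {
  // Lazily produces n "rows" of rowBytes each; nothing is allocated until consumed.
  def rows(n: Int, rowBytes: Int): Iterator[Array[Byte]] =
    Iterator.fill(n)(new Array[Byte](rowBytes))

  // Streaming pass: each row can be garbage collected after it is counted,
  // so peak memory is roughly one row.
  def streamedSize(n: Int, rowBytes: Int): Long =
    rows(n, rowBytes).foldLeft(0L)((acc, row) => acc + row.length)

  // Collect-style pass: all rows are retained in a single array first,
  // so peak memory is roughly n * rowBytes.
  def collectedSize(n: Int, rowBytes: Int): Long = {
    val all: Array[Array[Byte]] = rows(n, rowBytes).toArray
    all.iterator.map(_.length.toLong).sum
  }

  def main(args: Array[String]): Unit = {
    println(streamedSize(1000, 1024))  // prints 1024000
    println(collectedSize(1000, 1024)) // prints 1024000
  }
}
```

Both functions report the same total, but only the second one needs the whole data set to fit in memory at the same time.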