Github user bersprockets commented on a diff in the pull request:
https://github.com/apache/spark/pull/21899#discussion_r212756302
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -118,12 +119,20 @@ case class BroadcastExchangeExec(
// SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
// will catch this exception and re-throw the wrapped fatal throwable.
case oe: OutOfMemoryError =>
- throw new SparkFatalException(
+ val sizeMessage = if (dataSize != -1) {
+ s"${SparkLauncher.DRIVER_MEMORY} by at least the estimated size of the " +
+ s"relation ($dataSize bytes)"
--- End diff --
@rezasafi The dataSize appears to be inflated by a factor of 2-3, at least
relative to the size of the actual data in the table. That may be because these
relations are backed by map-like objects that store keys and (likely) other
internal structures in addition to the rows themselves.
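To make the overhead argument concrete, here is a toy Scala model. It is not Spark's actual size accounting, and the names (`RelationSizeSketch`, `mapLikeBytes`) and the key size, slot size, and load factor are all arbitrary assumptions. It compares the raw bytes of some rows with the bytes of a hypothetical map-like layout that stores a separate key per row plus a power-of-two array of hash slots.

```scala
// Toy size model only: NOT Spark's HashedRelation / estimatedSize logic.
object RelationSizeSketch {
  /** Bytes of the row data alone, roughly what the table itself holds. */
  def rawBytes(rows: Seq[Array[Byte]]): Long =
    rows.map(_.length.toLong).sum

  /** Smallest power of two that is >= n (assumes n >= 1). */
  def nextPow2(n: Long): Long = {
    var c = 1L
    while (c < n) c <<= 1
    c
  }

  /**
   * Hypothetical map-like layout: each row is stored with its own copy of the
   * key, and the structure keeps a power-of-two array of fixed-size slots sized
   * for the given load factor. All parameters are illustrative assumptions.
   */
  def mapLikeBytes(
      rows: Seq[Array[Byte]],
      keyBytes: Int,
      slotBytes: Int = 8,
      loadFactor: Double = 0.5): Long = {
    // Row bytes plus a separate key copy per row.
    val payload = rows.map(_.length.toLong + keyBytes).sum
    // Slot array rounded up to a power of two, left partly empty by the load factor.
    val slots = nextPow2(math.max(1L, math.ceil(rows.size / loadFactor).toLong))
    payload + slots * slotBytes
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq.fill(1000)(new Array[Byte](16)) // 1000 rows of 16 bytes each
    val raw = rawBytes(rows)
    val mapLike = mapLikeBytes(rows, keyBytes = 8)
    println(s"raw=$raw mapLike=$mapLike ratio=${mapLike.toDouble / raw}")
  }
}
```

With these made-up parameters the map-like figure comes out around 2.5 times the raw size. The point is only that per-key and per-slot costs grow with the row count; these are not Spark's real numbers.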