[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...

bersprockets Fri, 17 Aug 2018 15:39:01 -0700

Github user bersprockets commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21899#discussion_r211047556
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
    @@ -118,12 +119,20 @@ case class BroadcastExchangeExec(
               // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
               // will catch this exception and re-throw the wrapped fatal throwable.
               case oe: OutOfMemoryError =>
    -            throw new SparkFatalException(
    +            val sizeMessage = if (dataSize != -1) {
    +              s"${SparkLauncher.DRIVER_MEMORY} by at least the estimated size of the " +
    --- End diff --
    
    @hvanhovell That's what was being obscured :).
    
    In testing this, I've seen the OOM originate in various places. Here are the three cases I have seen first hand:
    
    <pre>
    java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
      at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:628)
      at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:570)
      at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:865)
    </pre>
    At that line (HashedRelation.scala:628) is an allocation:
    <pre>
    val newPage = new Array[Long](newNumWords.toInt)
    </pre>
    2nd case:
    <pre>
    java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase spark.driver.memory by at least the estimated size of the relation (96468992 bytes).
      at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
      at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
      at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:286)
      at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:286)
    </pre>
    3rd case:
    <pre>
    java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value.
      at org.apache.spark.unsafe.memory.MemoryBlock.allocateFromObject(MemoryBlock.java:118)
      at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:420)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
      at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:311)
    </pre>
    At that line (MemoryBlock.java:118) is also an allocation:
    <pre>
    mb = new ByteArrayMemoryBlock(array, offset, length);
    </pre>
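
For reference, the two message variants in the traces above can be reproduced with a small standalone sketch. This is not the code in the PR: `BroadcastOomMessage`, `message`, and `withOriginalTrace` are hypothetical names, the config keys are written as string literals rather than taken from `SQLConf` and `SparkLauncher`, and copying the stack trace is only one assumed way to keep the allocation site visible.

```scala
object BroadcastOomMessage {
  // Assumption: literal keys stand in for the constants used in Spark itself.
  val AutoBroadcastThresholdKey = "spark.sql.autoBroadcastJoinThreshold"
  val DriverMemoryKey = "spark.driver.memory"

  // Builds the workaround text. A dataSize of -1 means the size is unknown.
  def message(dataSize: Long): String = {
    val sizeMessage = if (dataSize != -1) {
      s"$DriverMemoryKey by at least the estimated size of the relation ($dataSize bytes)"
    } else {
      s"the spark driver memory by setting $DriverMemoryKey to a higher value"
    }
    "Not enough memory to build and broadcast the table to all worker nodes. " +
      "As a workaround, you can either disable broadcast by setting " +
      s"$AutoBroadcastThresholdKey to -1 or increase $sizeMessage."
  }

  // Assumed approach: create a new error with the friendlier message but reuse
  // the original stack trace, so the failing allocation stays the top frame.
  def withOriginalTrace(oe: OutOfMemoryError, dataSize: Long): OutOfMemoryError = {
    val wrapped = new OutOfMemoryError(message(dataSize))
    wrapped.setStackTrace(oe.getStackTrace)
    wrapped
  }
}
```

Calling `BroadcastOomMessage.message(96468992L)` yields the wording of the 2nd case, and `BroadcastOomMessage.message(-1L)` yields the wording of the 1st and 3rd cases.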



