[ https://issues.apache.org/jira/browse/SPARK-25704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Rashid resolved SPARK-25704.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.4.0

Resolved by pr https://github.com/apache/spark/pull/22705

Commit to master https://github.com/apache/spark/commit/43717dee570dc41d71f0b27b8939f6297a029a02
to branch-2.4 https://github.com/apache/spark/commit/1001d2314275c902da519725da266a23b537e33a

> Replication of > 2GB block fails due to bad config default
> ----------------------------------------------------------
>
>                 Key: SPARK-25704
>                 URL: https://issues.apache.org/jira/browse/SPARK-25704
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Replicating a block > 2GB currently fails because it tries to allocate a
> bytebuffer that is just a *bit* too large, due to a bad default config. This
> [line|https://github.com/apache/spark/blob/cd40655965072051dfae65eabd979edff0e4d398/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L454]:
> {code}
> ChunkedByteBuffer.fromFile(tmpFile,
>   conf.get(config.MEMORY_MAP_LIMIT_FOR_TESTS).toInt)
> {code}
> {{MEMORY_MAP_LIMIT_FOR_TESTS}} defaults to {{Integer.MAX_VALUE}}, but
> unfortunately that is just a tiny bit too big.
> You'll see an exception like:
> {noformat}
> 18/10/09 21:21:54 WARN server.TransportChannelHandler: Exception in connection from /172.31.118.153:53534
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$8.apply(ChunkedByteBuffer.scala:199)
>   at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$8.apply(ChunkedByteBuffer.scala:199)
>   at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2315)
>   at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
>   at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
>   at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
>   at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$fromFile$1.apply$mcI$sp(ChunkedByteBuffer.scala:201)
>   at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$fromFile$1.apply(ChunkedByteBuffer.scala:201)
>   at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$fromFile$1.apply(ChunkedByteBuffer.scala:201)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.util.io.ChunkedByteBuffer$.fromFile(ChunkedByteBuffer.scala:202)
>   at org.apache.spark.util.io.ChunkedByteBuffer$.fromFile(ChunkedByteBuffer.scala:184)
>   at org.apache.spark.storage.BlockManager$$anon$1.onComplete(BlockManager.scala:454)
> {noformat}
> At least on my system, it's just 2 bytes too big :(
> {noformat}
> > scala -J-Xmx4G
> import java.nio.ByteBuffer
> scala> ByteBuffer.allocate(Integer.MAX_VALUE)
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   ... 30 elided
> scala> ByteBuffer.allocate(Integer.MAX_VALUE - 1)
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   ... 30 elided
> scala> ByteBuffer.allocate(Integer.MAX_VALUE - 2)
> res3: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=2147483645 cap=2147483645]
> {noformat}
> *Workaround*: Set "spark.storage.memoryMapLimitForTests" to something a bit
> smaller, e.g. 2147483135 (that's Integer.MAX_VALUE - 512, just in case it's a
> bit different on other systems).
> This was introduced by SPARK-25422. I'll file a PR shortly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
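
The arithmetic behind the workaround can be sketched in a few lines of plain Scala. This is only an illustration, not the code from the PR: the object name `SafeChunkLimit` and the helper `chunkSizes` are hypothetical, and the 512-byte margin is simply the one suggested in the workaround above. It computes the capped chunk size and shows how a block larger than 2GB would split into chunks that each stay below that cap.

```scala
object SafeChunkLimit {
  // Margin taken from the workaround in the issue (Integer.MAX_VALUE - 512).
  // The actual JVM array limit may differ between systems; this is an assumption.
  val Margin: Int = 512

  // 2147483647 - 512 = 2147483135, the value suggested for
  // spark.storage.memoryMapLimitForTests.
  val MaxChunkSize: Int = Integer.MAX_VALUE - Margin

  /** Hypothetical helper: sizes of the chunks needed to hold `totalBytes`,
    * with every chunk at most `maxChunkSize` bytes. */
  def chunkSizes(totalBytes: Long, maxChunkSize: Int = MaxChunkSize): Seq[Int] = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(maxChunkSize > 0, "maxChunkSize must be positive")
    val fullChunks = (totalBytes / maxChunkSize).toInt
    val remainder = (totalBytes % maxChunkSize).toInt
    val tail = if (remainder > 0) Seq(remainder) else Seq.empty[Int]
    Seq.fill(fullChunks)(maxChunkSize) ++ tail
  }
}
```

With these definitions, `SafeChunkLimit.chunkSizes(5000000000L)` yields three chunk sizes, none larger than `MaxChunkSize`. To apply the workaround itself, the same value could be passed at submit time, e.g. `--conf spark.storage.memoryMapLimitForTests=2147483135`.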