It may be this issue: https://issues.apache.org/jira/browse/SPARK-6235, which covers the 2GB limit on the size of any single block Spark writes to or reads from disk. Blocks read back from disk are memory-mapped into a ByteBuffer, and byte buffers are indexed by Java ints, so no block can exceed Integer.MAX_VALUE (about 2GB) bytes. That is the map() call failing at the top of your trace.
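You can reproduce the underlying JVM limit in isolation. This is only an illustration (the file path is a throwaway placeholder), but it triggers the exact exception in your log:

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel

    // FileChannel.map backs its result with a MappedByteBuffer, whose offsets
    // are Java ints, so any request above Integer.MAX_VALUE bytes is rejected
    // up front with "Size exceeds Integer.MAX_VALUE".
    val channel = new RandomAccessFile("/tmp/demo-block", "rw").getChannel // placeholder file
    channel.map(FileChannel.MapMode.READ_WRITE, 0, Int.MaxValue.toLong + 1) // throws IllegalArgumentException

Spark runs into the same check in DiskStore.getBytes once a cached or shuffled block grows past that size.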
If that is what you are hitting, the fix is to tune for smaller tasks: increase the number of partitions, use a more space-efficient data structure inside the RDD, or give Spark more memory and cache the data in memory. Also make sure you are using Kryo serialization.
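For concreteness, here is a minimal sketch of those settings against the Spark 1.6 RDD API. The app name, input path, and partition count are placeholders to adapt to your data and cluster:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("random-forest-training") // placeholder name
      // Kryo is much more compact than the default Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Raise the default shuffle parallelism so each task handles less data.
      .set("spark.default.parallelism", "400")
    val sc = new SparkContext(conf)

    val raw = sc.textFile("hdfs:///path/to/training/data") // placeholder path
    // More partitions mean smaller per-partition blocks, each safely under 2GB.
    val data = raw.repartition(400)
    // Cache in serialized form to shrink the in-memory (and spilled) footprint.
    data.persist(StorageLevel.MEMORY_ONLY_SER)

Too few, too large partitions are usually what push a single block past the 2GB mark, so repartitioning is the quickest lever to try first.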
Andrew

> On Jul 23, 2016, at 9:00 PM, Ascot Moss <ascot.m...@gmail.com> wrote:
>
> Hi,
>
> Please help!
>
> My spark: 1.6.2
> Java: java8_u40
>
> I am trying random forest training, I got "Size exceeds Integer.MAX_VALUE".
>
> Any idea how to resolve it?
>
> (the log)
> 16/07/24 07:59:49 ERROR Executor: Exception in task 0.0 in stage 7.0 (TID 25)
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:127)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:115)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:129)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:136)
>         at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:503)
>         at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
>         at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
>         at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 16/07/24 07:59:49 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 25, localhost): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:127)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:115)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:129)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:136)
>         at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:503)
>         at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
>         at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
>         at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> Regards