Thanks Matei! It worked.

On 9 July 2015 at 19:43, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> This means that one of your cached RDD partitions is bigger than 2 GB of
> data. You can fix it by having more partitions. If you read data from a
> file system like HDFS or S3, set the number of partitions higher in the
> sc.textFile, hadoopFile, etc. methods (it's an optional second parameter to
> those methods). If you create the RDD through parallelize, or if this
> particular RDD comes from a shuffle, use more tasks in the parallelize or
> shuffle.
>
> Matei
>
> On Jul 9, 2015, at 3:35 PM, Michal Čizmazia <mici...@gmail.com> wrote:
>
> Spark version 1.4.0 in the Standalone mode
>
> 2015-07-09 20:12:02 INFO  (sparkDriver-akka.actor.default-dispatcher-3)
> BlockManagerInfo:59 - Added rdd_0_0 on disk on localhost:51132 (size: 29.8 GB)
> 2015-07-09 20:12:02 ERROR (Executor task launch worker-0) Executor:96 -
> Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
>         at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:509)
>         at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:427)
>         at org.apache.spark.storage.BlockManager.get(BlockManager.scala:615)
>         at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> On 9 July 2015 at 18:11, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Which release of Spark are you using?
>>
>> Can you show the complete stack trace?
>>
>> getBytes() could be called from:
>>     getBytes(file, 0, file.length)
>> or:
>>     getBytes(segment.file, segment.offset, segment.length)
>>
>> Cheers
>>
>> On Thu, Jul 9, 2015 at 2:50 PM, Michal Čizmazia <mici...@gmail.com> wrote:
>>
>>> Please could anyone give me pointers for an appropriate SparkConf to work
>>> around "Size exceeds Integer.MAX_VALUE"?
>>>
>>> Stacktrace:
>>>
>>> 2015-07-09 20:12:02 INFO  (sparkDriver-akka.actor.default-dispatcher-3)
>>> BlockManagerInfo:59 - Added rdd_0_0 on disk on localhost:51132 (size: 29.8 GB)
>>> 2015-07-09 20:12:02 ERROR (Executor task launch worker-0) Executor:96 -
>>> Exception in task 0.0 in stage 0.0 (TID 0)
>>> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>>>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
>>>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
>>> ...
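
For anyone else hitting the same 2 GB block limit, here is a minimal Scala sketch of the workaround Matei describes: request more partitions when the RDD is created, or repartition it before caching, so no single cached partition exceeds 2 GB. The input path and partition counts below are made-up placeholders; pick numbers so that (total data size / partitions) stays comfortably under 2 GB.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object MorePartitionsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("more-partitions-sketch")
        val sc = new SparkContext(conf)

        // Reading from HDFS/S3: the optional second parameter to textFile /
        // hadoopFile asks for at least this many partitions. The path is
        // hypothetical.
        val lines = sc.textFile("hdfs:///data/big-input.txt", 2000)

        // parallelize: pass the number of slices explicitly.
        val nums = sc.parallelize(1 to 1000000, 200)

        // An existing RDD (e.g. the output of a shuffle) can be split further
        // before caching; alternatively pass numPartitions to the shuffle
        // operation itself, e.g. rdd.reduceByKey(_ + _, 2000).
        val repartitioned = lines.repartition(2000)
        repartitioned.persist(StorageLevel.MEMORY_AND_DISK)

        // Force evaluation so the cache is populated.
        repartitioned.count()

        sc.stop()
      }
    }

In the log above, a single cached partition (rdd_0_0) was 29.8 GB, which is why DiskStore's mmap of the block failed; splitting that RDD into even a few dozen partitions would bring each block below the Integer.MAX_VALUE limit.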