Sorry for the stream of consciousness, but after thinking about this a bit more, I believe the FileNotFoundExceptions are a symptom of tasks being cancelled and restarted, and that the root cause is the OutOfMemoryError.
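If that's right, one mitigation may be to shrink the blocks themselves rather than tune anything on the disk side. A minimal sketch of what I have in mind, assuming a cached block corresponds to one partition (the input path and partition count below are placeholders, not our actual job):

import org.apache.spark.storage.StorageLevel

// Assumption: a cached block is roughly one partition, so spreading the
// ~10 GB dataset over more partitions keeps any single block small, and
// with it the single ByteBuffer.allocate() a remote fetch makes for it.
val raw = sc.textFile("hdfs:///path/to/input")  // placeholder path
val smallBlocks = raw.repartition(400)          // ~10 GB / 400 is roughly 25 MB per block
smallBlocks.persist(StorageLevel.DISK_ONLY)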
If anyone has any insights on how to debug this more deeply, or knows of relevant config settings, that would be much appreciated. Otherwise, I figure the next step is to enable more verbose logging in the Spark storage code to see how much memory it is trying to allocate (see the logging sketch at the end of this message). At this point, I'm wondering if a single block could be in the GB range.

-Suren

On Mon, Jun 9, 2014 at 10:27 PM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:

> I don't know if this is related, but a little earlier in stderr I also
> have the following stacktrace. This one seems to happen when the code is
> fetching RDD data from a remote node, which is a different path from the
> write failure above.
>
> 14/06/09 21:33:26 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-16,5,main]
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
>     at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94)
>     at org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176)
>     at org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63)
>     at org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
>     at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:128)
>     at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:489)
>     at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:487)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:487)
>     at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:473)
>     at org.apache.spark.storage.BlockManager.get(BlockManager.scala:513)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:39)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>
> On Mon, Jun 9, 2014 at 10:05 PM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:
>
>> I have a dataset of about 10 GB. I am using persist(DISK_ONLY) to avoid
>> out-of-memory issues when running my job.
>>
>> When I run with a dataset of about 1 GB, the job is able to complete.
>> But when I run with the larger dataset of 10 GB, I get the following
>> error/stacktrace, which seems to happen when the RDD is being written
>> out to disk.
>>
>> Does anyone have any ideas as to what is going on, or whether there is
>> a setting I can tune?
>>
>> 14/06/09 21:33:55 ERROR executor.Executor: Exception in task ID 560
>> java.io.FileNotFoundException: /tmp/spark-local-20140609210741-0bb8/14/rdd_331_175 (No such file or directory)
>>     at java.io.FileOutputStream.open(Native Method)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
>>     at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:79)
>>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:698)
>>     at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
>>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:95)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:679)

--

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io
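For the extra logging mentioned at the top of this message, this is the kind of log4j.properties change I'd try first. The logger names are taken from the classes in the stack traces above; whether they actually report per-block sizes at DEBUG in our version is an assumption I'd still have to verify:

# conf/log4j.properties on the executors: turn up the storage loggers.
# Assumption: these classes log block sizes/allocations at DEBUG level.
log4j.logger.org.apache.spark.storage.BlockManager=DEBUG
log4j.logger.org.apache.spark.storage.DiskStore=DEBUG
log4j.logger.org.apache.spark.storage.BlockMessageArray=DEBUG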