I have a dataset of about 10 GB. I am using persist(StorageLevel.DISK_ONLY) to avoid out-of-memory issues when running my job.
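For concreteness, the relevant part of the job looks roughly like this (a minimal sketch with placeholder names and paths, not my actual code):

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object DiskPersistJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[4]", "DiskPersistJob") // placeholder master/app name
    val records = sc.textFile("hdfs:///data/input")         // ~10 GB input (path is a placeholder)
      .map(_.split("\t"))
    // Keep blocks on local disk only (under spark.local.dir) rather than in the JVM heap
    records.persist(StorageLevel.DISK_ONLY)
    println(records.count())
    sc.stop()
  }
}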
When I run with a dataset of about 1 GB, the job is able to complete. But when I run with the larger 10 GB dataset, I get the following error and stack trace, which seems to happen when the RDD is being written out to disk. Does anyone have any ideas as to what is going on, or whether there is a setting I can tune?

14/06/09 21:33:55 ERROR executor.Executor: Exception in task ID 560
java.io.FileNotFoundException: /tmp/spark-local-20140609210741-0bb8/14/rdd_331_175 (No such file or directory)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
        at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:79)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:698)
        at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:95)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

--
SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io
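P.S. From the path in the trace, the blocks are being written under spark.local.dir, which defaults to /tmp. In case it is relevant, one thing I am considering is pointing that at a larger scratch volume before creating the context (the directory below is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("DiskPersistJob")
  .set("spark.local.dir", "/mnt/spark-scratch") // hypothetical directory with more free space than /tmp
val sc = new SparkContext(conf)

I am not sure whether that would fix the FileNotFoundException or just work around it, so any pointers would still be appreciated.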