Hi, I am running my spark job on Yarn; using latest code from master branch..synced few days back. I see this IO Exception during shuffle(in resource manager logs). What could be wrong and how to debug it? I have seen this few times before; I was suspecting that this could side effect of memory pressure..but I could never figure out the root cause.
-- 14/07/02 07:34:45 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException java.io.FileNotFoundException: /var/storage/sda3/nm-local/usercache/nit/appcache/application_1403208801430_0183/spark-local-20140702065054-388d/0e/shuffle_3_193_787 (No such file or directory) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:221) at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116) at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177) at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161) at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-shuffle-tp8644.html Sent from the Apache Spark User List mailing list archive at Nabble.com.