I think you need to increase the ulimit (the open file descriptor limit) on the worker to avoid the 'too many open files' error; once that limit is raised, the FileNotFoundException should disappear.
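For example (a rough sketch assuming a Linux worker and that the Spark process runs as a user named "spark" - adjust the user name and limit to your setup), you could check and raise the limit like this:

    # check the current soft limit for open file descriptors
    ulimit -Sn
    # raise it for the current shell before starting the worker
    ulimit -n 65536

    # to make it permanent, add lines like these to /etc/security/limits.conf
    # ("spark" is only an assumed user name for illustration)
    spark  soft  nofile  65536
    spark  hard  nofile  65536

Also note that a system-wide "lsof | wc -l" can look flat even when one process is near its per-process limit; counting the descriptors of the worker's JVM directly (e.g. "ls /proc/<pid>/fd | wc -l") is more likely to show the spike.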
On Wed, Dec 18, 2013 at 11:56 AM, Nathan Kronenfeld <[email protected]> wrote:
> Hi, Folks.
>
> I was wondering if anyone has encountered the following error before; I've
> been staring at this all day and can't figure out what it means.
>
> In my client log, I get:
>
> [INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Lost TID 282 (task 3.0:63)
> [INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: /tmp/spark-local-20131217192829-da6b/2e/shuffle_1_63_88 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskStore$DiskBlockObjectWriter.open(DiskStore.scala:58)
>         at org.apache.spark.storage.DiskStore$DiskBlockObjectWriter.write(DiskStore.scala:107)
>         at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$run$1.apply(ShuffleMapTask.scala:152)
>         at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$run$1.apply(ShuffleMapTask.scala:149)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>         at scala.collection.JavaConversions$JMapWrapperLike$$anon$2.foreach(JavaConversions.scala:781)
>         at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:149)
>         at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> In the worker log, I see:
>
> 13/12/17 19:31:07 INFO executor.Executor: Running task ID 282
> 13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split: hdfs://dataset/part-00027:0+2311666
> 13/12/17 19:31:07 INFO spark.CacheManager: Computing partition org.apache.spark.rdd.HadoopPartition@6b4
> 13/12/17 19:31:07 INFO spark.CacheManager: Computing partition org.apache.spark.rdd.HadoopPartition@6af
> 13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split: hdfs://dataset/part-00035:0+2311720
> 13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split: hdfs://dataset/part-00030:0+2311094
>
> Any clue what this means?
>
> I've checked open files on the worker node while this task is going (by
> running lsof | wc -l every 5 seconds) and I don't even see a blip - it
> looks nice and steady, with no problems.
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone: +1-416-203-3003 x 238
> Email: [email protected]
