Hi, folks.

I was wondering if anyone has encountered the following error before; I've
been staring at this all day and can't figure out what it means.

In my client log, I get:
[INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Lost TID 282 (task 3.0:63)
[INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: /tmp/spark-local-20131217192829-da6b/2e/shuffle_1_63_88 (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskStore$DiskBlockObjectWriter.open(DiskStore.scala:58)
        at org.apache.spark.storage.DiskStore$DiskBlockObjectWriter.write(DiskStore.scala:107)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$run$1.apply(ShuffleMapTask.scala:152)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$run$1.apply(ShuffleMapTask.scala:149)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.JavaConversions$JMapWrapperLike$$anon$2.foreach(JavaConversions.scala:781)
        at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:149)
        at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

In the worker log, I see:

13/12/17 19:31:07 INFO executor.Executor: Running task ID 282
13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split: hdfs://dataset/part-00027:0+2311666
13/12/17 19:31:07 INFO spark.CacheManager: Computing partition org.apache.spark.rdd.HadoopPartition@6b4
13/12/17 19:31:07 INFO spark.CacheManager: Computing partition org.apache.spark.rdd.HadoopPartition@6af
13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split: hdfs://dataset/part-00035:0+2311720
13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split: hdfs://dataset/part-00030:0+2311094

Any clue what this means?

I've checked open files on the worker node while this job is running (by
running lsof | wc -l every 5 seconds), and I don't see even a blip - the
count looks nice and steady, with no problems.
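
For what it's worth, here's roughly the loop I've been using, plus a
per-process variant I'm thinking of trying, since ulimit -n is a
per-process limit (EXEC_PID here is just a stand-in for the executor's
process ID, which you'd have to look up by hand):

# System-wide open-file count, sampled every 5 seconds
while true; do
    lsof | wc -l
    sleep 5
done

# Per-process variant: count only the executor's descriptors,
# since that's what the per-process ulimit -n actually caps.
# EXEC_PID = the Spark executor's PID, looked up separately.
while true; do
    lsof -p "$EXEC_PID" | wc -l
    sleep 5
done

If the executor's own count were climbing toward the ulimit -n ceiling
while the system-wide total stayed flat, I suppose that could explain why
I'm not seeing a blip.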


-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  [email protected]
