I think you need to increase the ulimit to avoid the 'too many open files' error;
once you do, the FileNotFoundException should disappear.
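
For example, on a Linux worker you could check and raise the limit roughly like
this (a sketch only - the 65536 value and the "sparkuser" account name are just
placeholders for your own setup):

    # check the current per-process open-file limit
    ulimit -n

    # raise it for the current shell session before launching the worker
    ulimit -n 65536

    # or make it persistent in /etc/security/limits.conf, e.g.:
    #   sparkuser  soft  nofile  65536
    #   sparkuser  hard  nofile  65536

Note the limit has to be raised for the user that actually runs the
worker/executor JVM (not just the shell you run lsof from), and the worker
needs to be restarted for the new limit to take effect.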


On Wed, Dec 18, 2013 at 11:56 AM, Nathan Kronenfeld <
[email protected]> wrote:

> Hi, Folks.
>
> I was wondering if anyone has encountered the following error before; I've
> been staring at this all day and can't figure out what it means.
>
> In my client log, I get:
> [INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Lost TID
> 282 (task 3.0:63)
> [INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Loss was
> due to java.io.FileNotFoundException
> java.io.FileNotFoundException:
> /tmp/spark-local-20131217192829-da6b/2e/shuffle_1_63_88 (Too many open
> files)
> at java.io.FileOutputStream.open(Native Method)
>  at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> at
> org.apache.spark.storage.DiskStore$DiskBlockObjectWriter.open(DiskStore.scala:58)
>  at
> org.apache.spark.storage.DiskStore$DiskBlockObjectWriter.write(DiskStore.scala:107)
> at
> org.apache.spark.scheduler.ShuffleMapTask$$anonfun$run$1.apply(ShuffleMapTask.scala:152)
>  at
> org.apache.spark.scheduler.ShuffleMapTask$$anonfun$run$1.apply(ShuffleMapTask.scala:149)
> at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>  at
> scala.collection.JavaConversions$JMapWrapperLike$$anon$2.foreach(JavaConversions.scala:781)
> at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:149)
>  at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
>
> In the worker log, I see
>
> 13/12/17 19:31:07 INFO executor.Executor: Running task ID 282
> 13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split:
> hdfs://dataset/part-00027:0+2311666
> 13/12/17 19:31:07 INFO spark.CacheManager: Computing partition
> org.apache.spark.rdd.HadoopPartition@6b4
> 13/12/17 19:31:07 INFO spark.CacheManager: Computing partition
> org.apache.spark.rdd.HadoopPartition@6af
> 13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split:
> hdfs://dataset/part-00035:0+2311720
> 13/12/17 19:31:07 INFO rdd.HadoopRDD: Input split:
> hdfs://dataset/part-00030:0+2311094
>
> Any clue what this means?
>
> I've checked open files on the worker node while this task is going (by
> running lsof | wc -l every 5 seconds) and I don't even see a blip - it
> looks nice and steady, with no problems.
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  [email protected]
>
