On Fri, Aug 8, 2014 at 9:12 AM, Baoqiang Cao <bqcaom...@gmail.com> wrote:
> Hi There
>
> I ran into a problem and can’t find a solution.
>
> I was running bin/pyspark < ../python/wordcount.py
you could use bin/spark-submit ../python/wordcount.py

> The wordcount.py is here:
>
> ========================================
> import sys
> from operator import add
>
> from pyspark import SparkContext
>
> datafile = '/mnt/data/m1.txt'
>
> sc = SparkContext()
> outfile = datafile + '.freq'
> lines = sc.textFile(datafile, 1)
> counts = lines.flatMap(lambda x: x.split(' ')) \
>               .map(lambda x: (x, 1)) \
>               .reduceByKey(add)
> output = counts.collect()
>
> outf = open(outfile, 'w')
>
> for (word, count) in output:
>     outf.write(word.encode('utf-8') + '\t' + str(count) + '\n')
> outf.close()
> ========================================
>
> The error message is here:
>
> 14/08/08 16:01:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 0)
> java.io.FileNotFoundException:
> /tmp/spark-local-20140808160150-d36b/12/shuffle_0_0_468 (Too many open files)

This message means that the Spark JVM has reached the maximum number of
open files; there is an fd leak somewhere. Unfortunately, I cannot
reproduce this problem. What version of Spark are you running?

>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
>         at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175)
>         at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
>         at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> The m1.txt is about 4G, and I have >120GB RAM and used -Xmx120GB. It is on
> Ubuntu. Any help please?
>
> Best
> Baoqiang Cao
> Blog: http://baoqiang.org
> Email: bqcaom...@gmail.com
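In the meantime, two things worth checking, as a sketch only (this assumes
Spark 1.x with the hash shuffle and a typical Ubuntu soft limit of 1024
open files; it is not a confirmed diagnosis of your setup): the
per-process file-descriptor limit, and the spark.shuffle.consolidateFiles
setting, which makes the hash shuffle open far fewer files per executor.
From Python, before creating the SparkContext:

========================================
# Sketch only: inspect this process's open-file limit and enable
# shuffle-file consolidation. spark.shuffle.consolidateFiles is a real
# Spark 1.x setting; whether it avoids this particular leak is an
# assumption, not a confirmed fix.
import resource

from pyspark import SparkConf, SparkContext

# The hash shuffle writes one file per (map task, reducer) pair, which
# can exhaust a 1024-descriptor soft limit quickly on a large input.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open files: soft limit=%d, hard limit=%d' % (soft, hard))

conf = (SparkConf()
        .setAppName('wordcount')
        .set('spark.shuffle.consolidateFiles', 'true'))
sc = SparkContext(conf=conf)
========================================

You can also raise the limit shell-wide with "ulimit -n" (or persistently
in /etc/security/limits.conf) before launching spark-submit, which is the
usual first remedy for "Too many open files" even when there is no leak.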