Any clues? This looks like a bug, but I can't report it without more precise information.
On Tue, Jul 29, 2014 at 9:56 PM, Nick Chammas <nicholas.cham...@gmail.com> wrote:
> I’m in the PySpark shell and I’m trying to do this:
>
> a = sc.textFile('s3n://path-to-handful-of-very-large-files-totalling-1tb/*.json',
>                 minPartitions=sc.defaultParallelism * 3).cache()
> a.map(lambda x: len(x)).max()
>
> My job dies with the following:
>
> 14/07/30 01:46:28 WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/root/spark/python/pyspark/worker.py", line 73, in main
>     command = pickleSer._read_with_length(infile)
>   File "/root/spark/python/pyspark/serializers.py", line 142, in _read_with_length
>     length = read_int(stream)
>   File "/root/spark/python/pyspark/serializers.py", line 337, in read_int
>     raise EOFError
> EOFError
>
>     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 14/07/30 01:46:29 ERROR TaskSchedulerImpl: Lost executor 19 on ip-10-190-171-217.ec2.internal: remote Akka client disassociated
>
> How do I debug this? I’m using 1.0.2-rc1 deployed to EC2.
>
> Nick
>
> ------------------------------
> View this message in context: How do you debug a PythonException?
> <http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-debug-a-PythonException-tp10906.html>
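
For anyone reading the traceback above: the EOFError is raised while the Python worker is reading a length-prefixed message from its input stream. Below is a minimal, standalone sketch of that kind of framing, not the PySpark source. It assumes a 4-byte big-endian signed length followed by the payload. The function names echo the traceback only for readability, and `frame` is a hypothetical helper added for the illustration.

```python
import io
import struct


def read_int(stream):
    """Read a 4-byte big-endian signed int; raise EOFError if the stream has ended."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError
    return struct.unpack("!i", data)[0]


def read_with_length(stream):
    """Read one length-prefixed payload from the stream."""
    length = read_int(stream)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError
    return payload


def frame(payload):
    """Hypothetical helper: prefix a payload with its length (assumed wire format)."""
    return struct.pack("!i", len(payload)) + payload


# A complete frame is read back intact.
ok = read_with_length(io.BytesIO(frame(b"command")))

# A stream that ends before any length prefix arrives raises EOFError,
# which is the shape of the failure shown in the traceback.
try:
    read_with_length(io.BytesIO(b""))
    closed_early = False
except EOFError:
    closed_early = True
```

If that assumption holds, the Python traceback may be a symptom rather than the cause: the worker's input ended before a command arrived, so the logs of the lost executor on ip-10-190-171-217.ec2.internal are probably the more informative place to look.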