Any clues? This looks like a bug, but I can't report it without more
precise information.


On Tue, Jul 29, 2014 at 9:56 PM, Nick Chammas <nicholas.cham...@gmail.com>
wrote:

> I’m in the PySpark shell and I’m trying to do this:
>
> a = sc.textFile(
>     's3n://path-to-handful-of-very-large-files-totalling-1tb/*.json',
>     minPartitions=sc.defaultParallelism * 3).cache()
> a.map(lambda x: len(x)).max()
>
> My job dies with the following:
>
> 14/07/30 01:46:28 WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/root/spark/python/pyspark/worker.py", line 73, in main
>     command = pickleSer._read_with_length(infile)
>   File "/root/spark/python/pyspark/serializers.py", line 142, in _read_with_length
>     length = read_int(stream)
>   File "/root/spark/python/pyspark/serializers.py", line 337, in read_int
>     raise EOFError
> EOFError
>
>     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 14/07/30 01:46:29 ERROR TaskSchedulerImpl: Lost executor 19 on ip-10-190-171-217.ec2.internal: remote Akka client disassociated
>
> How do I debug this? I’m using 1.0.2-rc1 deployed to EC2.
>
> Nick
>

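For anyone reading the traceback above: the EOFError is raised while the Python worker is still reading its command from the JVM, before any user code runs. The sketch below is a simplified stand-in, not PySpark's actual source. The function names only mirror the ones in the traceback, and the 4-byte big-endian length prefix is an assumption about the framing. It shows how a length-prefixed reader fails this way when the other end closes the stream early, which may mean the problem is on the executor side (consistent with the "Lost executor 19" line) rather than in the lambda.

```python
import io
import struct


def read_int(stream):
    """Read a 4-byte big-endian signed int; raise EOFError if the stream is empty."""
    data = stream.read(4)
    if not data:
        # Nothing left to read: the writer closed the stream before
        # sending a length prefix.
        raise EOFError
    return struct.unpack("!i", data)[0]


def read_with_length(stream):
    """Read one length-prefixed message: a 4-byte length, then that many bytes."""
    length = read_int(stream)
    return stream.read(length)


# Normal case: the writer sent a complete framed message.
payload = b"command"
framed = struct.pack("!i", len(payload)) + payload
print(read_with_length(io.BytesIO(framed)))  # b'command'

# Failure case: the writer went away without sending anything.
try:
    read_with_length(io.BytesIO(b""))
except EOFError:
    print("EOFError: stream closed before a length prefix arrived")
```

If that reading is right, the executor's own logs on the worker node are likely to be more informative than the Python traceback.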