Hi All,
I am working with Spark 1.6.0, specifically the pySpark shell. I have a
JavaRDD[org.apache.avro.GenericRecord] which I have converted to a Python RDD
in the following way:
javaRDD = sc._jvm.java.package.loadJson("path to data", sc._jsc)
javaPython = sc._jvm.SerDe.javaToPython(javaRDD)

from pyspark.rdd import RDD
pythonRDD = RDD(javaPython, sc)
pythonRDD.first()
However, every time I call collect() or first() on pythonRDD, I get the
following error:
16/02/11 06:19:19 ERROR python.PythonRunner: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/serializers.py", line 156, in _read_with_length
    length = read_int(stream)
  File "/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/serializers.py", line 545, in read_int
    raise EOFError
EOFError
        at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
        at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
        at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: net.razorvine.pickle.PickleException: couldn't pickle object of type class org.apache.avro.generic.GenericData$Record
        at net.razorvine.pickle.Pickler.save(Pickler.java:142)
        at net.razorvine.pickle.Pickler.put_arrayOfObjects(Pickler.java:493)
        at net.razorvine.pickle.Pickler.dispatch(Pickler.java:205)
        at net.razorvine.pickle.Pickler.save(Pickler.java:137)
        at net.razorvine.pickle.Pickler.dump(Pickler.java:107)
        at net.razorvine.pickle.Pickler.dumps(Pickler.java:92)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:121)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:110)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:110)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
        at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
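From the "Caused by" line, the pickler on the JVM side apparently has no
handler for org.apache.avro.generic.GenericData$Record. One workaround I am
considering (just a sketch, assuming a hypothetical JVM-side helper that
renders each record as a JSON string before javaToPython is called, since
plain strings cross the boundary fine) is to parse the JSON on the Python
side:

```python
import json

# Hypothetical setup: the JVM helper would hand over an RDD of JSON strings
# (e.g. via GenericRecord.toString()), so each Python-side element is a str.
def parse_record(line):
    """Parse one JSON-encoded Avro record into a Python dict."""
    return json.loads(line)

# With such an RDD one would do: pythonRDD.map(parse_record).first()
# Shown here on a plain string standing in for one record:
sample = '{"id": 1, "name": "foo"}'
record = parse_record(sample)
# record == {'id': 1, 'name': 'foo'}
```

But I would prefer to keep the GenericRecord objects if there is a proper way.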
Thanks for your time,
AnoopShiralige
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-couldn-t-pickle-object-of-type-class-T-tp26204.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.