Hi,

Although SPARK_HOME is set correctly and I built Z with -Ppyspark, I still hit the "local class incompatible" error whenever I run any pyspark code in Z. The job is submitted and processed on the cluster correctly, but the error seems to be thrown after the final stage (during Python RDD deserialization?). I followed the instructions from Jongyoul Lee, but they did not help. Any workarounds?
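The exception itself says the driver and the executors loaded two different builds of org.apache.spark.api.python.PythonRDD: Zeppelin's Spark interpreter serialized the task with one version of the class and the worker deserialized it with another. A quick way to confirm the two sides really disagree (a sketch only, assuming default install layouts; $ZEPPELIN_HOME and the Spark path are taken from this thread and may need adjusting):

    # Version of the Spark distribution the cluster executors run
    /home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --version

    # Spark jars Zeppelin's Spark interpreter loads on the driver side;
    # the Spark version baked into these must match the cluster exactly
    ls $ZEPPELIN_HOME/interpreter/spark/

If the two versions differ (for example, a 1.4.0 jar on one side and 1.4.1 on the other), the serialVersionUID mismatch below is exactly what you would expect.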
-----Original Message-----
From: "Albert Yoon" <yoon...@kanizsalab.com>
To: "Jongyoul Lee" <jongy...@gmail.com>
Date: 2015.07.23 02:52:13 PM
Subject: Re: Local class incompatible?

Hi,

I built Z with pyspark support as below, after a fresh git clone:

mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.7.0 -Phadoop-2.6 -DskipTests

and then set these interpreter properties:

master      spark://analytics-master:7077
spark.home  /home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6

but every attempt gives me the same error when using pyspark:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Exception in Z interpreter. The log file shows:

INFO [2015-07-23 14:43:15,127] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.0 in stage 6.0 (TID 606, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,128] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.0 in stage 6.0 (TID 607, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,154] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Added broadcast_7_piece0 in memory on 10.32.10.97:39254 (size: 3.5 KB, free: 265.1 MB)
INFO [2015-07-23 14:43:15,155] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Added broadcast_7_piece0 in memory on 10.32.10.97:46622 (size: 3.5 KB, free: 265.1 MB)
INFO [2015-07-23 14:43:15,173] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.0 in stage 6.0 (TID 608, 10.32.10.97, ANY, 1567 bytes)
WARN [2015-07-23 14:43:15,178] ({task-result-getter-2} Logging.scala[logWarning]:71) - Lost task 1.0 in stage 6.0 (TID 607, 10.32.10.97):
java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO [2015-07-23 14:43:15,179] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.1 in stage 6.0 (TID 609, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,180] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 0.0 in stage 6.0 (TID 606) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 1]
INFO [2015-07-23 14:43:15,181] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.1 in stage 6.0 (TID 610, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,181] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 2.0 in stage 6.0 (TID 608) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 2]
INFO [2015-07-23 14:43:15,182] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.1 in stage 6.0 (TID 611, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,183] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 1.1 in stage 6.0 (TID 609) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 3]
INFO [2015-07-23 14:43:15,185] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.2 in stage 6.0 (TID 612, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,185] ({task-result-getter-2} Logging.scala[logInfo]:59) - Lost task 0.1 in stage 6.0 (TID 610) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 4]
INFO [2015-07-23 14:43:15,186] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.2 in stage 6.0 (TID 613, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,187] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 2.1 in stage 6.0 (TID 611) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 5]
INFO [2015-07-23 14:43:15,189] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.2 in stage 6.0 (TID 614, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,189] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 1.2 in stage 6.0 (TID 612) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 6]
INFO [2015-07-23 14:43:15,190] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.3 in stage 6.0 (TID 615, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,190] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 0.2 in stage 6.0 (TID 613) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 7]
INFO [2015-07-23 14:43:15,192] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.3 in stage 6.0 (TID 616, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,192] ({task-result-getter-2} Logging.scala[logInfo]:59) - Lost task 2.2 in stage 6.0 (TID 614) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 8]
INFO [2015-07-23 14:43:15,193] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.3 in stage 6.0 (TID 617, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,193] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 1.3 in stage 6.0 (TID 615) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 9]
ERROR [2015-07-23 14:43:15,194] ({task-result-getter-1} Logging.scala[logError]:75) - Task 1 in stage 6.0 failed 4 times; aborting job
INFO [2015-07-23 14:43:15,197] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 0.3 in stage 6.0 (TID 616) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 10]
INFO [2015-07-23 14:43:15,202] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 2.3 in stage 6.0 (TID 617) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 11]
INFO [2015-07-23 14:43:15,202] ({task-result-getter-0} Logging.scala[logInfo]:59) - Removed TaskSet 6.0, whose tasks have all completed, from pool default
INFO [2015-07-23 14:43:15,203] ({dag-scheduler-event-loop} Logging.scala[logInfo]:59) - Cancelling stage 6
INFO [2015-07-23 14:43:15,205] ({dag-scheduler-event-loop} Logging.scala[logInfo]:59) - ResultStage 6 (count at <string>:2) failed in 0.078 s
INFO [2015-07-23 14:43:15,206] ({Thread-58} Logging.scala[logInfo]:59) - Job 3 failed: count at <string>:2, took 0.093380 s
INFO [2015-07-23 14:43:15,216] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:138) - Job remoteInterpretJob_1437630194249 finished by scheduler interpreter_1373012994

-----Original Message-----
From: "Jongyoul Lee" <jongy...@gmail.com>
To: "users@zeppelin.incubator.apache.org" <users@zeppelin.incubator.apache.org>; "Albert Yoon" <yoon...@kanizsalab.com>
Sent: 2015-07-23 (Thu) 10:39:03
Subject: Re: Local class incompatible?

Hi,

You should add -Ppyspark when you want to use Z with pyspark.

Regards,
JL

On Thursday, July 23, 2015, Albert Yoon <yoon...@kanizsalab.com> wrote:

Hi,

I have a test Zeppelin instance connected to my Spark cluster, set up like this:

Spark: spark-1.4.1-bin-hadoop2.6
Hadoop: hadoop-2.7.1

Zeppelin was installed with:

mvn clean install -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests

and I ran test code like this:

val textFile = sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
println(textFile.count())
val textFile = sc.textFile("README.md")
println(textFile.count())

This code works well in the Scala interpreter, but I hit an error when executing similar code as %pyspark. I also ran the code in the pyspark command-line shell, where it works like a charm with no errors. How do I resolve the error below? It seems something inside Zeppelin is not compatible with my Spark cluster...

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 52, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n', JavaObject id=o93), <traceback object at 0x1bde290>)

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
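A commonly suggested remedy for this class of serialVersionUID mismatch is to point Zeppelin at the exact Spark distribution the cluster runs, so the driver and the executors deserialize the same class versions. A minimal conf/zeppelin-env.sh sketch, reusing the paths from this thread (whether Zeppelin honors these variables depends on the Zeppelin version, so treat this as an assumption to verify against your build):

    # conf/zeppelin-env.sh -- sketch only; paths copied from this thread,
    # and SPARK_HOME handling varies across Zeppelin versions
    export SPARK_HOME=/home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6
    export MASTER=spark://analytics-master:7077

After editing, restart Zeppelin (bin/zeppelin-daemon.sh restart) so the interpreter picks up the new environment.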