Thanks, moon. Now it works like a charm. Thank you for building a great piece of software!

One more question: may I share my setup and configuration experience in the GitHub documentation or on the Zeppelin homepage? The working solution you gave me isn't mentioned anywhere on GitHub, so newbies like me can get confused. I could also contribute some documentation on a secured nginx reverse-proxy setup for Zeppelin, for anyone who finds it a headache to put Jetty behind HTTPS/Basic auth. Can you give me a hint on how to contribute documentation?

-----Original Message-----
From: "Albert Yoon" <yoon...@kanizsalab.com>
To: "moon soo Lee" <m...@apache.org>
Cc: "users@zeppelin.incubator.apache.org" <users@zeppelin.incubator.apache.org>
Sent: 2015-07-25 (Sat) 11:03:12
Subject: Re: Local class incompatible?

-----Original Message-----
From: "moon soo Lee" <m...@apache.org>
To: <users@zeppelin.incubator.apache.org>, "Albert Yoon" <yoon...@kanizsalab.com>
Sent: 2015-07-25 (Sat) 06:48:18
Subject: Re: Local class incompatible?

Hi,

Could you try building Zeppelin with -Dspark.version=1.4.1?

mvn clean install -Pspark-1.4 -Dspark.version=1.4.1 -Ppyspark -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests

Thanks,
moon
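A note on why pinning the version helps (the thread itself doesn't spell this out): the -Pspark-1.4 profile most likely builds Zeppelin against the profile's default 1.4.x release rather than 1.4.1, so the org.apache.spark.api.python.PythonRDD class compiled into Zeppelin carries a different serialVersionUID than the one in the 1.4.1 jars on the executors, which is exactly the "local class incompatible" error quoted below. A quick sanity check, suggested here as an assumption rather than something from the thread, is to compare the Spark version the Zeppelin driver reports against the cluster's release:

%pyspark
# Print the Spark version the Zeppelin driver is linked against. If this is
# not exactly the cluster's release (1.4.1 here), Java serialization of
# classes such as PythonRDD can fail with serialVersionUID mismatches.
print(sc.version)

The cluster side can be read from spark-submit --version (or the spark-shell banner) on any node; the two values should match exactly.

On the nginx reverse-proxy setup Albert offers to document above, a minimal sketch of what such a config might look like, with hypothetical hostname, certificate paths, and htpasswd file (note also that some Zeppelin releases serve websockets on a separate port, which would need its own proxied location):

server {
    listen 443 ssl;
    server_name zeppelin.example.com;                 # hypothetical hostname

    ssl_certificate     /etc/nginx/ssl/zeppelin.crt;  # hypothetical cert/key paths
    ssl_certificate_key /etc/nginx/ssl/zeppelin.key;

    auth_basic           "Zeppelin";                  # HTTP Basic auth prompt
    auth_basic_user_file /etc/nginx/.htpasswd;        # users created with htpasswd

    location / {
        proxy_pass http://127.0.0.1:8080;             # Zeppelin web UI behind the proxy
        proxy_set_header Host $host;
        # The notebook UI talks to the server over a websocket; without the
        # Upgrade/Connection headers it stays stuck in a disconnected state.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}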
On Fri, Jul 24, 2015 at 1:45 PM Albert Yoon <yoon...@kanizsalab.com> wrote:

Hi,

Although SPARK_HOME is set correctly and Z was built with -Ppyspark, I still get the "local class not compatible" error whenever I run any pyspark code in Z. The job is sent to the cluster and processed correctly, but the error seems to be thrown after the final stage (Python RDD deserialization?). I followed the instructions from Jongyoul Lee, but they were not effective at all. Any workarounds?
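For reference, the SPARK_HOME setup mentioned above normally lives in conf/zeppelin-env.sh. A minimal sketch, using the spark.home path and master URL quoted later in this thread (the HADOOP_CONF_DIR line is an assumption):

# conf/zeppelin-env.sh -- sourced when the Zeppelin daemon starts
export SPARK_HOME=/home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6
export MASTER=spark://analytics-master:7077
export HADOOP_CONF_DIR=/etc/hadoop/conf   # hypothetical; point at your Hadoop config dir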
-----Original Message-----
From: "Albert Yoon" <yoon...@kanizsalab.com>
To: "Jongyoul Lee" <jongy...@gmail.com>
Date: 2015-07-23 02:52:13 PM
Subject: Re: Local class incompatible?

Hi,

I built Z with pyspark as below, after a fresh git clone:

mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.7.0 -Phadoop-2.6 -DskipTests

and then set the environment values:

master      spark://analytics-master:7077
spark.home  /home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6

But all attempts give me the same error when using pyspark:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The exception in the Z interpreter log file looks like this:

INFO [2015-07-23 14:43:15,127] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.0 in stage 6.0 (TID 606, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,128] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.0 in stage 6.0 (TID 607, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,154] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Added broadcast_7_piece0 in memory on 10.32.10.97:39254 (size: 3.5 KB, free: 265.1 MB)
INFO [2015-07-23 14:43:15,155] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Added broadcast_7_piece0 in memory on 10.32.10.97:46622 (size: 3.5 KB, free: 265.1 MB)
INFO [2015-07-23 14:43:15,173] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.0 in stage 6.0 (TID 608, 10.32.10.97, ANY, 1567 bytes)
WARN [2015-07-23 14:43:15,178] ({task-result-getter-2} Logging.scala[logWarning]:71) - Lost task 1.0 in stage 6.0 (TID 607, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
INFO [2015-07-23 14:43:15,179] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.1 in stage 6.0 (TID 609, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,180] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 0.0 in stage 6.0 (TID 606) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 1]
INFO [2015-07-23 14:43:15,181] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.1 in stage 6.0 (TID 610, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,181] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 2.0 in stage 6.0 (TID 608) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 2]
INFO [2015-07-23 14:43:15,182] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.1 in stage 6.0 (TID 611, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,183] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 1.1 in stage 6.0 (TID 609) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 3]
INFO [2015-07-23 14:43:15,185] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.2 in stage 6.0 (TID 612, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,185] ({task-result-getter-2} Logging.scala[logInfo]:59) - Lost task 0.1 in stage 6.0 (TID 610) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 4]
INFO [2015-07-23 14:43:15,186] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.2 in stage 6.0 (TID 613, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,187] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 2.1 in stage 6.0 (TID 611) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 5]
INFO [2015-07-23 14:43:15,189] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.2 in stage 6.0 (TID 614, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,189] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 1.2 in stage 6.0 (TID 612) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 6]
INFO [2015-07-23 14:43:15,190] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.3 in stage 6.0 (TID 615, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,190] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 0.2 in stage 6.0 (TID 613) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 7]
INFO [2015-07-23 14:43:15,192] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.3 in stage 6.0 (TID 616, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,192] ({task-result-getter-2} Logging.scala[logInfo]:59) - Lost task 2.2 in stage 6.0 (TID 614) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 8]
INFO [2015-07-23 14:43:15,193] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.3 in stage 6.0 (TID 617, 10.32.10.97, ANY, 1567 bytes)
INFO [2015-07-23 14:43:15,193] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 1.3 in stage 6.0 (TID 615) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 9]
ERROR [2015-07-23 14:43:15,194] ({task-result-getter-1} Logging.scala[logError]:75) - Task 1 in stage 6.0 failed 4 times; aborting job
INFO [2015-07-23 14:43:15,197] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 0.3 in stage 6.0 (TID 616) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 10]
INFO [2015-07-23 14:43:15,202] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 2.3 in stage 6.0 (TID 617) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 11]
INFO [2015-07-23 14:43:15,202] ({task-result-getter-0} Logging.scala[logInfo]:59) - Removed TaskSet 6.0, whose tasks have all completed, from pool default
INFO [2015-07-23 14:43:15,203] ({dag-scheduler-event-loop} Logging.scala[logInfo]:59) - Cancelling stage 6
INFO [2015-07-23 14:43:15,205] ({dag-scheduler-event-loop} Logging.scala[logInfo]:59) - ResultStage 6 (count at <string>:2) failed in 0.078 s
INFO [2015-07-23 14:43:15,206] ({Thread-58} Logging.scala[logInfo]:59) - Job 3 failed: count at <string>:2, took 0.093380 s
INFO [2015-07-23 14:43:15,216] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:138) - Job remoteInterpretJob_1437630194249 finished by scheduler interpreter_1373012994

-----Original Message-----
From: "Jongyoul Lee" <jongy...@gmail.com>
To: "users@zeppelin.incubator.apache.org" <users@zeppelin.incubator.apache.org>, "Albert Yoon" <yoon...@kanizsalab.com>
Sent: 2015-07-23 (Thu) 10:39:03
Subject: Re: Local class incompatible?

Hi,

You should add -Ppyspark when you want to use Z with pyspark.

Regards,
JL

On Thursday, July 23, 2015, Albert Yoon <yoon...@kanizsalab.com> wrote:

Hi,

I have a test Zeppelin instance connected to my Spark cluster; the setup looks like this:

Spark: spark-1.4.1-bin-hadoop2.6
Hadoop: hadoop-2.7.1

Zeppelin was installed with:

mvn clean install -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests

and I ran test code like this:

val textFile = sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
println(textFile.count())

val textFile = sc.textFile("README.md")
println(textFile.count())

This code works well with the Scala interpreter, but I encounter an error when executing similar code as %pyspark. I also ran the code in the pyspark command-line shell, and there it works like a charm, with no error like the one below. How do I resolve this error? It seems something inside Zeppelin is not compatible with my Spark cluster...
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 52, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n', JavaObject id=o93), <traceback object at 0x1bde290>)

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
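The failing %pyspark paragraph itself never appears in the thread; a plausible reconstruction of it, assuming the Python 2 syntax current for Spark 1.4 and mirroring the Scala test above, would be:

%pyspark
# Hypothetical reconstruction -- the thread only shows the Scala version.
# count() forces the job whose result deserialization dies with the
# InvalidClassException when driver and executor Spark versions differ.
textFile = sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
print textFile.count()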