Hi,

You should add the -Ppyspark build profile when you want to use Zeppelin with pyspark. Without that profile, pyspark support is not built into Zeppelin, so the PythonRDD class on the Zeppelin side can end up incompatible with the one on your cluster's executors; that mismatch is what the InvalidClassException (local class incompatible: serialVersionUID) in your log is reporting.
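For example, keeping the flags from your original build and just adding the pyspark profile (a sketch; check the profile names against your Zeppelin version's README):

    mvn clean install -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Ppyspark -DskipTests

After rebuilding and restarting Zeppelin, you can retry the %pyspark equivalent of your scala test, roughly:

    %pyspark
    textFile = sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
    print textFile.count()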
Regards,
JL

On Thursday, July 23, 2015, Albert Yoon <yoon...@kanizsalab.com> wrote:

> Hi, I have a test installation of zeppelin connected to my spark cluster,
> set up like this:
>
> *Spark*: spark-1.4.1-bin-hadoop2.6
> *Hadoop*: hadoop-2.7.1
>
> *Zeppelin installed as*: mvn clean install -Pspark-1.4
> -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
>
> and I ran test code like this:
>
> val textFile =
> sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
> println(textFile.count())
>
> val textFile = sc.textFile("README.md")
> println(textFile.count())
>
> This code works well in the scala interpreter, but I encountered an error
> when executing similar code as %pyspark. I also ran the code in the
> pyspark command-line shell and it works like a charm, with none of the
> error shown below.
>
> How can I resolve this error? It seems something inside zeppelin is not
> compatible with my spark cluster...
>
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe. :
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
> in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage
> 5.0 (TID 52, 10.32.10.97): java.io.InvalidClassException:
> org.apache.spark.api.python.PythonRDD; local class incompatible: stream
> classdesc serialVersionUID = 1521627685947625661, local class
> serialVersionUID = -2629548733386970437
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred
> while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n',
> JavaObject id=o93), <traceback object at 0x1bde290>)

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net