Can you try adding this interpreter setting on the interpreter page:

property: spark.executorEnv.PYTHONPATH
value: /usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark/python/:/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/hustler/lib/py

I think this should solve the problem.
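Once that property is in place, a quick way to confirm the executors actually pick it up is a %pyspark paragraph along these lines (just a sketch -- the paths above are the ones from my cluster, and if the setting did not take effect this paragraph will simply fail again with the same "No module named pyspark" error):

%pyspark
import os

# PYTHONPATH as the driver sees it
print(os.environ.get("PYTHONPATH"))

# PYTHONPATH as each executor's python worker sees it; with the property set,
# the py4j and pyspark zips should show up here instead of only the assembly jar
def executor_pythonpath(_):
    import os
    return os.environ.get("PYTHONPATH")

print(sc.parallelize(range(4), 4).map(executor_pythonpath).distinct().collect())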
On Wed, Dec 9, 2015 at 7:50 AM, Fengdong Yu <fengdo...@everstring.com> wrote:

> What's your master? yarn-client or local?
>
> Error:
>
> Py4JJavaError: An error occurred while calling o76.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3
> in stage 1.0 (TID 7, dn05.prod2.everstring.com):
> org.apache.spark.SparkException:
> Error from python worker:
>   /usr/local/bin/python: No module named pyspark
> PYTHONPATH was:
>   /media/ebs15/hadoop/yarn/local/usercache/hdfs/filecache/1455/spark-assembly-1.5.2-hadoop2.6.0.jar
> java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>   at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
>   at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
>   at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>   at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
>   at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
>   at org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1.apply(python.scala:397)
>   at org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1.apply(python.scala:362)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> On Dec 8, 2015, at 10:51 PM, moon soo Lee <m...@apache.org> wrote:
>
> I can run
>
> %spark
> case class Data(name:String)
> val data = sc.parallelize(Array(Data("hello"), Data("world"))).toDF
> data.registerTempTable("test_table")
>
> %pyspark
> from pyspark.sql.types import BooleanType
> sqlContext.udf.register("is_empty", lambda x: True if not x else False, BooleanType())
>
> %pyspark
> sqlContext.sql("select is_empty(name) as name from test_table limit 10").show()
>
> without error. Can you share what kind of error you see?
>
> Thanks,
> moon
>
> On Tue, Dec 8, 2015 at 10:29 PM Fengdong Yu <fengdo...@everstring.com> wrote:
>
>> Moon,
>>
>> I can run the same code in the pyspark shell, but it fails on Zeppelin.
>>
>> On Dec 8, 2015, at 7:43 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> I tried with the 0.5.5-incubating release after adding SPARK_1_5_2 in
>> spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java.
>>
>> My conf/zeppelin-env.sh has only SPARK_HOME, which points to the Spark
>> 1.5.2 distribution, and I was able to run %pyspark without any problem.
>>
>> When you run
>>
>> System.getenv("PYTHONPATH")
>>
>> in the notebook, what do you see? Can you check that those files and
>> dirs exist?
>>
>> Thanks,
>> moon
>>
>> On Tue, Dec 8, 2015 at 6:22 PM Fengdong Yu <fengdo...@everstring.com> wrote:
>>
>>> I tried; it's still the same error.
>>>
>>> I even tried removing spark.yarn.jar in interpreter.json, and it's
>>> still the same error.
>>>
>>>
>>> On Dec 8, 2015, at 5:07 PM, moon soo Lee <leemoon...@gmail.com> wrote:
>>>
>>> Can you try setting only SPARK_HOME, not PYTHONPATH?
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Tue, Dec 8, 2015 at 6:04 PM Amjad ALSHABANI <ashshab...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Are you sure that you've installed the pyspark module?
>>>>
>>>> Please check your Spark installation directory to see whether the
>>>> python sub-directory is there.
>>>>
>>>> Amjad
>>>>
>>>> On Dec 8, 2015 9:55 AM, "Fengdong Yu" <fengdo...@everstring.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using Zeppelin 0.5.5 with Spark 1.5.2.
>>>>>
>>>>> It cannot find the pyspark module:
>>>>>
>>>>> Error from python worker:
>>>>>   /usr/local/bin/python: No module named pyspark
>>>>> PYTHONPATH was:
>>>>>
>>>>> I've configured pyspark in zeppelin-env.sh:
>>>>>
>>>>> export
>>>>> PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$SPARK_HOME/python/lib/pyspark.zip
>>>>>
>>>>> Anything else I skipped? Thanks
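One more note on the zeppelin-env.sh config quoted at the bottom: it is worth checking whether the driver actually sees those PYTHONPATH entries and whether they exist on disk (this is what moon's "check those files and dirs exist" question is getting at). A rough %pyspark paragraph for that, assuming the pyspark interpreter itself starts and SPARK_HOME is resolved in zeppelin-env.sh:

%pyspark
import os

# Print each PYTHONPATH entry the driver sees and whether it exists on disk
for entry in os.environ.get("PYTHONPATH", "").split(os.pathsep):
    print("%s -> exists: %s" % (entry, os.path.exists(entry)))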