Hello,

My name is Russell. My company is currently using Toree PySpark, but I have
run into a problem that I can't figure out.

Here is the kernel setting:

{
  "language": "python",
  "display_name": "Apache Toree - PySpark",
  "env": {
    "__TOREE_SPARK_OPTS__": "--master yarn-client",
    "SPARK_HOME": "/usr/hdp/current/spark-client",
    "__TOREE_OPTS__": "",
    "DEFAULT_INTERPRETER": "PySpark",
    "PYTHONPATH": "/usr/hdp/current/spark-client/python:/usr/hdp/
current/spark-client/python/lib/py4j-0.9-src.zip",
    "PYTHON_EXEC": "python"
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_pyspark/bin/run.sh",
    "--profile",
    "{connection_file}"
  ]
}
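
One thing I am not sure about: does the PYTHONPATH in the "env" block above
reach the YARN executors, or only the local driver process? If it has to be
passed to the executors explicitly, would something like the following be the
right way? (The spark.executorEnv.PYTHONPATH conf here is just my guess at
how to do it; the paths are copied from my config above.)

    "__TOREE_SPARK_OPTS__": "--master yarn-client --conf spark.executorEnv.PYTHONPATH=/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip",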

When I run a small PySpark program in a Jupyter notebook, I get this error
message:

Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /data18/hadoop/yarn/local/filecache/10/spark-hdp-assembly.jar
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Could you give me some hints as to the possible cause? I have tried
submitting a Python job from the command line in yarn-client mode, and it
returns the result without any problem, so I suspect the issue is in the
kernel configuration.
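
For reference, the command-line submission that does work looks roughly like
this (the script name is a placeholder):

    spark-submit --master yarn-client my_test.py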

Any help is greatly appreciated. Thanks in advance!

Regards,
Russell
