Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-41356120
@sryza I have tested this successfully on a standalone cluster. However, I
haven't been able to get it working on a CDH cluster. I tried building with
both Maven and SBT (the latter of which clearly doesn't work yet), but neither
was fruitful.
More specifically, I did
```
mvn -Pyarn -Dhadoop.version=2.3.0-cdh5.0.0 -Dyarn.version=2.3.0-cdh5.0.0 -DskipTests clean package
MASTER=yarn-client bin/pyspark
```
and ran into
```
14/04/25 03:16:54 INFO CoarseGrainedExecutorBackend: Got assigned task 0
14/04/25 03:16:55 INFO Executor: Running task ID 0
14/04/25 03:16:56 ERROR Executor: Exception in task ID 0
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:183)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:55)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:42)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:97)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:57)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:210)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:43)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:42)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/04/25 03:16:56 ERROR Executor: Uncaught exception in thread Thread[stderr reader for python,5,main]
java.lang.NullPointerException
    at org.apache.spark.api.python.PythonWorkerFactory$$anon$3$$anonfun$run$3.apply$mcV$sp(PythonWorkerFactory.scala:171)
    at org.apache.spark.api.python.PythonWorkerFactory$$anon$3$$anonfun$run$3.apply(PythonWorkerFactory.scala:169)
    at org.apache.spark.api.python.PythonWorkerFactory$$anon$3$$anonfun$run$3.apply(PythonWorkerFactory.scala:169)
```
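For context on where that `EOFException` comes from: as I understand it, `PythonWorkerFactory.startDaemon` launches the Python daemon and then blocks on `DataInputStream.readInt()` to read the port the daemon binds to from its stdout; EOF at that `readInt` means the daemon process exited before writing those 4 bytes (e.g. the wrong Python was found on the executor, or an import failed on startup). A rough illustrative sketch of that handshake in Python (the function and the stand-in port value here are hypothetical, not Spark code):

```python
import struct
import subprocess
import sys

def read_daemon_port(cmd):
    """Launch a 'daemon' and read the 4-byte big-endian port it announces,
    mirroring what the JVM side does with DataInputStream.readInt()."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    data = proc.stdout.read(4)  # readInt() consumes exactly 4 bytes
    proc.wait()
    if len(data) < 4:
        # This is the EOFException case in the trace above: the daemon
        # died before announcing its port.
        raise EOFError("daemon exited before writing its port")
    return struct.unpack(">i", data)[0]  # big-endian int, like readInt()

# A daemon that writes its port (stand-in value) before exiting succeeds:
port = read_daemon_port(
    [sys.executable, "-c",
     "import struct,sys; sys.stdout.buffer.write(struct.pack('>i', 54321))"])
```

So the EOF on the JVM side is a symptom; the real failure is whatever killed the daemon, which is why the daemon's stderr (the thread that hit the NPE) would be the useful thing to see.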
I will spend some time digging into what the NPE is, but in the meantime,
do you see anything obvious that I'm missing?