Hi,
I'm doing some data processing in PySpark, but I failed to reach the
JVM from the workers. Here is what I did:
$ bin/pyspark
>>> data = sc.parallelize(["123", "234"])
>>> numbers = data.map(lambda s:
...     SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
>>> numbers.collect()
I got:
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute '_jvm'
        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more
Meanwhile, _jvm works fine at the driver end:
>>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
123
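For what it's worth, the pure-Python equivalent of the conversion runs fine in the workers, so the failure seems specific to reaching the JVM. A minimal sketch of what I mean (plain Python lists standing in for the parallelized RDD, since the lambda itself needs nothing from Spark):

```python
# Pure-Python equivalent of Integer.valueOf(s.strip()); this is the
# kind of lambda I could pass to data.map(...) without touching the JVM.
to_int = lambda s: int(s.strip())

data = ["123", "234"]               # stand-in for the RDD's contents
numbers = [to_int(s) for s in data]
print(numbers)                      # [123, 234]
```

But my question is specifically about calling into the JVM from a worker, not about parsing integers.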
The program is trivial; I just wonder what the right way is to reach
the JVM from Python workers. Any help would be appreciated.
Thanks
--
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]