>>> sc._jvm.java.lang.Integer.valueOf("12")
12

FYI
On Mon, Sep 28, 2015 at 8:08 PM, YiZhi Liu <javeli...@gmail.com> wrote:
> Hi,
>
> I'm doing some data processing on PySpark, but I failed to reach the JVM
> in the workers. Here is what I did:
>
> $ bin/pyspark
> >>> data = sc.parallelize(["123", "234"])
> >>> numbers = data.map(lambda s:
> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
> >>> numbers.collect()
>
> I got:
>
> Caused by: org.apache.spark.api.python.PythonException: Traceback
> (most recent call last):
>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>     process()
>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "<stdin>", line 1, in <lambda>
> AttributeError: 'NoneType' object has no attribute '_jvm'
>
>   at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
>   at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
>   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   ... 1 more
>
> Meanwhile, _jvm at the driver end looks fine:
>
> >>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
> 123
>
> The program is trivial; I just wonder what the right way is to reach the
> JVM in Python. Any help would be appreciated.
>
> Thanks
>
> --
> Yizhi Liu
> Senior Software Engineer / Data Mining
> www.mvad.com, Shanghai, China