It is more likely a Spark configuration issue on your side. Could you run
this code in the pyspark shell?
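
For reference, here is the snippet to paste into a pyspark shell on the
same cluster (the HDFS path is taken from your example; sc is the shell's
pre-created SparkContext). If it fails there as well, the problem is in the
Spark installation itself rather than in Zeppelin or Livy:

<code>
# Same word count as in the Zeppelin paragraph.
txtFile = sc.textFile("/data/staging/zeppelin_test/data.txt")
counts = (txtFile.flatMap(lambda line: line.split(" "))
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))
counts.collect()
</code>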



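Also note that your curl test goes through Livy's /batches endpoint, while
%livy.pyspark goes through an interactive /sessions session, so the two
take different code paths on the Livy side. A minimal sketch to exercise
the interactive path directly, assuming the same host and Kerberos setup
as your batch test:

<code>
# Create an interactive pyspark session (Livy /sessions API):
curl -i --negotiate -u : -X POST \
  -H "Content-Type: application/json" \
  --data '{"kind": "pyspark"}' \
  dtbhad02p.bvs.corp:8998/sessions

# Once the session state is "idle", run a statement in it
# (replace 0 with the session id returned above):
curl -i --negotiate -u : -X POST \
  -H "Content-Type: application/json" \
  --data '{"code": "sc.textFile(\"/data/staging/zeppelin_test/data.txt\").count()"}' \
  dtbhad02p.bvs.corp:8998/sessions/0/statements
</code>
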
Mauro Schneider <maur...@gmail.com> wrote on Fri, Sep 29, 2017 at 11:24 PM:

>
> Hi
>
> I'm trying to execute PySpark code with Zeppelin and Livy, but without
> success. Scala with Livy works well, but when I execute the code below I
> get an exception from Zeppelin.
>
> <code>
> %livy.pyspark
> txtFile = sc.textFile("/data/staging/zeppelin_test/data.txt")
> counts = (txtFile.flatMap(lambda line: line.split(" "))
>                  .map(lambda word: (word, 1))
>                  .reduceByKey(lambda a, b: a + b))
> counts.collect()
> </code>
>
> <exception>
> An error occurred while calling o47.textFile.
> : java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
> at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
> at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
> at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
> at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:156)
> at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
> at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:200)
> at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
> at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1334)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1022)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1019)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1019)
> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:840)
> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:838)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
> at org.apache.spark.SparkContext.textFile(SparkContext.scala:838)
> at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:188)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:209)
> at java.lang.Thread.run(Thread.java:745)
> </exception>
>
> I have also tested the code below with curl against Livy directly, and it works correctly:
>
> <code /user/mulisses/test.py>
> from pyspark import SparkContext
>
> if __name__ == "__main__":
>     sc = SparkContext(appName="Hello Spark")
>     # Same word count as the Zeppelin paragraph, saved to HDFS instead.
>     txtFile = sc.textFile("/data/staging/zeppelin_test/data.txt")
>     counts = (txtFile.flatMap(lambda line: line.split(" "))
>                      .map(lambda word: (word, 1))
>                      .reduceByKey(lambda a, b: a + b))
>     counts.saveAsTextFile("test_wc_py")
>     sc.stop()
> </code>
>
> <curl>
> curl -i --negotiate -u : -X POST \
>   --data '{"file": "/user/mulisses/test.py"}' \
>   -H "Content-Type: application/json" \
>   dtbhad02p.bvs.corp:8998/batches
> </curl>
>
> Does anyone know how to solve this? I have tested with Zeppelin 0.7.2 and
> Zeppelin 0.7.3.
> Am I forgetting some configuration?
>
> Best regards,
>
> Mauro Schneider
>
