Hi all,

In one of the previous threads I described some problems with configuring
Zeppelin + Spark on EC2. Now I'm one step further. On both servers I have
Spark 1.6.1 - I've just updated to this version, but the version itself is
not the problem.

What happens...

1. Pure environment: Zeppelin connects to the remote Spark, and a small
piece of code is executed:

val NUM_SAMPLES = 10000000

val count = sc.parallelize(1 to NUM_SAMPLES).map{i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

2. No problem at all. One step further: I need to read data from Amazon S3.

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
val rddFull = sc.textFile("xxx").zipWithIndex()

And...

java.io.IOException: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
...

3. Clear. The Hadoop AWS module is missing from the classpath. I'm adding
the following dependency:
org.apache.hadoop:hadoop-aws:2.6.0
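
For reference, I load it through Zeppelin's dynamic dependency loader in the
first paragraph of the notebook, before the Spark interpreter starts
(z.reset() and z.load() are the standard %dep calls; this is just one way to
get the jar onto the interpreter's classpath):

%dep
z.reset()
z.load("org.apache.hadoop:hadoop-aws:2.6.0")
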
4. Executing the piece of code again:

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:35)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:81)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC.<init>(<console>:45)
at $iwC.<init>(<console>:47)
at <init>(<console>:49)
at .<init>(<console>:53)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:813)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:756)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:748)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:331)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

5. A quick investigation on Google shows that it's all about incompatible
versions of the FasterXML Jackson libraries - most likely hadoop-aws drags
in a Jackson that doesn't match the jackson-module-scala Spark was built
against. And the question: how do I fix it?
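
To check which Jackson actually wins on the Zeppelin interpreter's
classpath, I can run something like this in a notebook paragraph (just a
diagnostic sketch; the class names are taken from the stack trace above):

// print the jar each of the conflicting classes was loaded from
println(classOf[com.fasterxml.jackson.module.scala.DefaultScalaModule]
  .getProtectionDomain.getCodeSource.getLocation)
println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation)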

What I've already done trying to fix it:
1. tried with Spark 1.5.2 / Hadoop 2.6 - same error
2. tried with Spark 1.5.2 / Hadoop 2.4 - same error
3. tried to recompile Spark with an upgraded version of the FasterXML
Jackson libraries (2.6) - same error (see the build sketch below)
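
For point 3, by "recompile" I mean rebuilding Spark with its Jackson version
property overridden - the property name below is from my reading of the
Spark 1.6 pom, so treat it as an assumption:

# rebuild Spark against Jackson 2.6.x instead of the bundled version
mvn -DskipTests -Dfasterxml.jackson.version=2.6.0 clean package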

Moreover, if I log in to the Spark server and use spark-shell, I'm able to
execute the same piece of code without any problems. After a few seconds the
file is read and a basic count() shows the correct result.
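
The check there amounts to the same snippet as above plus a count() (keys
and bucket path elided as before):

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
val rddFull = sc.textFile("xxx").zipWithIndex()
println(rddFull.count())  // prints the expected number of lines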

Regards,
Marcin
