Hi!
I'm trying to run at least something with my Spark / Cassandra setup: prebuilt
Spark 1.2.1-hadoop1 and Cassandra 2.0.14.
Spark by itself works fine: the tests in my project pass, and I can run my
project from the Spark shell without any issues.
So the simplest thing I'm trying now is to run Zeppelin and create a paragraph
with:
val rdd = sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
rdd.count
Here is what I see in the log files:
Zeppelin Interpreter log file:
INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Successfully started service 'SparkUI' on port 4045.
INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Started SparkUI at http://10.59.26.123:4045
INFO [2015-04-14 01:29:21,255] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Added JAR file:/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar at http://10.59.26.123:53658/jars/analytics-jobs-5.2.0-SNAPSHOT-all.jar with timestamp 1429000161255
ERROR [2015-04-14 01:29:21,256] ({pool-1-thread-5} ProcessFunction.java[process]:41) - Internal error processing getProgress
org.apache.zeppelin.interpreter.InterpreterException: java.lang.NumberFormatException: For input string: ""
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:75)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:299)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:938)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:923)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:504)
    at java.lang.Integer.parseInt(Integer.java:527)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.<init>(SparkDeploySchedulerBackend.scala:42)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1883)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:330)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:267)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:389)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
    ... 11 more
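If I read this trace right, the SparkContext can't even be constructed:
SparkDeploySchedulerBackend.scala:42 does Option.map(_.toInt) over some
configuration value, and that value arrives as an empty string. In the Spark
1.2.1 sources that line seems to read spark.cores.max, but I'm not certain
that's the property at fault. A minimal sketch of the failure mode as I
understand it (the property name is my assumption):

// Reproduces the same shape of failure as the trace above:
// Option.map(_.toInt) on a value that is present but empty.
val props = Map("spark.cores.max" -> "")             // empty value, source unknown
val maxCores = props.get("spark.cores.max").map(_.toInt)
// => java.lang.NumberFormatException: For input string: ""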
Zeppelin log file:
INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:194) - run paragraph 20150414-012109_673822021 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@7cfec020
INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:211) - RUN : val rdd = sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
rdd.count
INFO [2015-04-14 01:29:18,262] ({Thread-32} NotebookServer.java[broadcast]:251) - SEND >> NOTE
ERROR [2015-04-14 01:29:19,273] ({pool-1-thread-5} Job.java[run]:183) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.TApplicationException: Internal error processing interpret
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:222)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:212)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:293)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.TApplicationException: Internal error processing interpret
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:190)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:175)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:205)
    ... 11 more
zeppelin-env.sh is the following:
export MASTER="spark://emorozov.local:7077"
export ZEPPELIN_PORT=8089
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar"
export SPARK_HOME="/Users/emorozov/tools/spark-1.2.1-bin-hadoop1/"
export ZEPPELIN_HOME="/Users/emorozov/dev/zeppelin"
export ZEPPELIN_MEM="-Xmx4g"
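For reference, I would expect those settings to end up in a SparkConf roughly
equivalent to the following (just my guess at the mapping, not necessarily what
Zeppelin actually builds internally; the app name here is made up):

import org.apache.spark.SparkConf

val expectedConf = new SparkConf()
  .setMaster("spark://emorozov.local:7077")   // from MASTER
  .setAppName("Zeppelin")                     // assumed app name
  .set("spark.jars", "/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar")  // from ZEPPELIN_JAVA_OPTS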
I turned on debug in log4j.properties, hoping to see the properties that are
passed into the SparkConf (SparkInterpreter.java, line 263: logger.debug), but
no properties show up in the log file.
At the same time, in the same notebook I'm able to run something like
%sh echo blah, and it prints blah as a result.
--
Eugene Morozov
[email protected]