Hi

I have defined SPARK_HOME in my env file, so I am assuming that
Zeppelin is using spark-submit.

I have a Scala class that I test using spark-submit, and it works fine. I
have now split this class into a couple of paragraphs in a notebook: one
loads the dependencies (%dep) and the others contain the code.
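
The dependency paragraph looks roughly like this (the Maven coordinates
below are placeholders, not my actual artifacts):

%dep
z.reset()
// load the jar containing RandomAccessInputFormat and friends
// (placeholder coordinates)
z.load("com.example:my-input-formats:1.0.0")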

Somehow I get the exception below (full output at the end of this mail)
when running the last paragraph. The error seems to be related to the fact
that I am using Job in the following snippet:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, IntWritable}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.spark.rdd.NewHadoopRDD

val hConf = sc.hadoopConfiguration
val job = new Job(hConf)  // deprecated constructor, hence the warning below
FileInputFormat.setInputPaths(job, new Path(path))

// construct the RDD from my custom InputFormat and proceed
// (RandomAccessInputFormat and extractCAS come from my own jar)
val hRDD = new NewHadoopRDD(sc, classOf[RandomAccessInputFormat],
  classOf[IntWritable], classOf[BytesWritable], job.getConfiguration())
val result = hRDD.mapPartitionsWithInputSplit { (split, iter) =>
  extractCAS(split, iter)
}.collect()
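
Looking at the stack trace below, the exception actually seems to be thrown
while the interpreter tries to print the job variable: Job.toString calls
ensureState, which throws unless the job is RUNNING. Assuming that is the
cause, I am thinking of keeping the Job inside a block so the interpreter
never echoes it, something like:

// expose only the Configuration; its toString is safe to echo,
// unlike Job.toString, which requires a RUNNING job
val jobConf = {
  val job = Job.getInstance(hConf)  // non-deprecated factory on Hadoop 2
  FileInputFormat.setInputPaths(job, new Path(path))
  job.getConfiguration
}

val hRDD = new NewHadoopRDD(sc, classOf[RandomAccessInputFormat],
  classOf[IntWritable], classOf[BytesWritable], jobConf)

Would that be the right way to avoid it?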

Has anyone faced a similar issue?

Cheers
Guillaume

path: String = /Users/tog/Downloads/zeppelin-0.5.5-incubating_all_modified/data/MERGED_DAR.DAT_256
hConf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
	at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:294)
	at org.apache.hadoop.mapreduce.Job.toString(Job.java:463)
	at scala.runtime.ScalaRunTime$.scala$runtime$ScalaRunTime$$inner$1(ScalaRunTime.scala:324)
	at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:329)
	at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
	at .<init>(<console>:10)
	at .<clinit>(<console>)
	at $print(<console>)
