Well, this is the spark-submit line from above:

2017-09-26T14:04:45,678 INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main] client.SparkClientImpl: Running client driver with argv: /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit

and that's pretty clearly v2.2. I do have other versions of Spark on the namenode, so lemme remove those and see what happens...

A-HA! Dang it!

    $ echo $SPARK_HOME
    /usr/local/spark

Well, that clearly needs to be:

    /usr/lib/spark-2.2.0-bin-hadoop2.6

How did I miss that? Unbelievable. Thank you, Sahil! Let's see what happens next!

Cheers,
Stephen

On Tue, Sep 26, 2017 at 4:12 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

> Are you sure you are using Spark 2.2.0? Based on the stack-trace it looks
> like your call to spark-submit is using an older version of Spark (looks
> like some early 1.x version). Do you have SPARK_HOME set locally? Do you
> have older versions of Spark installed locally?
>
> --Sahil
>
> On Tue, Sep 26, 2017 at 3:33 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>> thanks Sahil. here it is.
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListenerInterface
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:344)
>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:318)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerInterface
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>     ... 5 more
>>
>>     at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:212) ~[hive-exec-2.3.0.jar:2.3.0]
>>     at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:500) ~[hive-exec-2.3.0.jar:2.3.0]
>>     at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_25]
>> FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
>> 2017-09-26T14:04:46,470 ERROR [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main] ql.Driver: FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
>> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
>>     at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:240)
>>     at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
>>     at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>>     at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
>>     at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
>>     at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
>>     at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
>>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>>     at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)
>>     at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)
>>     at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
>>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11253)
>>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
>>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
>>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
>>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
>>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>
>> It bugs me that that class is in spark-core_2.11-2.2.0.jar yet so seemingly out of reach.
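[Editor's note: a jar is just a zip archive, so "is the class really packaged in that jar?" can be checked directly by listing the entries. A minimal sketch; `jar_contains` is a hypothetical helper, not part of Hive or Spark, and it is demonstrated here on a throwaway jar built in a temp dir — on a real box you would point it at /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar.]

```python
import os
import tempfile
import zipfile


def jar_contains(jar_path, class_name):
    """Return True if the jar has an entry for the given fully-qualified class."""
    # a.b.C maps to the archive entry a/b/C.class
    entry = class_name.replace('.', '/') + '.class'
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


with tempfile.TemporaryDirectory() as d:
    # build a throwaway jar containing only the class in question
    demo = os.path.join(d, 'demo.jar')
    with zipfile.ZipFile(demo, 'w') as jar:
        jar.writestr('org/apache/spark/scheduler/SparkListenerInterface.class', b'')

    print(jar_contains(demo, 'org.apache.spark.scheduler.SparkListenerInterface'))  # True
    print(jar_contains(demo, 'org.apache.spark.SparkConf'))  # False
```

If the class is present in the jar but still "out of reach" at runtime, the usual suspect is exactly what surfaced later in this thread: a different install (via SPARK_HOME or PATH) being picked up than the one holding the jar.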
>> :(
>>
>> On Tue, Sep 26, 2017 at 2:44 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>>
>>> Hey Stephen,
>>>
>>> Can you send the full stack trace for the NoClassDefFoundError? For Hive
>>> 2.3.0, we only support Spark 2.0.0. Hive may work with more recent
>>> versions of Spark, but we only test with Spark 2.0.0.
>>>
>>> --Sahil
>>>
>>> On Tue, Sep 26, 2017 at 2:35 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> * i've installed hive 2.3 and spark 2.2
>>>>
>>>> * i've read this doc plenty of times -> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>>
>>>> * i run this query:
>>>>
>>>>   hive --hiveconf hive.root.logger=DEBUG,console -e 'set hive.execution.engine=spark; select date_key, count(*) from fe_inventory.merged_properties_hist group by 1 order by 1;'
>>>>
>>>> * i get this error:
>>>>
>>>>   Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListenerInterface
>>>>
>>>> * this class is in: /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar
>>>>
>>>> * i have copied all the spark jars to hdfs://dwrdevnn1/spark-2.2-jars
>>>>
>>>> * i have updated hive-site.xml to set spark.yarn.jars to it.
>>>>
>>>> * i see this in the console:
>>>>
>>>>   2017-09-26T13:34:15,505 INFO [334aa7db-ad0c-48c3-9ada-467aaf05cff3 main] spark.HiveSparkClientFactory: load spark property from hive configuration (spark.yarn.jars -> hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*).
>>>>
>>>> * i see this on the console:
>>>>
>>>>   2017-09-26T14:04:45,678 INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main] client.SparkClientImpl: Running client driver with argv: /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --properties-file /tmp/spark-submit.6105784757200912217.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/apache-hive-2.3.0-bin/lib/hive-exec-2.3.0.jar --remote-host dwrdevnn1.sv2.trulia.com --remote-port 53393 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null
>>>>
>>>> * i even print out CLASSPATH in this script: /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit
>>>>   and /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar is in it.
>>>>
>>>> so i ask... what am i missing?
>>>>
>>>> thanks,
>>>> Stephen
>>>
>>> --
>>> Sahil Takiar
>>> Software Engineer at Cloudera
>>> takiar.sa...@gmail.com | (510) 673-0309
>
> --
> Sahil Takiar
> Software Engineer at Cloudera
> takiar.sa...@gmail.com | (510) 673-0309
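[Editor's note: the root cause at the top of this thread was a stale SPARK_HOME (/usr/local/spark) shadowing the intended install (/usr/lib/spark-2.2.0-bin-hadoop2.6). A minimal sketch of the sanity check that would have caught it; `check_spark_home` is a hypothetical helper, not part of Hive or Spark, and the paths are the ones reported in this thread.]

```python
import os


def check_spark_home(expected):
    """Report whether SPARK_HOME agrees with the install you meant to run."""
    home = os.environ.get('SPARK_HOME')
    if home is None:
        # spark-submit will then fall back to locating Spark from its own path
        return 'SPARK_HOME unset'
    if os.path.normpath(home) != os.path.normpath(expected):
        return 'MISMATCH: SPARK_HOME=%s but expected %s' % (home, expected)
    return 'OK'


# reproduce the situation from this thread: a stale value left behind
os.environ['SPARK_HOME'] = '/usr/local/spark'
print(check_spark_home('/usr/lib/spark-2.2.0-bin-hadoop2.6'))
# -> MISMATCH: SPARK_HOME=/usr/local/spark but expected /usr/lib/spark-2.2.0-bin-hadoop2.6
```

The same trap applies to any wrapper that shells out to spark-submit: the version actually launched is decided by SPARK_HOME (or PATH), not by whichever install happens to hold the jars you configured.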