* I've installed Hive 2.3 and Spark 2.2

* I've read this doc plenty of times ->
  https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

* I run this query:

   hive --hiveconf hive.root.logger=DEBUG,console -e '
     set hive.execution.engine=spark;
     select date_key, count(*)
     from fe_inventory.merged_properties_hist
     group by 1
     order by 1;'


* I get this error:

   Exception in thread "main" java.lang.NoClassDefFoundError:
   org/apache/spark/scheduler/SparkListenerInterface


* this class is in:
  /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar
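
  (for what it's worth, something like this should confirm the class really
  is inside that jar:)

     jar tf /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar \
       | grep SparkListenerInterface

  which should list the org/apache/spark/scheduler/SparkListenerInterface.class entry.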

* I have copied all the Spark jars to hdfs://dwrdevnn1/spark-2.2-jars
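
  (for reference, that copy boils down to commands along these lines -- the
  full namenode address is the one that shows up in the log further down:)

     hdfs dfs -mkdir -p hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars
     hdfs dfs -put /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/*.jar \
       hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/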

* I have updated hive-site.xml to set spark.yarn.jars to point at that directory.
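
  (a quick sanity check along these lines should show the value Hive actually
  picks up, matching the INFO line below:)

     hive -e 'set spark.yarn.jars;'
     # expected: spark.yarn.jars=hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*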

* I see this in the console:

2017-09-26T13:34:15,505  INFO [334aa7db-ad0c-48c3-9ada-467aaf05cff3 main]
spark.HiveSparkClientFactory: load spark property from hive configuration
(spark.yarn.jars -> hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*).

* I also see this on the console:

2017-09-26T14:04:45,678  INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main]
client.SparkClientImpl: Running client driver with argv:
/usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --properties-file
/tmp/spark-submit.6105784757200912217.properties --class
org.apache.hive.spark.client.RemoteDriver
/usr/lib/apache-hive-2.3.0-bin/lib/hive-exec-2.3.0.jar --remote-host
dwrdevnn1.sv2.trulia.com --remote-port 53393 --conf
hive.spark.client.connect.timeout=1000 --conf
hive.spark.client.server.connect.timeout=90000 --conf
hive.spark.client.channel.log.level=null --conf
hive.spark.client.rpc.max.size=52428800 --conf
hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
--conf hive.spark.client.rpc.server.address=null

* I even print out CLASSPATH in this script:
  /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit

  and /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar is in it.
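
  (the debug print was just something along these lines, added near the end
  of the script:)

     # temporary debug: dump the classpath the script ends up with
     echo "CLASSPATH=${CLASSPATH}" >&2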

So I ask... what am I missing?

thanks,
Stephen
