For those interested
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 06 December 2015 20:33
To: u...@hive.apache.org
Subject: Managed to make Hive run on Spark engine

Thanks all, especially to Xuefu, for the contributions. Finally it works, which means don't give up until it works :)

hduser@rhes564::/usr/lib/hive/lib> hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
hive> set hive.execution.engine=spark;
hive> set spark.master=spark://50.140.197.217:7077;
hive> set spark.eventLog.enabled=true;
hive> set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
hive> set hive.spark.client.server.connect.timeout=220000ms;
hive> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
hive> use asehadoop;
OK
Time taken: 0.638 seconds
hive> select count(1) from t;
Query ID = hduser_20151206200528_4b85889f-e4ca-41d2-9bd2-1082104be42b
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = c8fee86c-0286-4276-aaa1-2a5eb4e4958a
Query Hive on Spark job[0] stages: 0 1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2015-12-06 20:05:36,299 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1
2015-12-06 20:05:39,344 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2015-12-06 20:05:40,350 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 8.10 seconds
OK

The versions used for this project:

OS version: Linux version 2.6.18-92.el5xen
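The `set` commands above apply only to the current Hive session. A minimal sketch of persisting the same settings in `$HIVE_HOME/conf/hive-site.xml` (inside the `<configuration>` element) is shown below; the paths, master URL and memory size are copied from the session log above and are assumptions to be adjusted for your own cluster:

```xml
<!-- Sketch only: session settings from the log above, made persistent.
     Host, paths and sizes must match your own cluster. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>spark://50.140.197.217:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/usr/lib/spark-1.3.1-bin-hadoop2.6/logs</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
```

With these in place, a new Hive session would pick up the Spark engine without the per-session `set` commands.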
(brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:31:30 EDT 2008
Hadoop 2.6.0
Hive 1.2.1
spark-1.3.1-bin-hadoop2.6 (downloaded as the prebuilt spark-1.3.1-bin-hadoop2.6.gz, used to start the Spark standalone cluster)

The jar file placed in $HIVE_HOME/lib to link Hive to Spark was spark-assembly-1.3.1-hadoop2.4.0.jar, built from the source downloaded as the zipped file spark-1.3.1.gz with the command line:

make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

It is pretty picky about parameters, CLASSPATH, IP addresses or hostnames etc. to make it work. I will create a full guide on how to build Spark and make Hive run with Spark as its engine (as opposed to MR).

HTH

Mich Talebzadeh

Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the book "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7, and co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4.
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
http://talebzadehmich.wordpress.com
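The build-and-link step described in the message can be sketched as the following shell commands. This is a sketch under stated assumptions, not the author's exact procedure: it assumes the spark-1.3.1 source tree has been unpacked into the current directory, that HIVE_HOME is set, and that the assembly jar ends up under dist/lib/ (the usual output location of make-distribution.sh in Spark 1.x).

```shell
# Sketch only: build a Spark 1.3.1 distribution without Hive classes
# and link the resulting assembly jar into Hive, per the message above.
# Adjust versions and paths to your own setup.
cd spark-1.3.1

# Build the distribution with the profiles quoted in the message.
./make-distribution.sh --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

# Copy the assembly jar into Hive's lib directory so Hive can
# launch jobs on the Spark engine (assumed output path).
cp dist/lib/spark-assembly-1.3.1-hadoop2.4.0.jar "$HIVE_HOME/lib/"
```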