For those interested
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 06 December 2015 20:33
To: u...@hive.apache.org
Subject: Managed to make Hive run on Spark engine

Thanks all, especially to Xuefu, for the contributions. Finally it works, which means don't give up until it works :)

hduser@rhes564::/usr/lib/hive/lib> hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
hive> set hive.execution.engine=spark;
hive> set spark.master=spark://50.140.197.217:7077;
hive> set spark.eventLog.enabled=true;
hive> set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
hive> set hive.spark.client.server.connect.timeout=220000ms;
hive> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
hive> use asehadoop;
OK
Time taken: 0.638 seconds
hive> select count(1) from t;
Query ID = hduser_20151206200528_4b85889f-e4ca-41d2-9bd2-1082104be42b
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = c8fee86c-0286-4276-aaa1-2a5eb4e4958a
Query Hive on Spark job[0] stages: 0 1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2015-12-06 20:05:36,299 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1
2015-12-06 20:05:39,344 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2015-12-06 20:05:40,350 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 8.10 seconds
OK

The versions used for this project:

OS version: Linux version 2.6.18-92.el5xen
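The `set` commands above apply only to the current Hive session. A minimal sketch of persisting the same settings in `$HIVE_HOME/conf/hive-site.xml` (inside the `<configuration>` element) is shown below; the paths, master URL and memory size are copied from the session log above and are assumptions to be adjusted for your own cluster:

```xml
<!-- Sketch only: session settings from the log above, made persistent.
     Host, paths and sizes must match your own cluster. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>spark://50.140.197.217:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/usr/lib/spark-1.3.1-bin-hadoop2.6/logs</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
```

With these in place, a new Hive session would pick up the Spark engine without the per-session `set` commands.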
(brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:31:30 EDT 2008
Hadoop 2.6.0
Hive 1.2.1
spark-1.3.1-bin-hadoop2.6 (downloaded as the prebuilt spark-1.3.1-bin-hadoop2.6.gz, used to start the Spark standalone cluster)

The jar file placed in $HIVE_HOME/lib to link Hive to Spark was spark-assembly-1.3.1-hadoop2.4.0.jar, built from the source downloaded as the zipped file spark-1.3.1.gz with the command line:

make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

It is pretty picky about parameters, CLASSPATH, IP addresses or hostnames etc. to make it work. I will create a full guide on how to build Spark and make Hive run with Spark as its engine (as opposed to MR).

HTH

Mich Talebzadeh

Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the book "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7, and co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4.
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
http://talebzadehmich.wordpress.com
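The build-and-link step described in the message can be sketched as the following shell commands. This is a sketch under stated assumptions, not the author's exact procedure: it assumes the spark-1.3.1 source tree has been unpacked into the current directory, that HIVE_HOME is set, and that the assembly jar ends up under dist/lib/ (the usual output location of make-distribution.sh in Spark 1.x).

```shell
# Sketch only: build a Spark 1.3.1 distribution without Hive classes
# and link the resulting assembly jar into Hive, per the message above.
# Adjust versions and paths to your own setup.
cd spark-1.3.1

# Build the distribution with the profiles quoted in the message.
./make-distribution.sh --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

# Copy the assembly jar into Hive's lib directory so Hive can
# launch jobs on the Spark engine (assumed output path).
cp dist/lib/spark-assembly-1.3.1-hadoop2.4.0.jar "$HIVE_HOME/lib/"
```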