RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu

Mich Talebzadeh Mon, 23 Nov 2015 02:50:20 -0800

As the example shows, these are all set in hive-site.xml:


<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
  <description>
    Expects one of [mr, tez, spark].
    Chooses execution engine. Options are: mr (Map reduce, default), tez (hadoop 2 only), spark
  </description>
</property>

<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
  <description>
    Spark event log setting
  </description>
</property>
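The session-level set commands in Dasun's message below can be made permanent in the same way, since Hive on Spark also reads spark.* properties from hive-site.xml. This is a minimal sketch of properties to add inside the existing <configuration> element, not a tested configuration: spark://rhes564:7077 is the standalone master started later in this thread, while the event-log directory hdfs:///user/hduser/spark-eventlogs is a hypothetical path that has to be created beforehand.

```xml
<!-- Sketch only: persistent equivalents of the session-level "set" commands.
     Values come from this thread or are placeholders; adjust to your cluster. -->
<property>
  <name>spark.master</name>
  <!-- standalone master started with sbin/start-master.sh in this thread -->
  <value>spark://rhes564:7077</value>
</property>

<property>
  <name>spark.eventLog.dir</name>
  <!-- hypothetical directory; Spark expects it to exist already -->
  <value>hdfs:///user/hduser/spark-eventlogs</value>
</property>

<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>

<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
```

Hive reads hive-site.xml when it starts, so restart the Hive CLI or HiveServer2 after editing it; a session-level set still overrides these values for that session.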

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> 

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 

From: Dasun Hegoda [mailto:dasunheg...@gmail.com] 
Sent: 23 November 2015 10:40
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Thank you very much. This is very informative. Do you know how to set these in 
hive-site.xml?

 

hive> set spark.master=<Spark Master URL>;
hive> set spark.eventLog.enabled=true;
hive> set spark.eventLog.dir=<Spark event log folder (must exist)>;
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;

 

If we can set these in hive-site.xml, I think we will be able to get through.

 

On Mon, Nov 23, 2015 at 3:05 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:

Hi,

 

I am looking at the setup here:

 

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.

 

First, this is about configuring Hive to work with Spark. This is my 
understanding:

 

1.    Hive uses YARN as its resource manager regardless of the execution engine

2.    Hive uses MapReduce as its execution engine by default

3.    The execution engine can be changed to Spark at the configuration 
level. If you look at the Hive configuration file, 
$HIVE_HOME/conf/hive-site.xml, you will see that the default is mr (MapReduce):

<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
  <description>
    Expects one of [mr, tez].
    Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)
  </description>
</property>

 

4.    If you change that to spark and restart Hive, Hive will use Spark as 
its engine. So the choice is to do it either at the configuration level or at 
the session level (i.e. set hive.execution.engine=spark;). You can do the same 
for the rest of the parameters, i.e. set them in hive-site.xml or at the 
session level. Personally, I would still want Hive to use the MR engine by 
default, so I will create spark-defaults.conf as mentioned.

5.    I then start Spark in standalone mode, which works fine:

hduser@rhes564::/usr/lib/spark> ./sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
hduser@rhes564::/usr/lib/spark> more /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
Spark Command: /usr/java/latest/bin/java -cp /usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip rhes564 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/11/21 21:41:58 INFO Master: Registered signal handlers for [TERM, HUP, INT]
15/11/21 21:41:58 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)
15/11/21 21:41:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/21 21:41:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/21 21:41:59 INFO SecurityManager: Changing view acls to: hduser
15/11/21 21:41:59 INFO SecurityManager: Changing modify acls to: hduser
15/11/21 21:41:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
15/11/21 21:41:59 INFO Slf4jLogger: Slf4jLogger started
15/11/21 21:42:00 INFO Remoting: Starting remoting
15/11/21 21:42:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@rhes564:7077]
15/11/21 21:42:00 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
15/11/21 21:42:00 INFO Master: Starting Spark master at spark://rhes564:7077
15/11/21 21:42:00 INFO Master: Running Spark version 1.5.2
15/11/21 21:42:00 INFO Utils: Successfully started service 'MasterUI' on port 8080.
15/11/21 21:42:00 INFO MasterWebUI: Started MasterWebUI at http://50.140.197.217:8080
15/11/21 21:42:00 INFO Utils: Successfully started service on port 6066.
15/11/21 21:42:00 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
15/11/21 21:42:00 INFO Master: I have been elected leader! New state: ALIVE

6.    Then I try to start the interactive spark-shell, and it fails with the 
error that I reported before:

hduser@rhes564::/usr/lib/spark/bin> ./spark-shell --master spark://rhes564:7077
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/23 09:33:56 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)
15/11/23 09:33:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/23 09:33:57 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.logging.operation.verbose does not exist
15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------

 

That is where I am now. I have reported this to the Spark user group, but no 
luck yet.
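For reference, the last line of that output is raised by Hive's session start-up check rather than by Spark itself: the Hive code used by spark-shell expects the root scratch directory (hive.exec.scratchdir, /tmp/hive by default) to have at least mode 733, i.e. to be writable by all users, and rwx------ is mode 700. Below is a sketch of the relevant hive-site.xml property, with the usual remedy noted in a comment. Choosing 733 rather than 777 is an assumption to weigh against local security policy.

```xml
<!-- hive-site.xml (sketch). Hive checks that this root scratch directory
     has at least mode 733 (rwx-wx-wx) and fails at session start otherwise.
     /tmp/hive currently has mode 700, so widen it, for example by running
       hdfs dfs -chmod 733 /tmp/hive
     as the directory owner or the HDFS superuser. Many setups use 777. -->
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
</property>
```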

 

 

Mich Talebzadeh

 


 

From: Dasun Hegoda [mailto:dasunheg...@gmail.com] 
Sent: 23 November 2015 07:05
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Anyone????

 

On Sat, Nov 21, 2015 at 1:32 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:

Thank you very much, but I would like to integrate these components myself 
rather than use a packaged distribution. I think I have come to the right 
place. Can you please tell me the configuration steps to run Hive on Spark?

 

Could someone at least please elaborate on these steps?

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.

 

In the latter part of the guide, configurations are set in the Hive runtime 
shell, which, as far as I know, is not permanent.

 

Please help me get this done. I'm also planning to write a detailed guide with 
the configuration steps to run Hive on Spark, so that others can benefit from 
it and not be troubled like me.

 

Can someone please tell me the configuration steps to run Hive on Spark?

 

 

On Sat, Nov 21, 2015 at 12:28 PM, Sai Gopalakrishnan 
<sai.gopalakrish...@aspiresys.com> wrote:

Hi everyone,

 

Thank you for your responses. I think Mich's suggestion is a great one; I will 
go with it. As Alan suggested, using the compactor in Hive should help with 
managing the delta files.

 

@Dasun, pardon me for deviating from the topic. Regarding configuration, you 
could try a packaged distribution (Hortonworks, Cloudera or MapR), as Jörn 
Franke said. I use Hortonworks: it's open source, compatible with Linux and 
Windows, provides detailed installation documentation and can be installed in 
less than a day, provided you're all set with the hardware. 
http://hortonworks.com/hdp/downloads/



 

 

Regards,

Sai

 


  _____  


From: Dasun Hegoda <dasunheg...@gmail.com>
Sent: Saturday, November 21, 2015 8:00 AM
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu 

 

Hi Mich, Hi Sai, Hi Jorn,

Thank you very much for the information. I think we are deviating from the 
original question, which was about running Hive on Spark on Ubuntu. Can you 
please tell me the configuration steps?

 

 

 

On Fri, Nov 20, 2015 at 11:10 PM, Jörn Franke <jornfra...@gmail.com> wrote:

I think the most recent versions of Cloudera or Hortonworks should include all 
these components; try their sandboxes.


On 20 Nov 2015, at 12:54, Dasun Hegoda <dasunheg...@gmail.com> wrote:

Where can I get a Hadoop distribution containing these technologies? Link?

 

On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke <jornfra...@gmail.com> wrote:

I recommend using a Hadoop distribution containing these technologies. I think 
you also get other useful tools for your scenario, such as auditing using 
Sentry or Ranger.


On 20 Nov 2015, at 10:48, Mich Talebzadeh <m...@peridale.co.uk> wrote:

Well

 

“I'm planning to deploy Hive on Spark but I can't find the installation steps. 
I tried to read the official '[Hive on Spark][1]' guide but it has problems. As 
an example, under 'Configuring Yarn' it says 
`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
but does not say where I should set it. Also, as per the guide, configurations 
are set in the Hive runtime shell, which, as far as I know, is not permanent.”

 

You can do that in the yarn-site.xml file, which is normally under 
$HADOOP_HOME/etc/hadoop.
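For example, the FairScheduler setting from the guide would become a property inside the existing <configuration> element of $HADOOP_HOME/etc/hadoop/yarn-site.xml. This is a minimal sketch; the ResourceManager only picks it up after a restart (for instance $HADOOP_HOME/sbin/stop-yarn.sh followed by start-yarn.sh).

```xml
<!-- yarn-site.xml: replace the default Capacity Scheduler with the Fair Scheduler,
     as the "Configuring Yarn" section of the Hive on Spark guide asks. -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```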

 

 

HTH

 

 

 

Mich Talebzadeh

 


 

From: Dasun Hegoda [mailto:dasunheg...@gmail.com] 
Sent: 20 November 2015 09:36
To: user@hive.apache.org
Subject: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Hi,

 

What I'm planning to do is develop a reporting platform using existing data. I 
have an existing RDBMS with a large number of records, so I'm using the 
following stack 
(http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):

 

 - Sqoop - Extract data from RDBMS to Hadoop
 - Hadoop - Storage platform -> *Deployment Completed*
 - Hive - Data warehouse
 - Spark - Real-time processing -> *Deployment Completed*

 

I'm planning to deploy Hive on Spark but I can't find the installation steps. I 
tried to read the official '[Hive on Spark][1]' guide but it has problems. As 
an example, under 'Configuring Yarn' it says 
`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
but does not say where I should set it. Also, as per the guide, configurations 
are set in the Hive runtime shell, which, as far as I know, is not permanent.

 

I also read [this][2], but it does not have any steps.

 

Could you please provide the steps to run Hive on Spark on Ubuntu as a 
production system?

 

 

  [1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

  [2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark

 

-- 

Regards,

Dasun Hegoda, Software Engineer  
www.dasunhegoda.com <http://www.dasunhegoda.com/>  | dasunheg...@gmail.com 
<mailto:dasunheg...@gmail.com> 





 




This e-mail message and any attachments are for the sole use of the intended 
recipient(s) and may contain proprietary, confidential, trade secret or 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited and may be a violation of law. If you are not the 
intended recipient, please contact the sender by reply e-mail and destroy all 
copies of the original message. 





 
