Re: hive + tez + yarn 2.4

Grandl Robert Wed, 18 Jun 2014 22:36:13 -0700

Thanks a lot guys for your help.

The version mismatch solved my problem. 

Robert

On Wednesday, June 18, 2014 2:19 PM, Bikas Saha <[email protected]> wrote:

Hive 0.13 is incompatible with Tez-0.5 (trunk). Hive depends on Tez-0.4. You 
should probably check out branch-0.4 and build that. 

Bikas
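
A note on why this kind of version mismatch shows up as the NoSuchMethodError 
quoted further down: the JVM reports the missing member as a name plus a type 
descriptor. In RootInputConfigureVertexTasksEvent.<init>(ILjava/util/List;)V, 
<init> denotes a constructor, I is int, Ljava/util/List; is java.util.List, and 
V is void. Hive 0.13 was therefore compiled against a constructor taking 
(int, List) that the Tez classes loaded at runtime do not provide. Below is a 
minimal Python sketch for decoding such descriptors; decode_descriptor is a 
hypothetical helper written for illustration and is not part of Hive or Tez.

```python
# Decode a JVM method descriptor such as "(ILjava/util/List;)V" into
# readable parameter and return types. decode_descriptor is a hypothetical
# helper for illustration; it is not part of Hive or Tez.

PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, pos):
    """Read one type starting at desc[pos]; return (name, next_pos)."""
    dims = 0
    while desc[pos] == "[":          # each '[' adds one array dimension
        dims += 1
        pos += 1
    ch = desc[pos]
    if ch == "L":                    # object type: L<internal/name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    elif ch in PRIMITIVES:           # single-letter primitive or void
        name = PRIMITIVES[ch]
        pos += 1
    else:
        raise ValueError(f"unexpected character {ch!r} at offset {pos}")
    return name + "[]" * dims, pos


def decode_descriptor(desc):
    """Return (parameter_types, return_type) for a method descriptor."""
    if not desc.startswith("("):
        raise ValueError("descriptor must start with '('")
    pos = 1
    params = []
    while desc[pos] != ")":
        name, pos = _read_type(desc, pos)
        params.append(name)
    ret, _ = _read_type(desc, pos + 1)
    return params, ret


params, ret = decode_descriptor("(ILjava/util/List;)V")
print(params, ret)
```

Comparing the decoded signature with the constructors actually present in the 
Tez jar on the classpath is one way to confirm which Tez version Hive was built 
against.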

From: Grandl Robert [mailto:[email protected]] 
Sent: Wednesday, June 18, 2014 2:05 PM
To: [email protected]
Subject: Re: hive + tez + yarn 2.4

Hi Hitesh,

I followed the steps mentioned there. The error mentioned above: 

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask

was mainly caused by Hive expecting the hive-exec jar to be in /user/hadoop 
instead of /user/hive, which some posts mention. I copied the hive-exec jar 
into that directory, and now the Tez job is launched and the application 
succeeds after a long while, but its DAG fails.

In Hive console I get the following:
hive> SELECT COUNT(*) FROM student;
Query ID = hadoop_20140618133737_83b94345-4058-4e25-8528-fa0bfded4b86
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.

Status: Running (application id: application_1403117117414_0006)

Map 1: -/-    Reducer 2: 0/1    
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1403117117414_0006_1_01, 
diagnostics=[Vertex Input: student initializer failed., 
org.apache.tez.runtime.api.events.RootInputConfigureVertexTasksEvent.<init>(ILjava/util/List;)V]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1403117117414_0006_1_00, 
diagnostics=[Vertex received Kill in INITED state.]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask

After the job finished (with failed DAGs), I looked in the job's AM log and, 
in stdout_dag_*, I can see the following exception:
2014-06-18 13:37:15,777 INFO [InputInitializer [Map 1] #0] 
org.apache.hadoop.hive.ql.exec.tez.SplitGrouper: Original split size is 56 
grouped split size is 6
, for bucket: 1
2014-06-18 13:37:15,781 INFO [InputInitializer [Map 1] #0] 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator: Number of grouped 
splits: 6
2014-06-18 13:37:15,788 ERROR [AsyncDispatcher event handler] 
org.apache.tez.dag.app.dag.impl.VertexImpl: Vertex Input: student initializer 
failed
java.lang.NoSuchMethodError: 
org.apache.tez.runtime.api.events.RootInputConfigureVertexTasksEvent.<init>(ILjava/util/List;)V
        at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:177)
        at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:92)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:154)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:146)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable.call(RootInputInitializerRunner.java:146)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable.call(RootInputInitializerRunner.java:114)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
2014-06-18 13:37:15,798 INFO [HistoryEventHandlingThread] 
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService: Writing 
event VERTEX_FINISHED to history file
2014-06-18 13:37:15,800 INFO [AsyncDispatcher event handler] 
org.apache.tez.dag.history.HistoryEventHandler: 
[HISTORY][DAG:dag_1403117117414_0006_1][Event:VERTEX_FINISHED]: vertexName=Map 
1, vertexId=vertex_1403117117414_0006_1_01, initRequestedTime=1403123835325, 
initedTime=0, startRequestedTime=1403123835358, s

I also have the path to tez-site.xml in HADOOP_CLASSPATH. 

Do you have any idea what could be causing this?

robert

On Wednesday, June 18, 2014 12:54 PM, Hitesh Shah <[email protected]> wrote:

Hi Robert, 

The 2.0.4 docs are quite old, as they seem to refer to a very old release 
of Tez. The relevant docs should be 
"http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez.html"

The minimal changes that you need to do are the following: 
  - follow the basic steps of setting up tez such as uploading jars to HDFS, 
creating the tez-site.xml and updating it to point to the correct path for the 
jar on HDFS
  - change HADOOP_CLASSPATH to have the tez jars in the class path on your 
client machine
  - set hive.execution.engine=tez in hive-site.xml or in your Hive shell (you 
can skip the step of uploading the hive-exec jar to HDFS for now, as it is optional)
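
The three steps above can be sketched as a short dry-run plan. The Python 
snippet below only builds the shell command strings and never runs them. 
The helper name tez_setup_commands and the paths /opt/tez, /apps/tez and 
/etc/tez/conf are assumptions chosen for illustration. tez.lib.uris is assumed 
here to be the tez-site.xml property that points at the uploaded jars; adjust 
everything to your own installation.

```python
# Dry-run sketch of a minimal Hive-on-Tez client setup: it only builds
# command strings and does not execute anything. All paths are assumed
# placeholders; paths containing spaces are not handled.

def tez_setup_commands(local_tez_dir, hdfs_tez_dir, tez_conf_dir):
    """Return the shell commands for the three setup steps, in order."""
    return [
        # Step 1: upload the Tez jars to HDFS. tez-site.xml (property
        # tez.lib.uris, assumed) should then point at this HDFS location.
        f"hadoop fs -mkdir -p {hdfs_tez_dir}",
        f"hadoop fs -put {local_tez_dir}/* {hdfs_tez_dir}/",
        # Step 2: put the Tez config directory and jars on the client
        # classpath. The '*' entries are Java classpath wildcards, so they
        # are kept literal inside double quotes.
        f'export HADOOP_CLASSPATH="{tez_conf_dir}:{local_tez_dir}/*:'
        f'{local_tez_dir}/lib/*:$HADOOP_CLASSPATH"',
        # Step 3: start Hive with Tez as the execution engine.
        "hive --hiveconf hive.execution.engine=tez",
    ]


for cmd in tez_setup_commands("/opt/tez", "/apps/tez", "/etc/tez/conf"):
    print(cmd)
```

Setting hive.execution.engine on the command line has the same effect as 
putting it in hive-site.xml, but only for that one session.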

Also, “yarn-tez” for mapreduce.framework.name should not be needed for running 
Hive-on-Tez. It is mainly a way to run MapReduce jobs using the Tez execution 
engine. 

thanks
— Hitesh

On Jun 18, 2014, at 10:24 AM, Grandl Robert <[email protected]> wrote:

> Hi guys,
> 
> I was trying to run hive atop tez atop yarn 2.4. Setting 
> mapreduce.framework.name to yarn-tez enables tez execution engine and I can 
> run the orderedwordcount example which comes along tez. 
> 
> However, I also installed Hive-0.13. Simply running a Hive query still uses 
> Tez (because it is enabled through mapreduce.framework.name), but I am not 
> sure it is fully using the Tez APIs. In the UI I can see that a 
> Tez application is running instead of MapReduce. 
> 
> But looking on the web, it seems there are other steps to enable Hive using 
> Tez or MapReduce framework:
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.4.0/bk_installing_manually_book/content/rpm-chap-tez-5-4.html
> 
> like setting some HIVE_AUX_JARS_PATH variable, and some properties such as: 
> set hive.use.tez.natively=true;
> set hive.execution.engine=tez; ?
> 
> However, following the steps mentioned in the link works only for the case 
> where Tez is disabled for Hive queries: 
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.4.0/bk_installing_manually_book/content/rpm-chap-tez-5-5.html
>  
> Can someone let me know if simply enabling yarn-tez in mapred-site works 
> fine? Or what is the proper way to enable it? (Hive 0.13, Tez 0.5 and 
> YARN 2.4, all compiled from trunk.) 
> 
> Thanks,
> robert

