Could you file a HIVE JIRA for this? The session scratch dir should not be
deleted after a query completes. We can pull in folks from both the Hive and
Tez communities to debug why this is happening.

The Tez AM only tries to delete the staging data on AM shutdown, and in this
scenario it failed to do so:

2015-06-17 18:00:48,926 INFO [Thread-1] app.DAGAppMaster: The shutdown handler is still running, waiting for it to complete
2015-06-17 18:00:48,936 WARN [AMShutdownThread] app.DAGAppMaster: Failed to delete tez scratch data dir, path=hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456

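The nesting in that path is what matters here: the per-application Tez staging dir (which holds tez-conf.pb) lives inside the Hive session scratch dir, so removing the session dir also removes files that containers launched later in the same session still need. A minimal Python sketch that splits such a path into its parts, assuming the layout <scratch>/<user>/_tez_session_dir/<session-id>/.tez/<application-id>/ inferred only from the paths quoted in this thread; split_tez_staging_path and TezStagingPath are hypothetical names, not Hive or Tez APIs.

```python
import re
from typing import NamedTuple, Optional


class TezStagingPath(NamedTuple):
    user: str         # Hive user owning the scratch dir
    session_dir: str  # Hive Tez session scratch dir
    app_id: str       # YARN application id of the Tez AM
    staging_dir: str  # per-application Tez staging dir under the session dir
    relative: str     # file inside the staging dir ("" if none)


# Assumed layout, inferred only from the paths in this thread:
#   <scratch>/<user>/_tez_session_dir/<session-id>/.tez/<application-id>/<file>
_PATTERN = re.compile(
    r"^(?P<session_dir>.*/(?P<user>[^/]+)/_tez_session_dir/[^/]+)"
    r"/\.tez/(?P<app_id>application_\d+_\d+)"
    r"(?:/(?P<relative>.*))?$"
)


def split_tez_staging_path(path: str) -> Optional[TezStagingPath]:
    """Split a Tez staging path into its parts; return None if it does not match."""
    m = _PATTERN.match(path)
    if m is None:
        return None
    session_dir = m.group("session_dir")
    app_id = m.group("app_id")
    return TezStagingPath(
        user=m.group("user"),
        session_dir=session_dir,
        app_id=app_id,
        staging_dir=f"{session_dir}/.tez/{app_id}",
        relative=m.group("relative") or "",
    )


if __name__ == "__main__":
    p = split_tez_staging_path(
        "hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/"
        "86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb"
    )
    print(p.user, p.app_id, p.relative)
    # → lujian application_1433219182593_180456 tez-conf.pb
    # The staging dir is nested inside the session dir, so a recursive delete
    # of the session dir also deletes the staging dir and tez-conf.pb.
    print(p.staging_dir.startswith(p.session_dir + "/"))
    # → True
```

The same helper also accepts scheme-less paths such as the recovery/1/summary path in the LeaseExpiredException further down the thread.
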
thanks
— Hitesh


On Jun 17, 2015, at 7:57 PM, [email protected] wrote:

> Here is the Hive log:
> Status: Running (Executing on YARN cluster with App id application_1433219182593_180456)
> 
> Map 1: -/-    Reducer 2: 0/5  Reducer 3: 0/5  
> Map 1: 0(+1)/1        Reducer 2: 0/5  Reducer 3: 0/5  
> Map 1: 1/1    Reducer 2: 0(+1)/5      Reducer 3: 0/5  
> Map 1: 1/1    Reducer 2: 0(+5)/5      Reducer 3: 0/5  
> Map 1: 1/1    Reducer 2: 2(+3)/5      Reducer 3: 0/5  
> Map 1: 1/1    Reducer 2: 4(+1)/5      Reducer 3: 0(+5)/5      
> Map 1: 1/1    Reducer 2: 5/5  Reducer 3: 0(+5)/5      
> Map 1: 1/1    Reducer 2: 5/5  Reducer 3: 5/5  
> Loading data to table testtmp.tmp_pm_cpttr_hot_srch partition (cur_flg=0, ds=2015-06-16)
> Partition testtmp.tmp_pm_cpttr_hot_srch{cur_flg=0, ds=2015-06-16} stats: [numFiles=5, numRows=0, totalSize=0, rawDataSize=0]
> OK 
> Time taken: 3.885 seconds 
> OK 
> Time taken: 0.266 seconds 
> OK 
> Time taken: 0.067 seconds 
> Query ID = lujian_20150617180000_f048ad51-d72f-458f-8480-bef366606a68 
> Total jobs = 1 
> Launching Job 1 out of 1 
> 
> 
> Status: Running (Executing on YARN cluster with App id application_1433219182593_180456)
> 
> Map 1: 0/1    Map 2: -/-      
> Map 1: 0/1    Map 2: 0/1      
> Map 1: 0/1    Map 2: 0/1      
> Map 1: 0(+0,-1)/1     Map 2: 0(+0,-1)/1       
> Map 1: 0(+0,-1)/1     Map 2: 0(+0,-1)/1       
> Map 1: 0(+0,-2)/1     Map 2: 0(+0,-2)/1       
> Map 1: 0(+0,-2)/1     Map 2: 0(+0,-2)/1       
> Map 1: 0(+0,-3)/1     Map 2: 0(+0,-3)/1       
> Status: Failed 
> Vertex failed, vertexName=Map 2, vertexId=vertex_1433219182593_180456_3_01, 
> diagnostics=[Task failed, taskId=task_1433219182593_180456_3_01_000000, 
> diagnostics=[TaskAttempt 0 failed, info=[Container 
> container_1433219182593_180456_01_000014 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
>  
> ]], TaskAttempt 1 failed, info=[Container 
> container_1433219182593_180456_01_000016 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
>  
> ]], TaskAttempt 2 failed, info=[Container 
> container_1433219182593_180456_01_000018 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
>  
> ]], TaskAttempt 3 failed, info=[Container 
> container_1433219182593_180456_01_000020 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
>  
> ]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
> vertex_1433219182593_180456_3_01 [Map 2] killed/failed due to:null] 
> Vertex killed, vertexName=Map 1, vertexId=vertex_1433219182593_180456_3_00, 
> diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
> other vertex failed. failedTasks:0, Vertex vertex_1433219182593_180456_3_00 
> [Map 1] killed/failed due to:null] 
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1 
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> 
> I think maybe the first successful job deleted tez-conf.pb?
> 
> [email protected]
>  
> From: Hitesh Shah
> Date: 2015-06-18 10:46
> To: user
> Subject: Re: hive 1.1.0 on tez0.53 error
> That particular log is a red herring; it is not what is causing the
> failure.
>  
> The main problem based on the log is this:
>  
> 2015-06-17 18:00:43,543 INFO [AsyncDispatcher event handler] 
> history.HistoryEventHandler: 
> [HISTORY][DAG:dag_1433219182593_180456_3][Event:DAG_FINISHED]: 
> dagId=dag_1433219182593_180456_3, startTime=1434535228467, 
> finishTime=1434535243529, timeTaken=15062, status=FAILED, diagnostics=Vertex 
> failed, vertexName=Map 2, vertexId=vertex_1433219182593_180456_3_01, 
> diagnostics=[Task failed, taskId=task_1433219182593_180456_3_01_000000, 
> diagnostics=[TaskAttempt 0 failed, info=[Container 
> container_1433219182593_180456_01_000014 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
> ]], TaskAttempt 1 failed, info=[Container 
> container_1433219182593_180456_01_000016 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
> ]], TaskAttempt 2 failed, info=[Container 
> container_1433219182593_180456_01_000018 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
> ]], TaskAttempt 3 failed, info=[Container 
> container_1433219182593_180456_01_000020 finished with diagnostics set to 
> [Container failed. File does not exist: 
> hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
> ]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
> vertex_1433219182593_180456_3_01 [Map 2] killed/failed due to:null]
> Vertex killed, vertexName=Map 1, vertexId=vertex_1433219182593_180456_3_00, 
> diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
> other vertex failed. failedTasks:0, Vertex vertex_1433219182593_180456_3_00 
> [Map 1] killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1, 
> counters=Counters: 2, org.apache.tez.common.counters.DAGCounter, 
> NUM_FAILED_TASKS=7, NUM_KILLED_TASKS=1
>  
> While the DAG was running, it seems the local resources (tez-conf.pb)
> needed by the YARN containers disappeared; as a result, container launches
> started failing, eventually leading to a DAG failure.
>  
> — Hitesh
>  
>  
> On Jun 17, 2015, at 7:31 PM, [email protected] wrote:
>  
> > Hive returned:
> > Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> > 
> > I checked the log and found:
> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/recovery/1/summary: File does not exist. Holder DFSClient_NONMAPREDUCE_-1030523577_1 does not have any open files.
> > 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2938)
> > 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3002)
> > 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2982)
> > 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:626)
> >
> > See my attachment for more detailed logs.
> >
> > [email protected]
> > <tezerror.rar>
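
For reading the vertex progress lines in the Hive console output above (for example Map 1: 0(+0,-3)/1): they appear to follow the pattern completed(+running,-failed)/total, with -/- printed before a vertex's task count is known. That reading is an assumption inferred from the output itself, and the failed figure is presumably failed task attempts rather than failed tasks. Under that reading, Map 1 and Map 2 each completed nothing while their failed counts climbed from 1 to 3, consistent with the repeated container launch failures in the diagnostics. A minimal Python sketch of a parser for such lines; parse_progress_line and VertexProgress are hypothetical names, not Hive APIs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VertexProgress:
    name: str                 # e.g. "Map 1", "Reducer 2"
    completed: Optional[int]  # None while the task count is unknown ("-/-")
    running: int
    failed: int               # assumed to count failed task attempts
    total: Optional[int]      # None while the task count is unknown ("-/-")


# Assumed format per vertex: "-/-", "C/T", "C(+R)/T", "C(+R,-F)/T" or "C(-F)/T".
_VERTEX = re.compile(
    r"(?P<name>[A-Za-z]+ \d+): "
    r"(?:-/-|(?P<done>\d+)"
    r"(?:\(\+(?P<running>\d+)(?:,-(?P<failed>\d+))?\)|\(-(?P<failed_only>\d+)\))?"
    r"/(?P<total>\d+))"
)


def _to_int(value: Optional[str], default: Optional[int] = 0) -> Optional[int]:
    """Convert a captured group to int, falling back to a default when absent."""
    return int(value) if value is not None else default


def parse_progress_line(line: str) -> List[VertexProgress]:
    """Parse every vertex status found in one console progress line."""
    vertices = []
    for m in _VERTEX.finditer(line):
        failed = m.group("failed") or m.group("failed_only")
        vertices.append(VertexProgress(
            name=m.group("name"),
            completed=_to_int(m.group("done"), None),
            running=_to_int(m.group("running")),
            failed=_to_int(failed),
            total=_to_int(m.group("total"), None),
        ))
    return vertices


if __name__ == "__main__":
    for v in parse_progress_line("Map 1: 0(+0,-3)/1     Map 2: 0(+0,-3)/1"):
        print(v.name, v.completed, v.running, v.failed, v.total)
```

Feeding the successive lines of a run through this and watching failed grow while completed stays at 0 is one quick way to spot a vertex whose containers cannot start at all.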
