Re: hive 1.1.0 on tez0.53 error

2015-06-19 Thread Hitesh Shah
Yes, something along those lines.

This might help a bit more: http://techtonka.com/?p=174

thanks
— Hitesh

On Jun 18, 2015, at 6:44 PM, r7raul1...@163.com wrote:

 
 A log like this: hdfs-audit.log.9?
 r7raul1...@163.com
  
 From: Hitesh Shah
 Date: 2015-06-19 02:28
 To: user
 Subject: Re: hive 1.1.0 on tez0.53 error
 Also, if you have access to the name node audit logs, can you search for all 
 accesses of the 
 /tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/ 
 directory and see if/when someone tried to delete it?
  
 thanks
 — Hitesh
  

Re: hive 1.1.0 on tez0.53 error

2015-06-18 Thread Hitesh Shah
Also, if you have access to the name node audit logs, can you search for all 
accesses of the 
/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/ 
directory and see if/when someone tried to delete it?
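
One way to run that search is sketched below. It is a rough sketch, not a tested tool: it assumes the audit log uses the default HDFS layout (a leading timestamp followed by whitespace-separated key=value fields such as ugi=, cmd= and src=), and the names find_accesses, AuditHit and DESTRUCTIVE are made up for this example. It reports every access at or below the session directory, plus any delete or rename of a parent directory, since that would remove the session directory as well.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Session directory taken from the error message in this thread.
SESSION_DIR = "/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da"

# Assumption: audit fields look like key=value and values contain no whitespace.
FIELD_RE = re.compile(r"(\w+)=(\S+)")

# Commands that can make a directory disappear when applied to one of its parents.
DESTRUCTIVE = {"delete", "rename"}


class AuditHit(NamedTuple):
    timestamp: str  # first 23 characters of the line, e.g. "2015-06-17 18:00:30,001"
    ugi: str        # user that issued the command
    cmd: str        # HDFS operation, e.g. "delete", "open", "create"
    src: str        # path the operation was applied to


def find_accesses(lines: Iterable[str], directory: str) -> Iterator[AuditHit]:
    """Yield audit entries that touch `directory` or could have removed it."""
    target = directory.rstrip("/")
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        src = fields.get("src")
        if src is None:
            continue  # not an audit entry in the assumed layout
        path = src.rstrip("/")
        cmd = fields.get("cmd", "")
        inside = path == target or path.startswith(target + "/")
        ancestor = target.startswith(path + "/")
        if inside or (ancestor and cmd in DESTRUCTIVE):
            yield AuditHit(line[:23], fields.get("ugi", ""), cmd, src)
```

Passing an open audit log file (for example hdfs-audit.log.9) and SESSION_DIR to find_accesses, then looking at the hits whose cmd is delete, should show who removed the directory and when, provided the log really follows the assumed layout.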

thanks
— Hitesh

On Jun 17, 2015, at 7:57 PM, r7raul1...@163.com wrote:

 Here is the Hive log:
 Status: Running (Executing on YARN cluster with App id 
 application_1433219182593_180456) 
 
 Map 1: -/-  Reducer 2: 0/5  Reducer 3: 0/5
 Map 1: 0(+1)/1  Reducer 2: 0/5  Reducer 3: 0/5
 Map 1: 1/1  Reducer 2: 0(+1)/5  Reducer 3: 0/5
 Map 1: 1/1  Reducer 2: 0(+5)/5  Reducer 3: 0/5
 Map 1: 1/1  Reducer 2: 2(+3)/5  Reducer 3: 0/5
 Map 1: 1/1  Reducer 2: 4(+1)/5  Reducer 3: 0(+5)/5
 Map 1: 1/1  Reducer 2: 5/5  Reducer 3: 0(+5)/5
 Map 1: 1/1  Reducer 2: 5/5  Reducer 3: 5/5
 Loading data to table testtmp.tmp_pm_cpttr_hot_srch partition (cur_flg=0, 
 ds=2015-06-16) 
 Partition testtmp.tmp_pm_cpttr_hot_srch{cur_flg=0, ds=2015-06-16} stats: 
 [numFiles=5, numRows=0, totalSize=0, rawDataSize=0] 
 OK 
 Time taken: 3.885 seconds 
 OK 
 Time taken: 0.266 seconds 
 OK 
 Time taken: 0.067 seconds 
 Query ID = lujian_2015061718_f048ad51-d72f-458f-8480-bef366606a68 
 Total jobs = 1 
 Launching Job 1 out of 1 
 
 
 Status: Running (Executing on YARN cluster with App id 
 application_1433219182593_180456) 
 
 Map 1: 0/1  Map 2: -/-
 Map 1: 0/1  Map 2: 0/1
 Map 1: 0/1  Map 2: 0/1
 Map 1: 0(+0,-1)/1   Map 2: 0(+0,-1)/1
 Map 1: 0(+0,-1)/1   Map 2: 0(+0,-1)/1
 Map 1: 0(+0,-2)/1   Map 2: 0(+0,-2)/1
 Map 1: 0(+0,-2)/1   Map 2: 0(+0,-2)/1
 Map 1: 0(+0,-3)/1   Map 2: 0(+0,-3)/1
 Status: Failed 
 Vertex failed, vertexName=Map 2, vertexId=vertex_1433219182593_180456_3_01, 
 diagnostics=[Task failed, taskId=task_1433219182593_180456_3_01_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Container 
 container_1433219182593_180456_01_14 finished with diagnostics set to 
 [Container failed. File does not exist: 
 hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
 ]], TaskAttempt 1 failed, info=[Container 
 container_1433219182593_180456_01_16 finished with diagnostics set to 
 [Container failed. File does not exist: 
 hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
 ]], TaskAttempt 2 failed, info=[Container 
 container_1433219182593_180456_01_18 finished with diagnostics set to 
 [Container failed. File does not exist: 
 hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
 ]], TaskAttempt 3 failed, info=[Container 
 container_1433219182593_180456_01_20 finished with diagnostics set to 
 [Container failed. File does not exist: 
 hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
 ]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1433219182593_180456_3_01 [Map 2] killed/failed due to:null] 
 Vertex killed, vertexName=Map 1, vertexId=vertex_1433219182593_180456_3_00, 
 diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
 other vertex failed. failedTasks:0, Vertex vertex_1433219182593_180456_3_00 
 [Map 1] killed/failed due to:null] 
 DAG failed due to vertex failure. failedVertices:1 killedVertices:1 
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.tez.TezTask
 
 I think maybe the first successful job deleted tez-conf.pb?
 
 r7raul1...@163.com
  
 From: Hitesh Shah
 Date: 2015-06-18 10:46
 To: user
 Subject: Re: hive 1.1.0 on tez0.53 error
 That particular log is a red herring and not really an issue that is causing 
 the failure.
  
 The main problem based on the log is this:
  
 2015-06-17 18:00:43,543 INFO [AsyncDispatcher event handler] 
 history.HistoryEventHandler: 
 [HISTORY][DAG:dag_1433219182593_180456_3][Event:DAG_FINISHED]: 
 dagId=dag_1433219182593_180456_3, startTime=1434535228467, 
 finishTime=1434535243529, timeTaken=15062, status=FAILED, diagnostics=Vertex 
 failed, vertexName=Map 2, vertexId=vertex_1433219182593_180456_3_01, 
 diagnostics=[Task failed, taskId=task_1433219182593_180456_3_01_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Container 
 container_1433219182593_180456_01_14 finished with diagnostics set to 
 [Container failed. File does not exist: 
 hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb
 ]], TaskAttempt 1 failed, info=[Container 
 container_1433219182593_180456_01_16 finished

hive 1.1.0 on tez0.53 error

2015-06-17 Thread r7raul1...@163.com
 Hive returns:
 Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
 
 I checked the log and found:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/recovery/1/summary: File does not exist. Holder DFSClient_NONMAPREDUCE_-1030523577_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2938)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3002)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2982)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:626)



For a more detailed log, see my attachment.



r7raul1...@163.com


tezerror.rar
Description: Binary data