Re: hive 1.1.0 on tez0.53 error
Yes, something along those lines. This might help a bit more: http://techtonka.com/?p=174

thanks
Hitesh

On Jun 18, 2015, at 6:44 PM, r7raul1...@163.com wrote:

Do you mean a log like this: hdfs-audit.log.9?

r7raul1...@163.com

From: Hitesh Shah
Date: 2015-06-19 02:28
To: user
Subject: Re: hive 1.1.0 on tez0.53 error

Also, if you have access to the NameNode audit logs, can you search for all accesses of the /tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/ directory and see if/when someone tried to delete it?

thanks
Hitesh

On Jun 17, 2015, at 7:57 PM, r7raul1...@163.com wrote:

Here is the Hive log:

Status: Running (Executing on YARN cluster with App id application_1433219182593_180456)
Map 1: -/- Reducer 2: 0/5 Reducer 3: 0/5
Map 1: 0(+1)/1 Reducer 2: 0/5 Reducer 3: 0/5
Map 1: 1/1 Reducer 2: 0(+1)/5 Reducer 3: 0/5
Map 1: 1/1 Reducer 2: 0(+5)/5 Reducer 3: 0/5
Map 1: 1/1 Reducer 2: 2(+3)/5 Reducer 3: 0/5
Map 1: 1/1 Reducer 2: 4(+1)/5 Reducer 3: 0(+5)/5
Map 1: 1/1 Reducer 2: 5/5 Reducer 3: 0(+5)/5
Map 1: 1/1 Reducer 2: 5/5 Reducer 3: 5/5
Loading data to table testtmp.tmp_pm_cpttr_hot_srch partition (cur_flg=0, ds=2015-06-16)
Partition testtmp.tmp_pm_cpttr_hot_srch{cur_flg=0, ds=2015-06-16} stats: [numFiles=5, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 3.885 seconds
OK
Time taken: 0.266 seconds
OK
Time taken: 0.067 seconds
Query ID = lujian_2015061718_f048ad51-d72f-458f-8480-bef366606a68
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1433219182593_180456)
Map 1: 0/1 Map 2: -/-
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0(+0,-1)/1 Map 2: 0(+0,-1)/1
Map 1: 0(+0,-1)/1 Map 2: 0(+0,-1)/1
Map 1: 0(+0,-2)/1 Map 2: 0(+0,-2)/1
Map 1: 0(+0,-2)/1 Map 2: 0(+0,-2)/1
Map 1: 0(+0,-3)/1 Map 2: 0(+0,-3)/1
Status: Failed
Vertex failed, vertexName=Map 2, vertexId=vertex_1433219182593_180456_3_01, diagnostics=[Task failed, taskId=task_1433219182593_180456_3_01_00, diagnostics=[
TaskAttempt 0 failed, info=[Container container_1433219182593_180456_01_14 finished with diagnostics set to [Container failed. File does not exist: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb ]],
TaskAttempt 1 failed, info=[Container container_1433219182593_180456_01_16 finished with diagnostics set to [Container failed. File does not exist: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb ]],
TaskAttempt 2 failed, info=[Container container_1433219182593_180456_01_18 finished with diagnostics set to [Container failed. File does not exist: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb ]],
TaskAttempt 3 failed, info=[Container container_1433219182593_180456_01_20 finished with diagnostics set to [Container failed. File does not exist: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb ]]],
Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1433219182593_180456_3_01 [Map 2] killed/failed due to:null]
Vertex killed, vertexName=Map 1, vertexId=vertex_1433219182593_180456_3_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1433219182593_180456_3_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

I think maybe the first successful job deleted tez-conf.pb?
r7raul1...@163.com

From: Hitesh Shah
Date: 2015-06-18 10:46
To: user
Subject: Re: hive 1.1.0 on tez0.53 error

That particular log is a red herring and not really the issue that is causing the failure. Based on the log, the main problem is this:

2015-06-17 18:00:43,543 INFO [AsyncDispatcher event handler] history.HistoryEventHandler: [HISTORY][DAG:dag_1433219182593_180456_3][Event:DAG_FINISHED]: dagId=dag_1433219182593_180456_3, startTime=1434535228467, finishTime=1434535243529, timeTaken=15062, status=FAILED, diagnostics=Vertex failed, vertexName=Map 2, vertexId=vertex_1433219182593_180456_3_01, diagnostics=[Task failed, taskId=task_1433219182593_180456_3_01_00, diagnostics=[TaskAttempt 0 failed, info=[Container container_1433219182593_180456_01_14 finished with diagnostics set to [Container failed. File does not exist: hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/tez-conf.pb ]], TaskAttempt 1 failed, info=[Container container_1433219182593_180456_01_16 finished
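The Hive console lines above pack several counters into each vertex entry. Judging from this log, `Map 2: 0(+0,-3)/1` reads as 0 completed, 0 running and 3 failed attempts for a vertex with 1 task: the minus count rises by one per failed attempt, and the DAG is reported as failed once the fourth attempt (TaskAttempt 3) fails, before a `-4` is ever printed. The Python sketch below decodes such lines under that reading. The entry shape (`NAME: DONE/TOTAL` with an optional parenthesised `+RUNNING` / `-FAILED` part, or `-/-` before the task count is known) is inferred from the output in this thread, not taken from Hive's source, and `parse_progress_line`, `has_failed_attempts` and `VertexProgress` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class VertexProgress(NamedTuple):
    name: str
    completed: Optional[int]  # None while the vertex shows "-/-" (task count not known yet)
    running: int
    failed: int               # appears to count failed task attempts, per this thread's log
    total: Optional[int]


# Assumed entry shapes, inferred from the console output in this thread:
#   "Map 1: -/-", "Reducer 2: 0/5", "Reducer 2: 2(+3)/5", "Map 2: 0(+0,-3)/1"
_ENTRY = re.compile(
    r"(?P<name>[A-Za-z]+ \d+):\s*"
    r"(?:-/-|(?P<done>\d+)(?:\((?:\+(?P<run>\d+))?,?(?:-(?P<fail>\d+))?\))?/(?P<total>\d+))"
)


def parse_progress_line(line: str) -> List[VertexProgress]:
    """Split one Hive-on-Tez progress line into per-vertex counters."""
    result = []
    for m in _ENTRY.finditer(line):
        if m.group("total") is None:
            # "-/-": the vertex has not been initialised yet.
            result.append(VertexProgress(m.group("name"), None, 0, 0, None))
        else:
            result.append(VertexProgress(
                name=m.group("name"),
                completed=int(m.group("done")),
                running=int(m.group("run") or 0),
                failed=int(m.group("fail") or 0),
                total=int(m.group("total")),
            ))
    return result


def has_failed_attempts(line: str) -> bool:
    """True if any vertex on the line reports at least one failed attempt."""
    return any(v.failed > 0 for v in parse_progress_line(line))
```

Fed the last progress line before `Status: Failed` (`Map 1: 0(+0,-3)/1 Map 2: 0(+0,-3)/1`), it reports `failed=3` for both vertices, even though only Map 2's four failed attempts are spelled out in the diagnostics.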
Re: hive 1.1.0 on tez0.53 error
Also, if you have access to the NameNode audit logs, can you search for all accesses of the /tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/ directory and see if/when someone tried to delete it?

thanks
Hitesh
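The audit-log search suggested above can be approximated with a small script run over the NameNode audit logs, including rotated files such as hdfs-audit.log.9. The Python sketch below assumes the default audit line layout, `<timestamp> INFO FSNamesystem.audit: ` followed by tab-separated `key=value` fields (`ugi`, `ip`, `cmd`, `src`, `dst`, ...); check this against your own log before trusting the output. It reports every operation on a path inside the session directory, plus any `delete` or `rename` of one of its parents, since a recursive delete of, say, /tmp/hive/lujian would remove the session directory too. The helper `epoch_ms_to_local` renders the `startTime`/`finishTime` values from the DAG_FINISHED event in the audit log's timestamp format. The UTC+8 offset is an assumption, chosen because the event logged at 18:00:43,543 carries finishTime=1434535243529, which is 10:00:43.529 UTC. All function names here are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple, Optional


class AuditEvent(NamedTuple):
    timestamp: str
    ugi: str
    cmd: str
    src: str
    dst: str


# Assumed line shape (default HDFS audit logger, tab-separated fields):
# "2015-06-17 18:00:43,543 INFO FSNamesystem.audit: allowed=true\tugi=...\tip=...\tcmd=...\tsrc=...\tdst=..."
_AUDIT_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?FSNamesystem\.audit: (?P<fields>.*)$"
)


def _is_under(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies somewhere beneath it."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def epoch_ms_to_local(ms: int, utc_offset_hours: int) -> str:
    """Format epoch milliseconds as 'YYYY-MM-DD HH:MM:SS,mmm' at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    whole_seconds = datetime.fromtimestamp(ms // 1000, tz)  # integer math avoids float rounding
    return whole_seconds.strftime("%Y-%m-%d %H:%M:%S") + ",%03d" % (ms % 1000)


def scan_audit_log(lines: Iterable[str], session_dir: str,
                   start: Optional[str] = None, end: Optional[str] = None) -> List[AuditEvent]:
    """Collect audit events that touch `session_dir` or delete/rename one of its parents."""
    events = []
    for line in lines:
        m = _AUDIT_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # not an audit line (or a different layout than assumed)
        ts = m.group("ts")
        # Both sides use the same fixed-width format, so string comparison orders them.
        if (start and ts < start) or (end and ts > end):
            continue
        fields = {}
        for part in m.group("fields").split("\t"):
            key, sep, value = part.partition("=")
            if sep:
                fields[key] = value
        cmd, src, dst = fields.get("cmd", ""), fields.get("src", ""), fields.get("dst", "")
        touches_dir = _is_under(src, session_dir) or _is_under(dst, session_dir)
        removes_parent = (cmd.startswith(("delete", "rename"))
                          and src.startswith("/") and _is_under(session_dir, src))
        if touches_dir or removes_parent:
            events.append(AuditEvent(ts, fields.get("ugi", ""), cmd, src, dst))
    return events


if __name__ == "__main__":
    import sys
    session = "/tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da"
    # finishTime of the failed DAG; the UTC+8 offset is an assumption (see above).
    dag_end = epoch_ms_to_local(1434535243529, 8)  # '2015-06-17 18:00:43,529'
    for path in sys.argv[1:]:  # e.g. hdfs-audit.log hdfs-audit.log.9
        with open(path, encoding="utf-8", errors="replace") as f:
            for ev in scan_audit_log(f, session, end=dag_end):
                print(path, ev)
```

Passing the audit files as arguments prints each matching event up to the end of the failed DAG; a `delete` whose `ugi` is not the Hive user, or one issued shortly after the first (successful) query finished, would be the interesting lines.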
hive 1.1.0 on tez0.53 error
Hive returns: Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

I checked the log and found:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/lujian/_tez_session_dir/86bc0010-4816-4251-95aa-bb37b8d029da/.tez/application_1433219182593_180456/recovery/1/summary: File does not exist. Holder DFSClient_NONMAPREDUCE_-1030523577_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2938)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3002)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2982)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:626)

For the more detailed log, see my attachment.

r7raul1...@163.com

Attachment: tezerror.rar