Chengshun Xia created YARN-3775:
-----------------------------------

Summary: Job does not exit after all node become unhealthy
Key: YARN-3775
URL: https://issues.apache.org/jira/browse/YARN-3775
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.7.1
Environment:
    Version: 2.7.0
    OS: RHEL7
    NameNodes: xiachsh11, xiachsh12 (HA enabled)
    DataNodes: 5 (xiachsh13-17)
    ResourceManager: xiachsh11
    NodeManagers: 5 (xiachsh13-17)
    All nodes are OpenStack-provisioned:
    MEM: 1.5G
    Disk: 16G
Reporter: Chengshun Xia

Running Terasort with a data size of 10G, all the containers exit once the disk space threshold of 0.90 is reached. At this point, the job does not exit with an error; the client keeps reporting progress, which repeatedly falls back:

15/06/05 13:13:28 INFO mapreduce.Job: map 9% reduce 0%
15/06/05 13:13:52 INFO mapreduce.Job: map 10% reduce 0%
15/06/05 13:14:30 INFO mapreduce.Job: map 11% reduce 0%
15/06/05 13:15:11 INFO mapreduce.Job: map 12% reduce 0%
15/06/05 13:15:43 INFO mapreduce.Job: map 13% reduce 0%
15/06/05 13:16:38 INFO mapreduce.Job: map 14% reduce 0%
15/06/05 13:16:41 INFO mapreduce.Job: map 15% reduce 0%
15/06/05 13:16:53 INFO mapreduce.Job: map 16% reduce 0%
15/06/05 13:17:24 INFO mapreduce.Job: map 17% reduce 0%
15/06/05 13:17:53 INFO mapreduce.Job: map 18% reduce 0%
15/06/05 13:18:36 INFO mapreduce.Job: map 19% reduce 0%
15/06/05 13:19:03 INFO mapreduce.Job: map 20% reduce 0%
15/06/05 13:19:09 INFO mapreduce.Job: map 15% reduce 0%
15/06/05 13:19:32 INFO mapreduce.Job: map 16% reduce 0%
15/06/05 13:20:00 INFO mapreduce.Job: map 17% reduce 0%
15/06/05 13:20:36 INFO mapreduce.Job: map 18% reduce 0%
15/06/05 13:20:57 INFO mapreduce.Job: map 19% reduce 0%
15/06/05 13:21:22 INFO mapreduce.Job: map 18% reduce 0%
15/06/05 13:21:24 INFO mapreduce.Job: map 14% reduce 0%
15/06/05 13:21:25 INFO mapreduce.Job: map 9% reduce 0%
15/06/05 13:21:28 INFO mapreduce.Job: map 10% reduce 0%
15/06/05 13:22:22 INFO mapreduce.Job: map 11% reduce 0%
15/06/05 13:23:06 INFO mapreduce.Job: map 12% reduce 0%
15/06/05 13:23:41 INFO mapreduce.Job: map 9% reduce 0%
15/06/05 13:23:42 INFO mapreduce.Job: map 5% reduce 0%
15/06/05 13:24:38 INFO mapreduce.Job: map 6% reduce 0%
15/06/05 13:25:16 INFO mapreduce.Job: map 7% reduce 0%
15/06/05 13:25:53 INFO mapreduce.Job: map 8% reduce 0%
15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%

The last response time is 15/06/05 13:26:35, and the current time is:

[root@xiachsh11 logs]# date
Fri Jun 5 19:19:59 EDT 2015
[root@xiachsh11 logs]#
[root@xiachsh11 logs]# yarn node -list
15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
Total Nodes:0
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
[root@xiachsh11 logs]#

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
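
For anyone trying to reproduce this, the two symptoms in the client output (map progress falling back, then no update for hours) can be picked out of a saved log with a short script. The following is only a minimal sketch in Python: the helper names (`parse_progress`, `map_regressions`, `seconds_since_last_update`) are hypothetical, the sample is a four-line excerpt of the log above, and it assumes the log timestamps and the `date` output are in the same time zone.

```python
import re
from datetime import datetime

# Matches client progress lines such as:
#   15/06/05 13:13:28 INFO mapreduce.Job: map 9% reduce 0%
PROGRESS = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def parse_progress(text):
    """Return (timestamp, map_pct, reduce_pct) tuples in log order."""
    return [
        (datetime.strptime(ts, "%y/%m/%d %H:%M:%S"), int(m), int(r))
        for ts, m, r in PROGRESS.findall(text)
    ]


def map_regressions(entries):
    """Return the entries whose map percentage is lower than the previous one."""
    return [cur for prev, cur in zip(entries, entries[1:]) if cur[1] < prev[1]]


def seconds_since_last_update(entries, now):
    """Seconds elapsed between the last progress line and `now`."""
    return (now - entries[-1][0]).total_seconds()


# Excerpt of the client log above (not the full log).
sample = """
15/06/05 13:19:03 INFO mapreduce.Job: map 20% reduce 0%
15/06/05 13:19:09 INFO mapreduce.Job: map 15% reduce 0%
15/06/05 13:19:32 INFO mapreduce.Job: map 16% reduce 0%
15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%
"""

entries = parse_progress(sample)
regressions = map_regressions(entries)
# Time of the `date` command in the report, assumed to share the log's time zone.
stalled_for = seconds_since_last_update(entries, datetime(2015, 6, 5, 19, 19, 59))

for ts, map_pct, _ in regressions:
    print(f"map progress fell back to {map_pct}% at {ts}")
print(f"no progress update for {stalled_for / 3600:.1f} hours")
```

Run against the full log, a non-empty regression list together with a long gap since the last update would match the behavior reported here: tasks being rerun while no healthy NodeManager remains, and the job never failing.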