Chengshun Xia created YARN-3775:
-----------------------------------

             Summary: Job does not exit after all nodes become unhealthy
                 Key: YARN-3775
                 URL: https://issues.apache.org/jira/browse/YARN-3775
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.7.1
         Environment: Version: 2.7.0
OS: RHEL7
NameNodes: xiachsh11, xiachsh12 (HA enabled)
DataNodes: 5 (xiachsh13-17)
ResourceManager: xiachsh11
NodeManagers: 5 (xiachsh13-17)
All nodes are OpenStack-provisioned:
MEM: 1.5G
Disk: 16G

            Reporter: Chengshun Xia


When running TeraSort with a data size of 10G, all the containers exit once the
disk space threshold of 0.90 is reached. At this point the job does not exit
with an error:
15/06/05 13:13:28 INFO mapreduce.Job:  map 9% reduce 0%
15/06/05 13:13:52 INFO mapreduce.Job:  map 10% reduce 0%
15/06/05 13:14:30 INFO mapreduce.Job:  map 11% reduce 0%
15/06/05 13:15:11 INFO mapreduce.Job:  map 12% reduce 0%
15/06/05 13:15:43 INFO mapreduce.Job:  map 13% reduce 0%
15/06/05 13:16:38 INFO mapreduce.Job:  map 14% reduce 0%
15/06/05 13:16:41 INFO mapreduce.Job:  map 15% reduce 0%
15/06/05 13:16:53 INFO mapreduce.Job:  map 16% reduce 0%
15/06/05 13:17:24 INFO mapreduce.Job:  map 17% reduce 0%
15/06/05 13:17:53 INFO mapreduce.Job:  map 18% reduce 0%
15/06/05 13:18:36 INFO mapreduce.Job:  map 19% reduce 0%
15/06/05 13:19:03 INFO mapreduce.Job:  map 20% reduce 0%
15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%
15/06/05 13:19:32 INFO mapreduce.Job:  map 16% reduce 0%
15/06/05 13:20:00 INFO mapreduce.Job:  map 17% reduce 0%
15/06/05 13:20:36 INFO mapreduce.Job:  map 18% reduce 0%
15/06/05 13:20:57 INFO mapreduce.Job:  map 19% reduce 0%
15/06/05 13:21:22 INFO mapreduce.Job:  map 18% reduce 0%
15/06/05 13:21:24 INFO mapreduce.Job:  map 14% reduce 0%
15/06/05 13:21:25 INFO mapreduce.Job:  map 9% reduce 0%
15/06/05 13:21:28 INFO mapreduce.Job:  map 10% reduce 0%
15/06/05 13:22:22 INFO mapreduce.Job:  map 11% reduce 0%
15/06/05 13:23:06 INFO mapreduce.Job:  map 12% reduce 0%
15/06/05 13:23:41 INFO mapreduce.Job:  map 9% reduce 0%
15/06/05 13:23:42 INFO mapreduce.Job:  map 5% reduce 0%
15/06/05 13:24:38 INFO mapreduce.Job:  map 6% reduce 0%
15/06/05 13:25:16 INFO mapreduce.Job:  map 7% reduce 0%
15/06/05 13:25:53 INFO mapreduce.Job:  map 8% reduce 0%
15/06/05 13:26:35 INFO mapreduce.Job:  map 9% reduce 0%
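
The client output above shows map progress moving backwards several times (for example, from 20% to 15% at 13:19:09). That pattern would be consistent with map work being lost and redone, although the report itself does not confirm the cause. Below is a minimal Python sketch for flagging such drops and for measuring how long the client has been silent. The function names are hypothetical, and the timestamp format is assumed to be yy/MM/dd HH:mm:ss, which matches the `date` output (June 5, 2015) shown in this report.

```python
import re
from datetime import datetime

# Matches client progress lines such as:
#   15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%
# Assumption: the leading timestamp is yy/MM/dd HH:mm:ss.
PROGRESS_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def parse_progress(lines):
    """Return a list of (timestamp, map_pct, reduce_pct) tuples."""
    progress = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
            progress.append((ts, int(m.group(2)), int(m.group(3))))
    return progress


def find_regressions(progress):
    """Return (timestamp, previous_pct, new_pct) for every drop in map progress."""
    drops = []
    for (_, prev_pct, _), (ts, cur_pct, _) in zip(progress, progress[1:]):
        if cur_pct < prev_pct:
            drops.append((ts, prev_pct, cur_pct))
    return drops


def seconds_since_last_update(progress, now):
    """Seconds between the last progress line and `now` (same time zone assumed)."""
    return (now - progress[-1][0]).total_seconds()


if __name__ == "__main__":
    sample = [
        "15/06/05 13:19:03 INFO mapreduce.Job:  map 20% reduce 0%",
        "15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%",
        "15/06/05 13:26:35 INFO mapreduce.Job:  map 9% reduce 0%",
    ]
    progress = parse_progress(sample)
    for ts, prev_pct, cur_pct in find_regressions(progress):
        print(f"{ts}: map progress dropped from {prev_pct}% to {cur_pct}%")
```

Feeding the full client log to `parse_progress` and comparing the last timestamp against the current time gives a simple client-side check for a job that has stopped reporting progress.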



The last progress update was at 15/06/05 13:26:35, and the current time is:
[root@xiachsh11 logs]# date
Fri Jun  5 19:19:59 EDT 2015
[root@xiachsh11 logs]#

[root@xiachsh11 logs]# yarn node -list
15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
Total Nodes:0
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
[root@xiachsh11 logs]#
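
The `Total Nodes:0` line above is what shows that the ResourceManager no longer reports any usable node. Without further options, `yarn node -list` lists running nodes; adding `-all` should also show nodes in other states such as UNHEALTHY. Below is a minimal Python sketch, with hypothetical function names, that extracts the count from captured command output so a monitoring script could flag this condition. It only parses text and assumes the `Total Nodes:N` line format shown in this report.

```python
import re


def parse_total_nodes(output):
    """Return the integer from the 'Total Nodes:N' line, or None if absent."""
    m = re.search(r"^Total Nodes:\s*(\d+)\s*$", output, re.MULTILINE)
    return int(m.group(1)) if m else None


def cluster_has_listed_nodes(output):
    """True only if the output reports at least one node."""
    total = parse_total_nodes(output)
    return total is not None and total > 0


if __name__ == "__main__":
    # Output copied from the report above (header line abbreviated).
    captured = (
        "15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager\n"
        "Total Nodes:0\n"
        "         Node-Id             Node-State Node-Http-Address\n"
    )
    if not cluster_has_listed_nodes(captured):
        print("No nodes listed; running jobs cannot obtain containers.")
```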







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
