Kartik Bhatia created YARN-8345:
-----------------------------------

             Summary: NodeHealthCheckerService to differentiate between reason 
for UnusableNodes for client to act suitably on it
                 Key: YARN-8345
                 URL: https://issues.apache.org/jira/browse/YARN-8345
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: nodemanager
            Reporter: Kartik Bhatia


+*Current Scenario:*+

NodeHealthCheckerService marks a node unhealthy on the basis of two checks:
 # External Script
 # Directory status

If a directory is marked as full (as per the disk-check settings in yarn-site.xml), the NodeManager marks the node as unhealthy.
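As a rough illustration of the two checks above, the sketch reduces them to pure functions. It assumes, as we understand YARN's disk checker, that a directory counts as full once its utilisation exceeds yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, and that the node is reported unhealthy when the external script fails or the fraction of good directories drops below yarn.nodemanager.disk-health-checker.min-healthy-disks. The class, method and field names are hypothetical and are not YARN's internal API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the NodeManager health decision.
 * Not YARN's actual classes; thresholds are passed in explicitly.
 */
public class DiskCheckSketch {

    /** Used and total bytes of the disk backing one local directory. */
    static final class Dir {
        final long usedBytes;
        final long totalBytes;

        Dir(long usedBytes, long totalBytes) {
            this.usedBytes = usedBytes;
            this.totalBytes = totalBytes;
        }
    }

    /** A directory counts as full once its utilisation exceeds the cutoff. */
    static boolean isFull(Dir dir, double maxUtilisationPercent) {
        double utilisation = 100.0 * dir.usedBytes / dir.totalBytes;
        return utilisation > maxUtilisationPercent;
    }

    /**
     * Healthy only if the external script passed and the fraction of
     * non-full directories is at least minHealthyFraction.
     */
    static boolean isNodeHealthy(boolean scriptHealthy, List<Dir> dirs,
                                 double maxUtilisationPercent, double minHealthyFraction) {
        if (!scriptHealthy) {
            return false;
        }
        long good = dirs.stream().filter(d -> !isFull(d, maxUtilisationPercent)).count();
        return (double) good / dirs.size() >= minHealthyFraction;
    }

    public static void main(String[] args) {
        long tb = 1_000_000_000_000L;
        // A single 3 TB disk at 96% utilisation, checked against a 95% cutoff.
        List<Dir> dirs = Arrays.asList(new Dir(288 * tb / 100, 3 * tb));
        System.out.println(isNodeHealthy(true, dirs, 95.0, 0.25)); // false
    }
}
```

Note that a single utilisation cutoff drives the whole decision: once it is crossed, the node is either healthy or not, with nothing in between.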

Once a node is marked unhealthy, MapReduce relaunches all the map tasks that ran on this now-unusable node. As a result, even successfully completed tasks are run again.

+*Problem:*+

There is no distinction between the disk limit at which new container launches on a node should stop and the limit up to which reducers can still read map output from that node.

For example:

Consider a 3 TB disk. Suppose we set the maximum disk utilisation percentage to 95% (because launching a container requires approximately 0.15 TB for jobs in our cluster, which is the 5% left free on a 3 TB disk), and a few nodes reach a disk utilisation of, say, 96%. The threshold is breached, so the NodeManager marks these nodes unhealthy, and all successful mappers on them are relaunched on other nodes. Yet the remaining 4% of disk space is still good enough for reducers to read the map output. This causes unnecessary delays in our jobs. (Relaunched mappers can also preempt reducers when there is a crunch for space, and there are issues with calculating headroom in the Capacity Scheduler as well.)
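The arithmetic in the example works out as follows. The sketch uses decimal terabytes and only the figures from the text (3 TB disk, 95% threshold, 96% utilisation); HeadroomExample and freeTb are illustrative names.

```java
/** Worked arithmetic for the 3 TB example (decimal terabytes). */
public class HeadroomExample {

    /** Free space, in TB, left on a disk of the given size at the given utilisation. */
    static double freeTb(double diskTb, double utilisationPercent) {
        return diskTb * (100.0 - utilisationPercent) / 100.0;
    }

    public static void main(String[] args) {
        double disk = 3.0;
        // Space kept free by a 95% threshold: about what one container launch needs.
        System.out.printf("free at 95%%: %.2f TB%n", freeTb(disk, 95.0));
        // Space left on a node at 96%: too little to launch, but map output is still readable.
        System.out.printf("free at 96%%: %.2f TB%n", freeTb(disk, 96.0));
    }
}
```

It prints 0.15 TB and 0.12 TB respectively: the node has lost only 0.03 TB of launch headroom, yet all of its completed map output is discarded.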

 

+*Correction:*+

We need a node state (say, UNUSABLE_WRITE) that lets MapReduce know the node should not receive new containers but is still good for reading data, so successful mappers on it should not be relaunched. This would prevent the unnecessary delay.
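One way to picture the proposal is a sketch with two thresholds instead of one. Everything here is hypothetical: NodeDiskState, classify, shouldRerunCompletedMaps, and the separate "read limit" (99% in the usage line) are illustrative assumptions, not existing YARN or MapReduce APIs or configuration.

```java
/**
 * Hypothetical sketch of the proposed UNUSABLE_WRITE state.
 * None of these names exist in YARN or MapReduce.
 */
public class UnusableWriteSketch {

    enum NodeDiskState { HEALTHY, UNUSABLE_WRITE, UNHEALTHY }

    /**
     * writeLimit: utilisation above which no new containers should launch.
     * readLimit:  utilisation above which the node is no longer trusted to serve reads.
     */
    static NodeDiskState classify(double utilisationPercent, double writeLimit, double readLimit) {
        if (utilisationPercent > readLimit) {
            return NodeDiskState.UNHEALTHY;
        }
        if (utilisationPercent > writeLimit) {
            return NodeDiskState.UNUSABLE_WRITE;
        }
        return NodeDiskState.HEALTHY;
    }

    /** Completed maps are rerun only when their output can no longer be read. */
    static boolean shouldRerunCompletedMaps(NodeDiskState state) {
        return state == NodeDiskState.UNHEALTHY;
    }

    public static void main(String[] args) {
        NodeDiskState s = classify(96.0, 95.0, 99.0);
        // Prints "UNUSABLE_WRITE rerun=false": no new containers, but map output is kept.
        System.out.println(s + " rerun=" + shouldRerunCompletedMaps(s));
    }
}
```

With only one cutoff, the 96% node from the example falls straight into UNHEALTHY; the intermediate state lets the scheduler stop launches without forcing MapReduce to discard completed work.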

  


