[ https://issues.apache.org/jira/browse/YARN-9649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amithsha updated YARN-9649:
---------------------------
    Description: 
In our prod cluster, blacklisting of nodes by the AM is enabled by default. We 
suddenly observed fluctuations in compute capacity. On further debugging, we 
found that the AM is blacklisting nodes. However, the RM does not log the list 
of blacklisted nodes together with the App Id, and curling the AM logs is not 
feasible because the number of jobs is huge.

https://issues.apache.org/jira/browse/YARN-4307 

As described in the above Jira, the blacklisted nodes can be viewed in the UI, 
but not in the logs or the REST API. 
*It would be helpful if the list were pushed to the REST API or to the RM log.*

In the RM log we found the following:

2015-10-28 21:57:35,982 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
 Skipping 'host' localhost for application_1446028825407_0002 since it has been 
blacklisted 

But this does not contain information about the app whose container failed.

For debugging node-related issues, we should know both the node and the appId. 
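
As a stopgap until the RM exposes this list, the DEBUG lines above can be 
aggregated per application. Below is a minimal sketch in Python, assuming the 
message wording matches the sample quoted above and that each record sits on a 
single line in the actual RM log (it is wrapped here only by the mailer). The 
function name blacklisted_hosts_by_app is hypothetical. Note that this only 
shows which app skipped which host, not the app whose container failure 
triggered the blacklisting.

```python
import re
from collections import defaultdict

# Pattern taken from the sample DEBUG message quoted in this issue.
# Assumption: other Hadoop versions may word the message differently.
SKIP_RE = re.compile(
    r"Skipping 'host' (?P<host>\S+) for (?P<app>application_\d+_\d+) "
    r"since it has been blacklisted"
)


def blacklisted_hosts_by_app(lines):
    """Map each application id to the set of hosts it skipped as blacklisted."""
    hosts = defaultdict(set)
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            hosts[match.group("app")].add(match.group("host"))
    return dict(hosts)


if __name__ == "__main__":
    # The sample record from this issue, joined onto one line.
    sample = [
        "2015-10-28 21:57:35,982 DEBUG "
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
        "allocator.RegularContainerAllocator: Skipping 'host' localhost for "
        "application_1446028825407_0002 since it has been blacklisted",
    ]
    for app, skipped in blacklisted_hosts_by_app(sample).items():
        print(app, sorted(skipped))
```

In practice the input would be the lines of the RM log file with DEBUG logging 
enabled for the allocator class, which may be too verbose for a busy cluster.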

  was:
In our prod cluster, blacklisting of nodes by the AM is enabled by default. We 
suddenly observed fluctuations in compute capacity. On further debugging, we 
found that the AM is blacklisting nodes. However, the RM does not log the list 
of blacklisted nodes, and curling the AM logs is not feasible because the 
number of jobs is huge.

https://issues.apache.org/jira/browse/YARN-4307 

As described in the above Jira, the blacklisted nodes can be viewed in the UI, 
but not in the logs or the REST API. 
*It would be helpful if the list were pushed to the REST API or to the RM log.*

In the RM log we found the following:

2015-10-28 21:57:35,982 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
 Skipping 'host' localhost for application_1446028825407_0002 since it has been 
blacklisted 

But this does not contain information about the app whose container failed.

 


> Display Blacklisted Nodes By AM in RM Log and AM Rest api
> ---------------------------------------------------------
>
>                 Key: YARN-9649
>                 URL: https://issues.apache.org/jira/browse/YARN-9649
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.9.0
>            Reporter: Amithsha
>            Priority: Major
>
> In our prod cluster, blacklisting of nodes by the AM is enabled by default. 
> We suddenly observed fluctuations in compute capacity. On further debugging, 
> we found that the AM is blacklisting nodes. However, the RM does not log the 
> list of blacklisted nodes together with the App Id, and curling the AM logs 
> is not feasible because the number of jobs is huge.
> https://issues.apache.org/jira/browse/YARN-4307 
> As described in the above Jira, the blacklisted nodes can be viewed in the 
> UI, but not in the logs or the REST API. 
> *It would be helpful if the list were pushed to the REST API or to the RM log.*
> In the RM log we found the following:
> 2015-10-28 21:57:35,982 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
>  Skipping 'host' localhost for application_1446028825407_0002 since it has 
> been blacklisted 
> But this does not contain information about the app whose container failed.
> For debugging node-related issues, we should know both the node and the 
> appId. 



