[
https://issues.apache.org/jira/browse/YARN-9649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amithsha updated YARN-9649:
---------------------------
Description:
In our prod cluster, blacklisting of nodes by the AM is enabled by default. We
suddenly observed compute fluctuation, and on further debugging found that the
AM was blacklisting nodes. However, the RM does not log the list of blacklisted
nodes together with the App Id, and curling the AM logs is not feasible because
the number of jobs is huge.
https://issues.apache.org/jira/browse/YARN-4307
As described in the above Jira, we can view the blacklisted nodes in the UI,
but not in the log or the REST API.
*It would be helpful if the list were pushed to the REST API or to the RM log.*
In the RM log we found the following:
2015-10-28 21:57:35,982 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
Skipping 'host' localhost for application_1446028825407_0002 since it has been
blacklisted
But this does not contain information about the app whose container failed.
For debugging node-related issues we should know both the node and the appId.
was:
In our prod cluster, blacklisting of nodes by the AM is enabled by default. We
suddenly observed compute fluctuation, and on further debugging found that the
AM was blacklisting nodes. However, the RM does not log the list of blacklisted
nodes, and curling the AM logs is not feasible because the number of jobs is
huge.
https://issues.apache.org/jira/browse/YARN-4307
As described in the above Jira, we can view the blacklisted nodes in the UI,
but not in the log or the REST API.
*It would be helpful if the list were pushed to the REST API or to the RM log.*
In the RM log we found the following:
2015-10-28 21:57:35,982 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
Skipping 'host' localhost for application_1446028825407_0002 since it has been
blacklisted
But this does not contain information about the app whose container failed.
> Display Blacklisted Nodes By AM in RM Log and AM Rest api
> ---------------------------------------------------------
>
> Key: YARN-9649
> URL: https://issues.apache.org/jira/browse/YARN-9649
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.9.0
> Reporter: Amithsha
> Priority: Major
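Until the RM exposes this list directly, one possible stopgap is to scan the RM
log for the DEBUG message quoted in the description and group the skipped hosts
by application ID. The following is only a minimal sketch: the regular
expression is modelled on that single sample message, the function name
blacklisted_hosts_by_app is hypothetical, and it assumes DEBUG logging is
enabled for RegularContainerAllocator and that each entry sits on one line in
the actual log file. It still does not identify the app whose container failure
triggered the blacklisting, which is what this issue asks for.

```python
import re
from collections import defaultdict

# Pattern modelled on the sample DEBUG message from the description.
# Assumption: the message text may differ between Hadoop versions.
SKIP_RE = re.compile(
    r"Skipping 'host' (?P<host>\S+) for (?P<app>application_\d+_\d+) "
    r"since it has been blacklisted"
)


def blacklisted_hosts_by_app(lines):
    """Group hosts reported as blacklisted by application ID (hypothetical helper)."""
    result = defaultdict(set)
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            result[match.group("app")].add(match.group("host"))
    return dict(result)


# The sample entry from the description, joined onto a single line.
sample = [
    "2015-10-28 21:57:35,982 DEBUG "
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
    "allocator.RegularContainerAllocator: Skipping 'host' localhost for "
    "application_1446028825407_0002 since it has been blacklisted",
    "an unrelated log line",
]

for app_id, hosts in blacklisted_hosts_by_app(sample).items():
    print(app_id, sorted(hosts))
```

To run it against a real log, pass an open file object instead of the sample
list, since the function only iterates over lines.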