[
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017663#comment-17017663
]
Brahma Reddy Battula edited comment on YARN-5356 at 1/17/20 2:35 AM:
---------------------------------------------------------------------
Thanks to all, this is nice to have. There is an issue during a rolling
upgrade, though: PhysicalResource will always be null.
i) Upgrade RM from 2.7 to 3.0.
ii) Upgrade NM from 2.7 to 3.0.
When the NM re-registers, RMContext already has this nodeID, and because the
HTTP port is also unchanged the new RMNode is not added again. Hence
"PhysicalResource" stays null in the upgraded cluster until the RM is restarted.
RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
resolve(host), capability, nodeManagerVersion, physicalResource);
*org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService#registerNodeManager*
{code:java}
RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
if (oldNode == null) {
  RMNodeStartedEvent startEvent = new RMNodeStartedEvent(nodeId,
      request.getNMContainerStatuses(),
      request.getRunningApplications());
  if (request.getLogAggregationReportsForApps() != null
      && !request.getLogAggregationReportsForApps().isEmpty()) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Found the number of previous cached log aggregation "
          + "status from nodemanager:" + nodeId + " is :"
          + request.getLogAggregationReportsForApps().size());
    }
    startEvent.setLogAggregationReportsForApps(request
        .getLogAggregationReportsForApps());
  }
  this.rmContext.getDispatcher().getEventHandler().handle(
      startEvent);
} else {
  LOG.info("Reconnect from the node at: " + host);
  this.nmLivelinessMonitor.unregister(nodeId);
  if (CollectionUtils.isEmpty(request.getRunningApplications())
      && rmNode.getState() != NodeState.DECOMMISSIONING
      && rmNode.getHttpPort() != oldNode.getHttpPort()) {
    // Reconnected node differs, so replace old node and start new node
    switch (rmNode.getState()) {
    case RUNNING:
      ClusterMetrics.getMetrics().decrNumActiveNodes();
      break;
    case UNHEALTHY:
      ClusterMetrics.getMetrics().decrNumUnhealthyNMs();
      break;
    default:
      LOG.debug("Unexpected Rmnode state");
    }
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new NodeRemovedSchedulerEvent(rmNode));
    this.rmContext.getRMNodes().put(nodeId, rmNode);
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeStartedEvent(nodeId, null, null));
  } else {
    // Reset heartbeat ID since node just restarted.
    oldNode.resetLastNodeHeartBeatResponse();
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeReconnectEvent(nodeId, rmNode,
            request.getRunningApplications(),
            request.getNMContainerStatuses()));
  }
}
{code}
I will raise a separate Jira for this.
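
The re-registration behaviour described in this comment can be illustrated with a small standalone sketch. It is not the ResourceManager code: SketchNode and ReconnectSketch are hypothetical, simplified stand-ins, the physical resource is modelled as a plain string, and only the putIfAbsent and HTTP-port comparison are kept (the running-applications and DECOMMISSIONING checks are left out). It also assumes, as reported above, that the reconnect path does not copy the physical resource onto the old node.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, heavily simplified stand-in for RMNodeImpl (not the real class).
final class SketchNode {
  final String nodeId;
  final int httpPort;
  final String physicalResource; // null when the NM did not report one (e.g. a 2.7 NM)

  SketchNode(String nodeId, int httpPort, String physicalResource) {
    this.nodeId = nodeId;
    this.httpPort = httpPort;
    this.physicalResource = physicalResource;
  }
}

// Hypothetical stand-in for the node map held by RMContext.
final class ReconnectSketch {
  private final ConcurrentMap<String, SketchNode> nodes = new ConcurrentHashMap<>();

  // Mimics only the putIfAbsent / HTTP-port branch of registerNodeManager.
  SketchNode register(SketchNode incoming) {
    SketchNode old = nodes.putIfAbsent(incoming.nodeId, incoming);
    if (old != null && incoming.httpPort != old.httpPort) {
      // Reconnected node differs: the old entry is replaced.
      nodes.put(incoming.nodeId, incoming);
    }
    // Otherwise the old entry stays, and the incoming node's
    // physicalResource is dropped in this sketch.
    return nodes.get(incoming.nodeId);
  }

  public static void main(String[] args) {
    ReconnectSketch rm = new ReconnectSketch();
    // First registration from an old NM that reports no physical resource.
    rm.register(new SketchNode("host1:45454", 8042, null));
    // Upgraded NM re-registers with the same node ID and HTTP port.
    SketchNode seen = rm.register(
        new SketchNode("host1:45454", 8042, "<memory:131072, vCores:32>"));
    // prints "physicalResource after re-register: null"
    System.out.println("physicalResource after re-register: " + seen.physicalResource);
  }
}
```

In this sketch, only a changed HTTP port (or an empty map, which corresponds to an RM restart) lets the new value through, which matches the symptom reported above.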
> NodeManager should communicate physical resource capability to ResourceManager
> ------------------------------------------------------------------------------
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, resourcemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Nathan Roberts
> Assignee: Íñigo Goiri
> Priority: Major
> Labels: oct16-medium
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch,
> YARN-5356.002.patch, YARN-5356.002.patch, YARN-5356.003.patch,
> YARN-5356.004.patch, YARN-5356.005.patch, YARN-5356.006.patch,
> YARN-5356.007.patch, YARN-5356.008.patch, YARN-5356.009.patch,
> YARN-5356.010.patch, YARN-5356.011.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used
> (e.g. 4096MB memory used). It would be good if the NM also communicated the
> actual physical resource capabilities of the node so that the RM can use this
> data to schedule more effectively (overcommit, etc.).
> Currently the only available information is the Resource the node registered
> with (or later updated using updateNodeResource). However, these aren't
> really sufficient to get a good view of how utilized a resource is. For
> example, if a node reports 400% CPU utilization, does that mean it's
> completely full, or barely utilized? Today there is no reliable way to figure
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you
> have thoughts/opinions on this?
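
To make the 400% CPU example in the description concrete, here is a minimal sketch of why the physical capacity matters. The class and method names are hypothetical, and it assumes the convention that 100% equals one fully used core; the same reported utilization is then a very different fraction of the node depending on its core count.

```java
// Hypothetical helper, not part of YARN.
final class UtilizationSketch {

  // Fraction of the node's physical CPU in use.
  // Assumes cpuPercent is summed across cores (100 = one fully busy core).
  static double cpuFraction(double cpuPercent, int physicalCores) {
    if (physicalCores <= 0) {
      throw new IllegalArgumentException("physicalCores must be positive");
    }
    return cpuPercent / (100.0 * physicalCores);
  }

  public static void main(String[] args) {
    System.out.println(cpuFraction(400.0, 4));  // prints 1.0   (completely full)
    System.out.println(cpuFraction(400.0, 32)); // prints 0.125 (barely utilized)
  }
}
```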