[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046994#comment-15046994 ]
Sunil G commented on YARN-4413:
---
Hi [~templedf], thank you for the updated patch. I have some doubts about it. I am not sure about moving from DECOMMISSIONED to SHUTDOWN on a RECOMMISSION event; that event does not sound clean or correct. Why not send a SHUTDOWN event itself? I see no harm in doing that: after a refresh, such a node is found to be in a valid state as per the config but is still DECOMMISSIONED in the RM, so it can be moved via a SHUTDOWN event. Please correct me if I am missing something here.

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.7.1
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Attachments: YARN-4413.001.patch
>
>
> If I decommission a node and then move it from the excludes list back to the
> includes list, but I don't restart the node, the node will still be listed by
> the web UI as decommissioned until either the NM or RM is restarted. Ideally,
> removing the node from the excludes list and putting it back into the
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
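To make the two alternatives in Sunil's comment concrete, here is a minimal Java sketch of a node state machine. It is a hypothetical, simplified model: {{SimpleNodeState}}, {{SimpleNodeEvent}}, and {{NodeStateMachine}} are illustrative names, not the real YARN {{NodeState}}, {{RMNodeEventType}}, or {{RMNodeImpl}} classes. It only shows how a DECOMMISSIONED node would reach SHUTDOWN either through a RECOMMISSION event (the patch) or through a SHUTDOWN event (the proposal).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified node lifecycle; NOT the real YARN NodeState or
// RMNodeEventType enums.
enum SimpleNodeState { RUNNING, DECOMMISSIONED, SHUTDOWN }

enum SimpleNodeEvent { DECOMMISSION, RECOMMISSION, SHUTDOWN }

final class NodeStateMachine {
    // from-state -> (event -> to-state); a missing entry means the event is invalid.
    private final Map<SimpleNodeState, Map<SimpleNodeEvent, SimpleNodeState>> transitions =
        new EnumMap<>(SimpleNodeState.class);

    // useShutdownEvent = true models the proposal: a DECOMMISSIONED node that is
    // valid again after a refresh is moved by a SHUTDOWN event.
    // useShutdownEvent = false models the patch: the same move is driven by RECOMMISSION.
    NodeStateMachine(boolean useShutdownEvent) {
        for (SimpleNodeState s : SimpleNodeState.values()) {
            transitions.put(s, new EnumMap<>(SimpleNodeEvent.class));
        }
        add(SimpleNodeState.RUNNING, SimpleNodeEvent.DECOMMISSION, SimpleNodeState.DECOMMISSIONED);
        add(SimpleNodeState.RUNNING, SimpleNodeEvent.SHUTDOWN, SimpleNodeState.SHUTDOWN);
        if (useShutdownEvent) {
            add(SimpleNodeState.DECOMMISSIONED, SimpleNodeEvent.SHUTDOWN, SimpleNodeState.SHUTDOWN);
        } else {
            add(SimpleNodeState.DECOMMISSIONED, SimpleNodeEvent.RECOMMISSION, SimpleNodeState.SHUTDOWN);
        }
    }

    private void add(SimpleNodeState from, SimpleNodeEvent event, SimpleNodeState to) {
        transitions.get(from).put(event, to);
    }

    // Returns the new state, or throws if the event is not valid in the current state.
    SimpleNodeState handle(SimpleNodeState from, SimpleNodeEvent event) {
        SimpleNodeState to = transitions.get(from).get(event);
        if (to == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + from);
        }
        return to;
    }
}
```

In the SHUTDOWN-event variant, a RECOMMISSION event on a DECOMMISSIONED node stays invalid, which is consistent with the view quoted from the YARN-4386 discussion that RECOMMISSION should not be applied to decommissioned nodes.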
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045329#comment-15045329 ]
Kuhu Shukla commented on YARN-4413:
---
[~templedf], also: YARN-4386 would still be needed, since we are currently looking for DECOMMISSIONED nodes in the list returned by getRMNodes(), which does not contain nodes in that state. Such nodes are part of the getInactiveRMNodes() list. So that change would still be needed whether or not we decide to add a transition from DECOMM to RECOMM. I also have a question about the DECOMM to RECOMM transition, so please pardon any naivety. If we transition a node that is no longer running the NM process, since it was DECOMM-ed, and that would then be SHUTDOWN, how does a RECOMM event help this node unless we decide to start the NM process? Am I missing something here? I would appreciate any comments.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045370#comment-15045370 ]
Kuhu Shukla commented on YARN-4413:
---
I see, so that answers my transition question; thanks, [~templedf]. But don't you think we are looking at the wrong list for DECOMM-ed nodes? As far as I can tell, they are in the getInactiveRMNodes() list, not among the entries of the getRMNodes() list. Hope this helps.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045333#comment-15045333 ]
Kuhu Shukla commented on YARN-4413:
---
A small comment on the patch:
{code}
      new RMNodeEvent(entry.getKey(), RMNodeEventType.DECOMMISSION));
} else if (entry.getValue().getState() == NodeState.DECOMMISSIONED) {
  this.rmContext.getDispatcher().getEventHandler().handle(
{code}
This condition won't ever evaluate to true, for the same reason as above. AFAIK, decomm-ed nodes are part of the inactive list alone, while {{entry}} traverses the getRMNodes() list, so a decommissioned node is never found there and the if condition will never be true. Please let me know if I am missing something here.
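The point about the two lists can be illustrated with a minimal Java sketch, assuming a simplified registry in which decommissioning moves a node from an active map to a separate inactive map. {{NodeRegistry}} and its methods are hypothetical names, not the real {{RMContext}} API; the sketch only shows why a scan of the active map never sees a DECOMMISSIONED node, while a scan of the inactive map does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the RM's node bookkeeping: decommissioned nodes are
// kept in a separate "inactive" map rather than in the map of active nodes.
// This is NOT the real RMContext API.
final class NodeRegistry {
    enum State { RUNNING, DECOMMISSIONED, SHUTDOWN }

    final Map<String, State> activeNodes = new HashMap<>();
    final Map<String, State> inactiveNodes = new HashMap<>();

    // Decommissioning removes the node from the active map and records it as inactive.
    void decommission(String host) {
        activeNodes.remove(host);
        inactiveNodes.put(host, State.DECOMMISSIONED);
    }

    // Mirrors the check questioned in the review: it scans the ACTIVE map, so with
    // this bookkeeping it never finds a DECOMMISSIONED node.
    List<String> decommissionedInActiveMap() {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, State> e : activeNodes.entrySet()) {
            if (e.getValue() == State.DECOMMISSIONED) {
                found.add(e.getKey());
            }
        }
        return found;
    }

    // Scans the INACTIVE map instead: decommissioned hosts that are back in the
    // includes list are reported as SHUTDOWN, as the issue description asks.
    List<String> markIncludedAsShutdown(Set<String> includes) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, State> e : inactiveNodes.entrySet()) {
            if (e.getValue() == State.DECOMMISSIONED && includes.contains(e.getKey())) {
                e.setValue(State.SHUTDOWN); // value update only, no structural change
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```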
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045344#comment-15045344 ]
Daniel Templeton commented on YARN-4413:
[~kshukla], this patch is more about the UI. The point is that if I decommission, shut down, and then recommission a node, the UI will continue to show it as decommissioned until the node is restarted. This patch closes that gap.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042355#comment-15042355 ]
Daniel Templeton commented on YARN-4413:
[~kshukla], it looks to me like this patch (YARN-4413) obviates YARN-4386, since it becomes possible to recommission decommissioned nodes.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042363#comment-15042363 ]
Kuhu Shukla commented on YARN-4413:
---
I agree. However, from the discussions with Junping and Sunil on YARN-4386:
bq. I think Recommission event shouldn't be applied on decommissioned nodes as it won't have any effect and we'd better keep consistent with previous behavior before graceful decommission comes out.
Asking [~djp] for further comments. Thanks a lot.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038135#comment-15038135 ]
Daniel Templeton commented on YARN-4413:
bq. But a restart will help here to clear the metrics.
True, but it will also cause an outage, which comes with its own potential impact.
bq. So I feel we could look at both lists upon refresh and remove/add nodes based on the entries in both files and from memory.
Agreed. I'll post a patch with my general approach shortly.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038265#comment-15038265 ]
Kuhu Shukla commented on YARN-4413:
---
YARN-4386 tracks the RECOMMISSION check. The current patch does not have a test since it's an invalid check.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038269#comment-15038269 ]
Kuhu Shukla commented on YARN-4413:
---
The current patch for YARN-4386*
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036643#comment-15036643 ]
Kuhu Shukla commented on YARN-4413:
---
Thanks for reporting this, [~templedf]. Was a node refresh done after the file change? If yes, then I think that, since this metric is updated during AddNodeTransition (which updates the rejoined metrics), no transition takes care of this until the node tries to register or heartbeat (as it is absent from all RMNodeImpl lists). One way could be to do this check in {{refreshNodes}}.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036645#comment-15036645 ]
Daniel Templeton commented on YARN-4413:
That's what I was thinking.
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036644#comment-15036644 ]
Daniel Templeton commented on YARN-4413:
Yes. The refresh marks nodes newly added to the excludes list as decommissioned, but it doesn't do anything for nodes newly added to the includes list.
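The gap Daniel describes (the refresh handles hosts newly added to the excludes list but not hosts newly added to the includes list) can be shown with a minimal, hypothetical Java sketch of the counts the UI reports. {{NodeCounters}} is an illustrative name, not the real ResourceManager metrics class; the counts are derived from per-host sets so that re-including a host only moves it if it was actually decommissioned.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified view of the node counts shown in the UI; not the
// real ResourceManager metrics class.
final class NodeCounters {
    private final Set<String> decommissioned = new HashSet<>();
    private final Set<String> shutdown = new HashSet<>();

    // What a refresh already does: a host newly added to the excludes list
    // is counted as decommissioned.
    void onNewlyExcluded(String host) {
        shutdown.remove(host);
        decommissioned.add(host);
    }

    // The missing branch: a decommissioned host newly added back to the includes
    // list stops counting as decommissioned and is reported as shut down instead.
    void onNewlyIncluded(String host) {
        if (decommissioned.remove(host)) {
            shutdown.add(host);
        }
    }

    int numDecommissioned() {
        return decommissioned.size();
    }

    int numShutdown() {
        return shutdown.size();
    }
}
```

Without {{onNewlyIncluded}}, the decommissioned count stays at 1 after the host is re-included, which is the stale value the issue describes until either the NM or RM is restarted.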
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037232#comment-15037232 ]
Sunil G commented on YARN-4413:
---
Hi [~templedf], thank you for raising this ticket. As you mentioned, I could see this when a node is moved from the exclude list to the include list and {{-refreshNodes}} is performed: some counts are still displayed in the UI. But a restart will help here to clear the metrics.
One point to note here. The way I see it, we cannot remove or reset this decommissioned count directly by looking only at the include list. There can be cases where we have done {{graceful decommissioning}}, which can add a few nodes to the decommissioned list that are not one-to-one mapped with the exclude list. So I feel we could look at both lists upon refresh and remove/add nodes based on the entries in both files and from memory.
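A minimal Java sketch of one reading of Sunil's suggestion to look at both lists upon refresh, under stated assumptions: a host counts as valid when it is in the includes list (or the includes list is empty, the usual Hadoop hosts-file convention) and not in the excludes list. {{RefreshReconciler}} and {{Plan}} are hypothetical names. Graceful decommissioning, which the comment notes is not one-to-one with the excludes list, is deliberately not modeled here.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compare the includes/excludes files against the in-memory
// node sets and decide which active hosts to decommission and which decommissioned
// hosts to release. Graceful decommissioning is NOT modeled.
final class RefreshReconciler {

    // Assumes the common convention that an empty includes list admits every host.
    static boolean isValid(String host, Set<String> includes, Set<String> excludes) {
        return (includes.isEmpty() || includes.contains(host)) && !excludes.contains(host);
    }

    static final class Plan {
        final Set<String> toDecommission = new TreeSet<>();
        final Set<String> toRelease = new TreeSet<>();
    }

    static Plan reconcile(Set<String> activeHosts, Set<String> decommissionedHosts,
                          Set<String> includes, Set<String> excludes) {
        Plan plan = new Plan();
        for (String host : activeHosts) {
            // Active but no longer valid per the files: decommission it.
            if (!isValid(host, includes, excludes)) {
                plan.toDecommission.add(host);
            }
        }
        for (String host : decommissionedHosts) {
            // Decommissioned in memory but valid again: stop reporting it as decommissioned.
            if (isValid(host, includes, excludes)) {
                plan.toRelease.add(host);
            }
        }
        return plan;
    }
}
```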
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036638#comment-15036638 ]
Allen Wittenauer commented on YARN-4413:
Is there a node list refresh happening in that procedure above?