[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046994#comment-15046994
 ] 

Sunil G commented on YARN-4413:
---

Hi [~templedf]
Thank you for the updated patch.

I have some doubts about the updated patch. I am not sure about moving a node 
from DECOMMISSIONED to SHUTDOWN on a RECOMMISSION event; using that event for 
this does not seem clean or correct. Why can we not send a SHUTDOWN event 
itself? I see no harm in doing that.
After a refresh, such a node is valid as per the config but is still 
DECOMMISSIONED in the RM, so it can be moved via a SHUTDOWN event. Please 
correct me if I am missing something here.
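
To make the suggestion concrete, here is a minimal sketch of a transition 
table in which a DECOMMISSIONED node moves to SHUTDOWN on a SHUTDOWN event. 
The class, enum, and method names are hypothetical stand-ins and are not the 
actual RMNodeImpl state machine or the patch.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for a node state machine; not the real RMNodeImpl.
final class NodeTransitionSketch {
    enum State { RUNNING, DECOMMISSIONED, SHUTDOWN }
    enum Event { DECOMMISSION, SHUTDOWN }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    static {
        register(State.RUNNING, Event.DECOMMISSION, State.DECOMMISSIONED);
        register(State.RUNNING, Event.SHUTDOWN, State.SHUTDOWN);
        // The transition discussed here: reuse SHUTDOWN instead of RECOMMISSION.
        register(State.DECOMMISSIONED, Event.SHUTDOWN, State.SHUTDOWN);
    }

    private static void register(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Returns the next state, or the current state when no transition is registered.
    static State next(State current, Event event) {
        Map<Event, State> row = TABLE.get(current);
        if (row == null) {
            return current;
        }
        State target = row.get(event);
        return target == null ? current : target;
    }

    public static void main(String[] args) {
        System.out.println(next(State.DECOMMISSIONED, Event.SHUTDOWN));
    }
}
```

In this sketch, unregistered (state, event) pairs leave the state unchanged; 
a real state machine may instead reject them as invalid transitions.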

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4413.001.patch
>
>
> If I decommission a node and then move it from the excludes list back to the 
> includes list, but I don't restart the node, the node will still be listed by 
> the web UI as decommissioned until either the NM or RM is restarted.  Ideally, 
> removing the node from the excludes list and putting it back into the 
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-07 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045329#comment-15045329
 ] 

Kuhu Shukla commented on YARN-4413:
---

[~templedf]

Also, YARN-4386 would still be needed, since currently we are looking for 
DECOMMISSIONED nodes in the list returned by getRMNodes(), which does not 
contain nodes in that state. Such nodes are part of the getInactiveRMNodes() 
list. So the change would still be needed whether or not we decide to add a 
transition from DECOMM to RECOMM.

I also had a question about the DECOMM to RECOMM transition, and please pardon 
any naivety. If we transition a node that is no longer running the NM process 
because it was DECOMM-ed, and which would then be SHUTDOWN, how does a RECOMM 
event help this node unless we decide to start the NM process? Am I missing 
something here? I would appreciate any comments.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-07 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045370#comment-15045370
 ] 

Kuhu Shukla commented on YARN-4413:
---

I see, so that answers my transition query, thanks [~templedf]. But don't you 
think we are looking at the wrong list for DECOMM-ed nodes? As far as I can 
tell, they are in the getInactiveRMNodes() list and not among the entries of 
the getRMNodes() list. Hope this helps.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-07 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045333#comment-15045333
 ] 

Kuhu Shukla commented on YARN-4413:
---

A small comment on the patch:
{code}
   new RMNodeEvent(entry.getKey(), RMNodeEventType.DECOMMISSION));
} else if (entry.getValue().getState() == NodeState.DECOMMISSIONED) {
 this.rmContext.getDispatcher().getEventHandler().handle(
{code}

This condition will never evaluate to true, for the same reason as above. AFAIK, 
decomm-ed nodes are part of the inactive list alone, while {{entry}} iterates 
over the getRMNodes() list, so no entry is ever in the DECOMMISSIONED state. 
Please let me know if I am missing something here.
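
To illustrate the concern, here is a minimal sketch using two plain maps as 
hypothetical stand-ins for the active and inactive node lists. It assumes, as 
described above, that a decommissioned node is removed from the active map and 
recorded only in the inactive one. None of these names are the real YARN 
classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the active and inactive node maps; not the real YARN classes.
final class ActiveListSketch {
    enum State { RUNNING, DECOMMISSIONED }

    // Counts entries in DECOMMISSIONED state, mirroring the per-entry state check.
    static int countDecommissioned(Map<String, State> nodes) {
        int count = 0;
        for (Map.Entry<String, State> entry : nodes.entrySet()) {
            if (entry.getValue() == State.DECOMMISSIONED) {
                count++;
            }
        }
        return count;
    }

    // Models a decommission: the node leaves the active map and is recorded in the inactive map.
    static void decommission(String host, Map<String, State> active,
                             Map<String, State> inactive) {
        if (active.remove(host) != null) {
            inactive.put(host, State.DECOMMISSIONED);
        }
    }

    public static void main(String[] args) {
        Map<String, State> active = new LinkedHashMap<>();
        Map<String, State> inactive = new LinkedHashMap<>();
        active.put("host1", State.RUNNING);
        active.put("host2", State.RUNNING);
        decommission("host2", active, inactive);
        System.out.println("matches in active map: " + countDecommissioned(active));
        System.out.println("matches in inactive map: " + countDecommissioned(inactive));
    }
}
```

Under that assumption, a state check that only walks the active map can never 
match, so the check would have to consult the inactive map instead.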



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045344#comment-15045344
 ] 

Daniel Templeton commented on YARN-4413:


[~kshukla], this patch is more for the UI.  The point is that if I 
decommission, shut down, and then recommission a node, the UI will continue to 
show it as decommissioned until the node is restarted.  This patch closes that 
gap.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-04 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042355#comment-15042355
 ] 

Daniel Templeton commented on YARN-4413:


[~kshukla], it looks to me like this patch (YARN-4413) obviates YARN-4386, 
since it becomes possible to recommission decommissioned nodes.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-04 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042363#comment-15042363
 ] 

Kuhu Shukla commented on YARN-4413:
---

I agree. However, from the discussions with Junping and Sunil on YARN-4386:
bq. I think Recommission event shouldn't be applied on decommissioned nodes as 
it won't have any effect and we'd better keep consistent with previous 
behavior before graceful decommission comes out.
Asking [~djp] for further comments. Thanks a lot.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038135#comment-15038135
 ] 

Daniel Templeton commented on YARN-4413:


bq. But a restart will help here to clear the metrics.

True, but it will also cause an outage, which comes with its own potential 
impact.

bq. So I feel we could look both lists upon refresh and remove/add nodes based 
on the entries in both files and from memory.

Agreed.  I'll post a patch with my general approach shortly.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038265#comment-15038265
 ] 

Kuhu Shukla commented on YARN-4413:
---

YARN-4386 tracks the RECOMMISSION check. The current patch does not have a test 
since it's an invalid check.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038269#comment-15038269
 ] 

Kuhu Shukla commented on YARN-4413:
---

The current patch for YARN-4386*



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-02 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036643#comment-15036643
 ] 

Kuhu Shukla commented on YARN-4413:
---

Thanks for reporting this, [~templedf]. Was a node refresh done after the file 
change? If yes, then I think that since this metric is updated during 
AddNodeTransition (which updates the rejoined metrics), no transition takes 
care of it until the node tries to register/heartbeat (as it is absent from 
all RMNodeImpl lists). One way could be to do this check in {{refreshNodes}}.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-02 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036645#comment-15036645
 ] 

Daniel Templeton commented on YARN-4413:


That's what I was thinking.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-02 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036644#comment-15036644
 ] 

Daniel Templeton commented on YARN-4413:


Yes.  The refresh marks nodes newly added to the excludes list as 
decommissioned, but it doesn't do anything for nodes newly added to the 
includes list.



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037232#comment-15037232
 ] 

Sunil G commented on YARN-4413:
---

Hi [~templedf]
Thank you for raising this ticket.
As you mentioned, I could see that after a node is moved from the exclude list 
to the include list and {{-refreshNodes}} is run, some counts are still 
displayed in the UI. But a restart will help here to clear the metrics.

One point to note here: I do not think we can remove or reset this 
decommissioned count directly by looking only at the include list. There can be 
cases where we have done {{graceful decommissioning}}, which can add a few 
nodes to the decommissioned list that do not map one-to-one to the exclude 
list.
So I feel we could look at both lists upon refresh and remove/add nodes based 
on the entries in both files and from memory.
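
A minimal sketch of that reconciliation, assuming the include file, the 
exclude file, and the in-memory decommissioned nodes are each available as a 
set of host names. The class and method names are hypothetical, and the rule 
that an empty include list admits every host is an assumption of this sketch, 
not something taken from the patch.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reconciliation helper; not the actual refreshNodes implementation.
final class RefreshReconcileSketch {
    // Assumption: an empty include list admits every host.
    static boolean isValid(String host, Set<String> includes, Set<String> excludes) {
        return (includes.isEmpty() || includes.contains(host)) && !excludes.contains(host);
    }

    // Returns the hosts held in memory as decommissioned that the refreshed
    // include and exclude files now consider valid.
    static Set<String> noLongerDecommissioned(Set<String> decommissionedInMemory,
                                              Set<String> includes,
                                              Set<String> excludes) {
        Set<String> result = new TreeSet<>();
        for (String host : decommissionedInMemory) {
            if (isValid(host, includes, excludes)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> inMemory = new TreeSet<>(Arrays.asList("host1", "host2", "host3"));
        Set<String> includes = new TreeSet<>(Arrays.asList("host1", "host2", "host3"));
        Set<String> excludes = new TreeSet<>(Arrays.asList("host2"));
        System.out.println(noLongerDecommissioned(inMemory, includes, excludes));
    }
}
```

How gracefully decommissioned nodes that never appeared in the exclude file 
should be treated is left open here; this sketch would report them as no 
longer decommissioned.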



[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036638#comment-15036638
 ] 

Allen Wittenauer commented on YARN-4413:


Is there a node list refresh happening in that procedure above?
