[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404963#comment-15404963 ]
Junping Du commented on YARN-4676:
----------------------------------

bq. If the NM crashes (for example, a JVM exit due to running out of heap), it is supposed to restart automatically instead of waiting for a human to start it. Isn't that the general practice?

I don't think this is the general case, as YARN deployments vary - in many cases (especially in on-premise environments), the NM is not supposed to be so fragile, and the admin needs to figure out what happened before the NM crashed. Also, even if we want the NM to restart immediately (without human assistance/troubleshooting), the auto-restart logic lives outside of YARN and belongs to cluster deployment/monitoring tools such as Ambari. We'd better not bake too many assumptions in here.

bq. But nothing prevents the NM daemon from restarting, whether automatically or by a human. When such an NM restarts, it will try to register itself with the RM, which will tell it to shut down if it still appears in the exclude list. Such a node will remain DECOMMISSIONED inside the RM until, 10+ minutes later, it moves to LOST after the EXPIRE event.

As I said above, this belongs to the admin's behavior or your monitoring tool's logic. If an admin insists on repeatedly starting an NM on a decommissioned node, YARN can do nothing about it except keep shutting that NM down. Such a node should always remain DECOMMISSIONED, and I don't see any benefit in moving it to LOST on the EXPIRE event.

bq. Such a DECOMMISSIONED node can be recommissioned (refreshNodes after it is removed from the exclude list), during which it transitions into the RUNNING state.

I don't see what benefit this hack brings compared with moving the node back to the include list, running refreshNodes, and restarting the NM daemon, which goes through the normal registration process. The risk is that we have to maintain a separate code path dedicated to this minor case.

bq. These behaviors appear to me as robust rather than hacky.
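The register-and-shutdown loop described above can be sketched roughly as follows. This is an illustrative model only, not the actual ResourceTrackerService code: the class name, method signature, and the two-value NodeAction enum below are simplified stand-ins (the real NodeAction carries more values).

```java
import java.util.Set;

// Simplified model of the RM-side registration check under discussion:
// an NM whose host is on the exclude list is told to shut down at
// registration time, so the node stays DECOMMISSIONED no matter how
// often an admin (or a monitoring tool) restarts the daemon.
// Names here are illustrative, not the real ResourceTrackerService API.
public class RegisterCheckSketch {
    enum NodeAction { NORMAL, SHUTDOWN }

    static NodeAction registerNode(String host, Set<String> excludeList) {
        if (excludeList.contains(host)) {
            // Node is still decommissioned: refuse registration.
            return NodeAction.SHUTDOWN;
        }
        // Normal registration path: node proceeds toward RUNNING.
        return NodeAction.NORMAL;
    }

    public static void main(String[] args) {
        Set<String> exclude = Set.of("node-17.example.com");
        System.out.println(registerNode("node-17.example.com", exclude)); // SHUTDOWN
        System.out.println(registerNode("node-42.example.com", exclude)); // NORMAL
    }
}
```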
It appears that the behavior you expect relies on a separate mechanism that permanently shuts down the NM once it is DECOMMISSIONED. I have never heard that we need a separate mechanism to shut down an NM once it is decommissioned; that has been built-in behavior for Apache Hadoop YARN so far. Are you talking about a private/specific branch rather than current trunk/branch-2?

bq. As long as such a DECOMMISSIONED node never tries to register or be recommissioned, then yes, I expect the transitions you listed could be removed.

Re-registration of a node after a refreshNodes operation goes through the normal registration process, which is good enough for me. I don't think we need any change here unless we have strong reasons. So, yes, please remove these transitions, because they are not correct under current YARN logic.

bq. So I see these transitions as really needed. That said, I could remove them and maintain them privately inside the EMR branch for the sake of getting this JIRA going.

I can understand the pain of maintaining a private branch - from the perspective of your private (EMR) branch, these pieces of code may be needed. However, as a community contributor, you have to switch roles and stand on the community code base in trunk/branch-2, and we committers can only help get in code that benefits the whole community. If this code is important for another story (like resource elasticity in YARN) that would benefit the community, we can move it out into dedicated work, but we need an open discussion on design/implementation first - that's the right process for patch/feature contribution.

bq. These transitions have been there almost since the beginning of this JIRA; any other comments/surprises?

These issues have already surprised me enough - the transitions in RMNode belong to very key YARN logic, and we need to be careful as always. I need more time to review the rest of the code. Hopefully, I can finish my first round tomorrow and publish the remaining comments.
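To make the disputed transition concrete: here is a toy state table in the spirit of the RMNode state machine being argued about. The real RMNodeImpl is built with Hadoop's StateMachineFactory and has many more states and events; the names and switch-based dispatch below are simplified assumptions. Under the behavior argued for above, a DECOMMISSIONED node simply absorbs the EXPIRE event rather than moving to LOST:

```java
// Toy model of the RMNode transition under debate. The real RMNodeImpl
// uses Hadoop's StateMachineFactory; this sketch only illustrates the
// position that DECOMMISSIONED should absorb EXPIRE (stay put) instead
// of transitioning to LOST.
public class RMNodeTransitionSketch {
    enum State { RUNNING, DECOMMISSIONED, LOST }
    enum Event { EXPIRE, DECOMMISSION }

    static State transition(State current, Event event) {
        switch (current) {
            case RUNNING:
                if (event == Event.EXPIRE) return State.LOST; // heartbeat timed out
                if (event == Event.DECOMMISSION) return State.DECOMMISSIONED;
                break;
            case DECOMMISSIONED:
                // Terminal for our purposes: EXPIRE is ignored and the
                // node stays DECOMMISSIONED rather than becoming LOST.
                return State.DECOMMISSIONED;
            default:
                break;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(transition(State.RUNNING, Event.EXPIRE));        // LOST
        System.out.println(transition(State.DECOMMISSIONED, Event.EXPIRE)); // DECOMMISSIONED
    }
}
```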
> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, YARN-4676.018.patch, YARN-4676.019.patch
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to gracefully decommission YARN nodes. After the user issues a refreshNodes request, the ResourceManager automatically evaluates the status of all affected nodes and kicks off decommission or recommission actions. The RM asynchronously tracks container and application status related to DECOMMISSIONING nodes in order to decommission those nodes immediately after they are ready to be decommissioned. Decommissioning timeouts at individual-node granularity are supported and can be dynamically updated. The mechanism naturally supports multiple independent graceful decommissioning "sessions", where each one involves different sets of nodes with different timeout settings. Such support is ideal and necessary for graceful decommission requests issued by external cluster management software instead of a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks DECOMMISSIONING node status automatically and asynchronously after the client/admin makes the graceful decommission request.
> It tracks DECOMMISSIONING node status to decide when, after all running containers on the node have completed, the node will be transitioned into the DECOMMISSIONED state. NodesListManager detects and handles include and exclude list changes to kick off decommission or recommission as necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
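The readiness check described in the issue summary above can be sketched as follows. The real DecommissioningNodeWatcher tracks per-node container and application status inside the RM; the class, field, and method names here are simplified assumptions, not the actual implementation.

```java
// Minimal sketch of the readiness check a decommissioning watcher might
// perform: a DECOMMISSIONING node becomes ready to decommission once all
// of its containers have completed, or once its per-node (dynamically
// updatable) timeout has elapsed. Simplified names; not the actual
// DecommissioningNodeWatcher implementation.
public class DecommissionWatcherSketch {
    static final class NodeStatus {
        int runningContainers;
        long decommissionStartMillis;
        long timeoutMillis; // per-node timeout, may be updated dynamically

        NodeStatus(int running, long start, long timeout) {
            this.runningContainers = running;
            this.decommissionStartMillis = start;
            this.timeoutMillis = timeout;
        }
    }

    static boolean readyToDecommission(NodeStatus s, long nowMillis) {
        boolean containersDone = s.runningContainers == 0;
        boolean timedOut = nowMillis - s.decommissionStartMillis >= s.timeoutMillis;
        return containersDone || timedOut;
    }

    public static void main(String[] args) {
        NodeStatus busy = new NodeStatus(3, 0, 600_000);
        NodeStatus idle = new NodeStatus(0, 0, 600_000);
        System.out.println(readyToDecommission(busy, 10_000));  // false: containers still running
        System.out.println(readyToDecommission(idle, 10_000));  // true: node has drained
        System.out.println(readyToDecommission(busy, 600_000)); // true: timeout elapsed
    }
}
```

Because the timeout is stored per node, independent "sessions" of nodes with different timeout settings fall out naturally from this check.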