[jira] [Commented] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

Jason Lowe (JIRA) Fri, 22 Sep 2017 11:00:46 -0700

    [ https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176821#comment-16176821 ]


Jason Lowe commented on YARN-7102:
----------------------------------

Ah sorry, so maybe the current code is OK in this scenario as far as not 
throwing away heartbeats goes, and instead trades that for not always being 
able to detect a duplicate heartbeat.  That's less severe than a dropped 
heartbeat but still potentially problematic.

ResourceTrackerService is synchronously handing the updated response to the 
RMNodeImpl, so we really have no excuse for waiting until the asynchronous 
message containing the response arrives at the RMNodeImpl before the last 
response ID is updated properly.  As I mentioned above, we should never return 
a response for the current heartbeat request until we are ready to receive the 
next heartbeat request.  I don't understand the appeal of going with the "take 
anything greater than" approach, which has corner cases that fail (like 
wrap-around, or an NM that is heartbeating much farther ahead and really is 
out of sync), given we can cover all those cases in a straightforward way 
without the caveats.
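
To make the exact-match idea concrete, here is a minimal Java sketch, not 
the actual patch.  NodeHeartbeatState, Outcome, onHeartbeat and 
nextResponseId are hypothetical names standing in for the per-node state 
kept by the RM, and wrapping the ID back to 0 by masking the sign bit is an 
assumption about how overflow would be handled.  The point is only that the 
stored ID is advanced synchronously, before the response goes back, and that 
the comparison is an exact match rather than "anything greater than".

```java
/** Hypothetical stand-in for per-node heartbeat state; not the real RMNodeImpl. */
final class NodeHeartbeatState {
    enum Outcome { ACCEPTED, DUPLICATE, OUT_OF_SYNC }

    // ID carried by the most recent response handed back to the NM.
    private int lastResponseId;

    NodeHeartbeatState(int initialResponseId) {
        this.lastResponseId = initialResponseId;
    }

    /** Next ID after the given one; wraps from Integer.MAX_VALUE to 0 (assumed). */
    static int nextResponseId(int responseId) {
        // Clearing the sign bit keeps the ID non-negative across the overflow.
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    /**
     * Classifies a heartbeat by the response ID the NM echoes back.  On
     * acceptance the stored ID is advanced before this method returns, so the
     * state is ready for the next heartbeat by the time the response is sent.
     */
    synchronized Outcome onHeartbeat(int reportedResponseId) {
        if (reportedResponseId == lastResponseId) {
            lastResponseId = nextResponseId(lastResponseId);
            return Outcome.ACCEPTED;
        }
        if (nextResponseId(reportedResponseId) == lastResponseId) {
            // The NM never saw our last response and is repeating itself.
            return Outcome.DUPLICATE;
        }
        // Anything else, ahead or behind, is treated as out of sync.
        return Outcome.OUT_OF_SYNC;
    }

    synchronized int getLastResponseId() {
        return lastResponseId;
    }
}
```

With this shape, a heartbeat reporting Integer.MAX_VALUE when the stored ID 
has already wrapped to 0 is still recognized as a duplicate, and an NM that 
reports an ID far ahead of the stored one is flagged as out of sync rather 
than accepted.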


> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
>                 Key: YARN-7102
>                 URL: https://issues.apache.org/jira/browse/YARN-7102
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Critical
>         Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch, 
> YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, YARN-7102.v6.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is the same as the AM-RM 
> heartbeat problem in YARN-6640; please refer to YARN-6640 for details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
