[ https://issues.apache.org/jira/browse/YARN-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613300#comment-16613300 ]
Weiwei Yang commented on YARN-8729: ----------------------------------- +1 on my side. [~ebadger], are you fine with this patch? > Node status updater thread could be lost after it restarted > ----------------------------------------------------------- > > Key: YARN-8729 > URL: https://issues.apache.org/jira/browse/YARN-8729 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 3.2.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Critical > Attachments: YARN-8729.001.patch, YARN-8729.001.patch, > YARN-8729.002.patch > > > Today I found a lost NM whose node status updater thread was not exist after > this thread restarted. In > {{NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM}}, isStopped > flag is not updated to be false before executing {{statusUpdater.start()}}, > so that if the thread is immediately started and found isStopped==true, it > will exit without any log. > Key codes in > {{NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM}}: > {code:java} > statusUpdater.join(); > registerWithRM(); > statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater"); > statusUpdater.start(); > this.isStopped = false; //this line should be moved before > statusUpdater.start(); > LOG.info("NodeStatusUpdater thread is reRegistered and restarted"); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org