Daniel Templeton updated YARN-5694:
    Attachment: YARN-5694.branch-2.6.002.patch

Uploading new branch-2.6 patch to fix the test.

> ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK 
> node is unreachable
> ------------------------------------------------------------------------------------------------
>                 Key: YARN-5694
>                 URL: https://issues.apache.org/jira/browse/YARN-5694
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.3
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>              Labels: oct16-medium
>         Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.008.patch, YARN-5694.branch-2.6.001.patch, 
> YARN-5694.branch-2.6.002.patch, YARN-5694.branch-2.7.001.patch, 
> YARN-5694.branch-2.7.002.patch, YARN-5694.branch-2.7.004.patch, 
> YARN-5694.branch-2.7.005.patch
> {{ZKRMStateStore.doStoreMultiWithRetries()}} holds the lock while trying to 
> talk to ZK.  If the connection fails, it will retry while still holding the 
> lock.  The retries are intended to be strictly time limited, but in the case 
> that the ZK node is unreachable, the time limit fails, resulting in the 
> thread holding the lock for over an hour.  Transitioning the RM to standby 
> requires that same lock, so in exactly the case that the RM should be 
> transitioning to standby, the {{VerifyActiveStatusThread}} blocks it from 
> happening.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to