[jira] [Updated] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable

2016-12-01 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5694:
---
Attachment: YARN-5694.branch-2.6.002.patch

Uploading new branch-2.6 patch to fix the test.

> ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK 
> node is unreachable
> 
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.3
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch,
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch,
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch,
> YARN-5694.008.patch, YARN-5694.branch-2.6.001.patch,
> YARN-5694.branch-2.6.002.patch, YARN-5694.branch-2.7.001.patch,
> YARN-5694.branch-2.7.002.patch, YARN-5694.branch-2.7.004.patch,
> YARN-5694.branch-2.7.005.patch
>
>
> {{ZKRMStateStore.doStoreMultiWithRetries()}} holds the lock while trying to
> talk to ZK. If the connection fails, it retries while still holding the
> lock. The retries are intended to be strictly time limited, but when the
> ZK node is unreachable the time limit does not take effect, and the thread
> ends up holding the lock for over an hour. Transitioning the RM to standby
> requires that same lock, so in exactly the case where the RM should be
> transitioning to standby, the {{VerifyActiveStatusThread}} blocks the
> transition from happening.
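
The blocking pattern described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the ZKRMStateStore code: the class LockedRetrySketch, its methods, and the 500 ms / 50 ms timings are all hypothetical, and a ReentrantLock stands in for "the lock" so that the waiting side can give up and report the result instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockedRetrySketch {
    // Hypothetical stand-in for the lock shared by the store path and the
    // standby transition (the issue only calls it "the lock").
    private final ReentrantLock storeLock = new ReentrantLock();
    // Opened once the retry loop has actually acquired the lock.
    private final CountDownLatch lockHeld = new CountDownLatch(1);

    // Retries a failing operation for retryWindowMs while holding the lock
    // the whole time, which is the pattern the issue describes.
    void storeWithRetries(long retryWindowMs) throws InterruptedException {
        storeLock.lock();
        try {
            lockHeld.countDown();
            long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(retryWindowMs);
            while (System.nanoTime() < deadline) {
                Thread.sleep(10); // simulated failed attempt to reach the server
            }
        } finally {
            storeLock.unlock();
        }
    }

    // Needs the same lock; returns false if it cannot get it within waitMs.
    boolean tryTransitionToStandby(long waitMs) throws InterruptedException {
        if (!storeLock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            return true; // the transition work would happen here
        } finally {
            storeLock.unlock();
        }
    }

    // Returns {transition succeeded during retries, transition succeeded after}.
    static boolean[] demo() {
        LockedRetrySketch sketch = new LockedRetrySketch();
        Thread storer = new Thread(() -> {
            try {
                sketch.storeWithRetries(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            storer.start();
            sketch.lockHeld.await(); // wait until the retry loop owns the lock
            boolean duringRetries = sketch.tryTransitionToStandby(50);
            storer.join();           // retry window over, lock released
            boolean afterRetries = sketch.tryTransitionToStandby(50);
            return new boolean[] {duringRetries, afterRetries};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("during retries: " + result[0]
            + ", after retries: " + result[1]);
    }
}
```

In this sketch the transition cannot proceed until the retry window ends, so a retry window that runs far longer than intended delays the transition by the same amount.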



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable

2016-11-29 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5694:
---
Attachment: YARN-5694.branch-2.6.001.patch

Here's a 2.6 patch.







[jira] [Updated] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable

2016-11-22 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5694:
---
Attachment: YARN-5694.branch-2.7.005.patch

Switched to MockRM in the tests.







[jira] [Updated] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable

2016-11-22 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5694:
---
Summary: ZKRMStateStore can prevent the transition to standby in branch-2.7 
if the ZK node is unreachable  (was: ZKRMStateStore can prevent the transition 
to standby if the ZK node is unreachable)







[jira] [Updated] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable

2016-11-22 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5694:
---
Affects Version/s: 2.7.3  (was: 3.0.0-alpha1)



