[jira] [Updated] (YARN-3671) Integrate Federation services with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subru Krishnan updated YARN-3671:
---------------------------------
    Attachment: YARN-3671-YARN-2915-v5.patch

Fixing the checkstyle warning that can be addressed (v5).

> Integrate Federation services with ResourceManager
> --------------------------------------------------
>
>                 Key: YARN-3671
>                 URL: https://issues.apache.org/jira/browse/YARN-3671
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Subru Krishnan
>            Assignee: Subru Krishnan
>         Attachments: YARN-3671-YARN-2915-v1.patch, YARN-3671-YARN-2915-v2.patch, YARN-3671-YARN-2915-v3.patch, YARN-3671-YARN-2915-v4.patch, YARN-3671-YARN-2915-v5.patch
>
>
> This JIRA proposes adding the ability to turn on Federation services like StateStore, cluster membership heartbeat etc in the RM

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3671) Integrate Federation services with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subru Krishnan updated YARN-3671:
---------------------------------
    Attachment: YARN-3671-YARN-2915-v4.patch

bq. Here, the scheduler is passed in via reference. If the RM switches to standby and then switches back to active, the scheduler object will be re-created and the reference will be updated in RMContext, but the reference here won't be updated, which cause the scheduler object here outdated?

Good question [~jianhe]. The scheduler object will *not* be outdated, because {{FederationStateStoreService}} is an active service and the {{FederationStateStoreHeartbeat}} is initialized in its _serviceStart_, so it is re-created with the current scheduler on every transition to active:
{code}
stateStoreHeartbeat = new FederationStateStoreHeartbeat(subClusterId, stateStoreClient, rmContext.getScheduler());
{code}
To double-check, I have updated {{TestFederationRMStateStoreService}} with an explicit transition to standby and then back to active.

I have also renamed "state-store.heartbeat-interval" to "state-store.heartbeat-interval-secs" in v4 of the patch, as you suggested.
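To illustrate the reasoning above, here is a minimal, self-contained sketch of why reading the scheduler from the context inside the start hook avoids a stale reference. It does not use the Hadoop classes: {{Scheduler}}, {{Context}}, {{Heartbeat}} and {{StateStoreService}} are hypothetical stand-ins for the RM scheduler, {{RMContext}}, {{FederationStateStoreHeartbeat}} and {{FederationStateStoreService}}, and the "generation" counter exists only to make the effect observable.

```java
class SchedulerRefreshSketch {

  /** Hypothetical stand-in for the RM scheduler; a new one exists per active term. */
  static final class Scheduler {
    final int generation;
    Scheduler(int generation) { this.generation = generation; }
  }

  /** Hypothetical stand-in for RMContext: always holds the current scheduler. */
  static final class Context {
    private volatile Scheduler scheduler;
    Scheduler getScheduler() { return scheduler; }
    void setScheduler(Scheduler scheduler) { this.scheduler = scheduler; }
  }

  /** Hypothetical stand-in for the heartbeat; it captures the scheduler it is given. */
  static final class Heartbeat {
    private final Scheduler scheduler;
    Heartbeat(Scheduler scheduler) { this.scheduler = scheduler; }
    int schedulerGeneration() { return scheduler.generation; }
  }

  /** Hypothetical stand-in for the active service that owns the heartbeat. */
  static final class StateStoreService {
    private final Context context;
    private Heartbeat heartbeat;
    StateStoreService(Context context) { this.context = context; }

    // The scheduler is read from the context at start time, not at construction time.
    void serviceStart() { heartbeat = new Heartbeat(context.getScheduler()); }
    void serviceStop() { heartbeat = null; }
    Heartbeat getHeartbeat() { return heartbeat; }
  }

  /** Simulates active -> standby -> active and reports which scheduler the heartbeat sees. */
  static int generationAfterFailover() {
    Context context = new Context();
    context.setScheduler(new Scheduler(1));
    StateStoreService service = new StateStoreService(context);
    service.serviceStart();

    // Transition to standby: the active service is stopped.
    service.serviceStop();

    // Transition back to active: a new scheduler is published and the service starts again.
    context.setScheduler(new Scheduler(2));
    service.serviceStart();
    return service.getHeartbeat().schedulerGeneration();
  }

  public static void main(String[] args) {
    System.out.println("heartbeat sees scheduler generation " + generationAfterFailover());
  }
}
```

Had the scheduler been captured once in the service constructor, the heartbeat would still point at generation 1 after the failover, which is the situation the review comment asked about.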
[jira] [Updated] (YARN-3671) Integrate Federation services with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subru Krishnan updated YARN-3671:
---------------------------------
    Attachment: YARN-3671-YARN-2915-v3.patch

[~jianhe], I looked into it, and it seems we can safely reuse RM_CLUSTER_ID as you suggested. So I have replaced FEDERATION_SUBCLUSTER_ID with RM_CLUSTER_ID in v3 of the patch.
[jira] [Updated] (YARN-3671) Integrate Federation services with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subru Krishnan updated YARN-3671:
---------------------------------
    Attachment: YARN-3671-YARN-2915-v2.patch

Thanks [~jianhe] for the feedback. I have updated the patch (v2) to remove the redundant null check and to refactor _setStateStoreClient_, as you suggested. As to your other questions:

bq. we already have RM_CLUSTER_ID, any chance that this can be used for FEDERATION_SUBCLUSTER_ID ?

That's a possibility. The reason I didn't combine them is that RM_CLUSTER_ID is currently used for HA, but Federation can work both with and without HA (and RM HA can work both with and without Federation). So I felt it would be better to keep them separate. Thoughts?

bq. I feel the SubClusterState is a bit redundant in the request object, because the API itself already indicates the state such as register / deregister.

You are right. We don't want the state to be null in the store, so either the store impl can implicitly add SC_NEW/SC_UNREGISTERED on register / deregister, or the invoker (which is always the RM) can. I decided to do it in the RM for 2 reasons:
1. It is trivial (one line) and needs to be done in a single place (the RM) instead of in each store impl we add.
2. It allows for flexibility in the future, as the RM could potentially register / deregister with different states (say SC_DRAINING).

Does that make sense?
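As a rough illustration of the design choice described above, the sketch below keeps the state in the request and lets the caller set it, while the store only rejects null. It is not the Federation API: {{InMemoryStore}} and {{RmInvoker}} are hypothetical names, and only the state names SC_NEW, SC_UNREGISTERED and SC_DRAINING come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SubClusterStateSketch {

  /** State names taken from the discussion; the real enum may contain others. */
  enum SubClusterState { SC_NEW, SC_DRAINING, SC_UNREGISTERED }

  /** Hypothetical store impl: it records whatever state the caller supplies. */
  static final class InMemoryStore {
    private final Map<String, SubClusterState> membership = new HashMap<>();

    void register(String subClusterId, SubClusterState state) {
      // The store never invents a state; it only refuses a missing one.
      membership.put(subClusterId, Objects.requireNonNull(state, "state must not be null"));
    }

    void deregister(String subClusterId, SubClusterState state) {
      membership.put(subClusterId, Objects.requireNonNull(state, "state must not be null"));
    }

    SubClusterState getState(String subClusterId) {
      return membership.get(subClusterId);
    }
  }

  /** Hypothetical RM-side caller: the single place where the state is chosen. */
  static final class RmInvoker {
    private final InMemoryStore store;
    private final String subClusterId;

    RmInvoker(InMemoryStore store, String subClusterId) {
      this.store = store;
      this.subClusterId = subClusterId;
    }

    // One line per call; a different state (say SC_DRAINING) could be passed later
    // without touching any store impl.
    void register() { store.register(subClusterId, SubClusterState.SC_NEW); }
    void deregister() { store.deregister(subClusterId, SubClusterState.SC_UNREGISTERED); }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    RmInvoker rm = new RmInvoker(store, "sc-1");
    rm.register();
    System.out.println("after register: " + store.getState("sc-1"));
    rm.deregister();
    System.out.println("after deregister: " + store.getState("sc-1"));
  }
}
```

With this split, adding a second store impl requires no extra state-handling logic, which is the first of the two reasons given above.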
[jira] [Updated] (YARN-3671) Integrate Federation services with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subru Krishnan updated YARN-3671:
---------------------------------
    Attachment: YARN-3671-YARN-2915-v1.patch

Attaching a patch that adds a new _RMActiveService_, {{FederationStateStoreService}}, which is optionally turned on by the boolean *FEDERATION_ENABLED* flag. When enabled, it starts a periodic heartbeat, {{FederationStateStoreHeartbeat}}, to the _FederationStateStore_.
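For readers unfamiliar with the pattern, the following is a minimal sketch of a flag-gated periodic heartbeat built on the JDK's {{ScheduledExecutorService}}. It is an assumption-laden illustration, not the code in the patch: {{HeartbeatService}}, its constructor parameters and the {{observeHeartbeats}} helper are hypothetical, and the real service reads its flag and interval from the YARN configuration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FederationHeartbeatSketch {

  /** Hypothetical service: starts a periodic task only when the flag is set. */
  static final class HeartbeatService {
    private final boolean federationEnabled;
    private final long intervalMillis;
    private final Runnable heartbeat;
    private ScheduledExecutorService executor;

    HeartbeatService(boolean federationEnabled, long intervalMillis, Runnable heartbeat) {
      this.federationEnabled = federationEnabled;
      this.intervalMillis = intervalMillis;
      this.heartbeat = heartbeat;
    }

    /** Returns true if the heartbeat was scheduled, false if the flag is off. */
    boolean start() {
      if (!federationEnabled) {
        return false;
      }
      executor = Executors.newSingleThreadScheduledExecutor();
      executor.scheduleWithFixedDelay(heartbeat, 0, intervalMillis, TimeUnit.MILLISECONDS);
      return true;
    }

    void stop() {
      if (executor != null) {
        executor.shutdownNow();
      }
    }
  }

  /** Starts the service and waits (up to 5 seconds) for the given number of heartbeats. */
  static boolean observeHeartbeats(boolean enabled, int count) {
    CountDownLatch beats = new CountDownLatch(count);
    HeartbeatService service = new HeartbeatService(enabled, 10, beats::countDown);
    if (!service.start()) {
      return false; // Flag off: nothing was scheduled.
    }
    try {
      return beats.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      service.stop();
    }
  }

  public static void main(String[] args) {
    System.out.println("enabled, heartbeats observed: " + observeHeartbeats(true, 3));
    System.out.println("disabled, heartbeats observed: " + observeHeartbeats(false, 1));
  }
}
```

In the sketch the heartbeat body is just a counter; in the RM it would be the call that reports cluster membership to the state store.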