[jira] [Updated] (YARN-8010) Add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8010: - Fix Version/s: (was: 3.1.1) Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0) > Add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Fix For: 3.1.0, 2.10.0, 2.9.1 > > Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch, > YARN-8010.v2.patch, YARN-8010.v3.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8010) Add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-8010: - Summary: Add config in FederationRMFailoverProxy to not bypass facade cache when failing over (was: add config in FederationRMFailoverProxy to not bypass facade cache when failing over) > Add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch, > YARN-8010.v2.patch, YARN-8010.v3.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8010: --- Attachment: YARN-8010.v3.patch > add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch, > YARN-8010.v2.patch, YARN-8010.v3.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8010: --- Attachment: YARN-8010.v2.patch > add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch, > YARN-8010.v2.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8010: --- Attachment: YARN-8010.v1.patch > add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8010: --- Attachment: YARN-8010.v1.patch > add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-8010.v1.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over
[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8010: --- Issue Type: Sub-task (was: Task) Parent: YARN-5597 > add config in FederationRMFailoverProxy to not bypass facade cache when > failing over > > > Key: YARN-8010 > URL: https://issues.apache.org/jira/browse/YARN-8010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-8010.v1.patch > > > Today when YarnRM is failing over, the FederationRMFailoverProxy running in > AMRMProxy will perform failover, try to get latest subcluster info from > FederationStateStore and then retry connect to the latest YarnRM master. When > calling getSubCluster() to FederationStateStoreFacade, it bypasses the cache > with a flush flag. When YarnRM is failing over, every AM heartbeat thread > creates a different thread inside FederationInterceptor, each of which keeps > performing failover several times. This leads to a big spike of getSubCluster > call to FederationStateStore. > Depending on the cluster setup (e.g. putting a VIP before all YarnRMs), > YarnRM master slave change might not result in RM addr change. In other > cases, a small delay of getting latest subcluster information may be > acceptable. This patch thus creates a config option, so that it is possible > to ask the FederationRMFailoverProxy to not flush cache when calling > getSubCluster(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org