Subru Krishnan created YARN-5711:
------------------------------------

Summary: AM cannot reconnect to RM after failover when using RequestHedgingRMFailoverProxyProvider
Key: YARN-5711
URL: https://issues.apache.org/jira/browse/YARN-5711
Project: Hadoop YARN
Issue Type: Bug
Components: applications, resourcemanager
Affects Versions: 3.0.0-alpha1, 2.9.0
Reporter: Subru Krishnan
Priority: Critical

When the RM fails over, it does _not_ automatically re-register running applications, so each AM must re-register when it reconnects to the new primary. The AM does this by catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and re-registering. However, *RequestHedgingRMFailoverProxyProvider* does _not_ propagate {{YarnException}}, because the actual invocation is performed asynchronously on separate threads.

This JIRA proposes that *RequestHedgingRMFailoverProxyProvider* propagate any {{YarnException}} that it encounters.
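
To illustrate the proposal, here is a minimal sketch of a hedged invocation that propagates an application-level exception raised on a worker thread. It is not the actual patch and does not use the Hadoop classes: {{SketchYarnException}}, {{SketchNotRegisteredException}} and {{hedgedInvoke}} are hypothetical stand-ins introduced only for this example. The sketch relies on the fact that {{Future.get()}} wraps a task's exception in an {{ExecutionException}}, so the caller only sees the original exception if the wrapper is unwrapped and the cause rethrown.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class HedgedInvocationSketch {

  /** Hypothetical stand-in for YarnException (assumption, not the Hadoop class). */
  static class SketchYarnException extends Exception {
    SketchYarnException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for ApplicationMasterNotRegisteredException. */
  static class SketchNotRegisteredException extends SketchYarnException {
    SketchNotRegisteredException(String message) {
      super(message);
    }
  }

  /**
   * Runs every call on its own thread and returns the first successful result.
   * An application-level exception from any call is rethrown to the caller;
   * other failures (for example an unreachable standby) are skipped.
   */
  static <T> T hedgedInvoke(List<Callable<T>> calls)
      throws SketchYarnException, IOException, InterruptedException {
    if (calls.isEmpty()) {
      throw new IllegalArgumentException("at least one call is required");
    }
    ExecutorService pool = Executors.newFixedThreadPool(calls.size());
    try {
      CompletionService<T> completed = new ExecutorCompletionService<>(pool);
      for (Callable<T> call : calls) {
        completed.submit(call);
      }
      Throwable lastFailure = null;
      for (int i = 0; i < calls.size(); i++) {
        try {
          // take() blocks until the next call finishes, in completion order.
          return completed.take().get();
        } catch (ExecutionException wrapped) {
          Throwable cause = wrapped.getCause();
          if (cause instanceof SketchYarnException) {
            // Unwrap and propagate so the caller can react to it.
            throw (SketchYarnException) cause;
          }
          lastFailure = cause;
        }
      }
      throw new IOException("all hedged calls failed", lastFailure);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<String>> calls = List.of(
        () -> { throw new IOException("standby RM: connection refused"); },
        () -> { throw new SketchNotRegisteredException("AM not registered"); });
    try {
      hedgedInvoke(calls);
    } catch (SketchNotRegisteredException e) {
      // This is where an AM would re-register and retry its allocate call.
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

With propagation in place, the caller's existing catch block for the not-registered case can run again, which is the behaviour the AM depends on after a failover.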