Subru Krishnan created YARN-5711:
------------------------------------

             Summary: AM cannot reconnect to RM after failover when using 
RequestHedgingRMFailoverProxyProvider
                 Key: YARN-5711
                 URL: https://issues.apache.org/jira/browse/YARN-5711
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications, resourcemanager
    Affects Versions: 3.0.0-alpha1, 2.9.0
            Reporter: Subru Krishnan
            Priority: Critical


When the RM fails over, it does _not_ automatically re-register running apps, 
so they must re-register when reconnecting to the new primary. This is done by 
catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and 
re-registering. But *RequestHedgingRMFailoverProxyProvider* does _not_ 
propagate {{YarnException}}, because the actual invocation is done 
asynchronously on separate threads. As a result, the AM never sees the 
exception and cannot re-register.

This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate 
any {{YarnException}} that it encounters.
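
The propagation problem can be illustrated without any Hadoop classes. The following is a minimal sketch, not the actual provider code: {{HedgingSketch}}, {{NotRegisteredException}}, {{hedge}} and {{allocateWithReRegister}} are all hypothetical names invented for this example. It assumes the hedged calls run on an executor, so a failure in a worker thread reaches the caller wrapped in {{ExecutionException}} and has to be unwrapped before a caller can catch it by type.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgingSketch {

    /** Hypothetical stand-in for ApplicationMasterNotRegisteredException. */
    public static class NotRegisteredException extends Exception {
        public NotRegisteredException(String msg) {
            super(msg);
        }
    }

    /**
     * Submits every call to its own thread and waits for the first one to
     * complete. If that call failed, the original exception is rethrown
     * instead of the ExecutionException wrapper added by the executor.
     */
    public static <T> T hedge(List<Callable<T>> calls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        try {
            ExecutorCompletionService<T> ecs = new ExecutorCompletionService<>(pool);
            for (Callable<T> call : calls) {
                ecs.submit(call);
            }
            try {
                return ecs.take().get();
            } catch (ExecutionException e) {
                // Unwrap so callers can catch the real exception type.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    /**
     * Caller-side pattern from the description: catch the "not registered"
     * failure from allocate, re-register, then retry allocate once.
     */
    public static String allocateWithReRegister(Callable<String> allocate,
                                                Runnable register) throws Exception {
        try {
            return hedge(Collections.singletonList(allocate));
        } catch (NotRegisteredException e) {
            register.run();
            return hedge(Collections.singletonList(allocate));
        }
    }
}
```

Without the unwrapping step in {{hedge}}, the {{catch (NotRegisteredException e)}} clause in the caller would never match, which is the failure mode described above.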
