[
https://issues.apache.org/jira/browse/YARN-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferenc Erdelyi updated YARN-11924:
----------------------------------
Description: If a "yarn resourcemanager -format-state-store" command is
issued while one of the RMs is starting and in the INIT state (because of
YARN-11551), there is a time window in which the /confstore/CONF_STORE path
does not exist, so the getZkData method returns null and the RM fails. To
prevent this, add an existence check and a retry mechanism before giving up.
(was: Should a 'yarn resourcemanager -format-state-store' command is issued
while one of the RM is in a starting state, -because of YARN-11551- there is a
time period when the /confstore/CONF_STORE path does not exist, hence the
getZkData method returns a null value, causing the RM to fail. To prevent this,
add a check and re-try mechanism before giving up.)
> Add zkManager.exists(path) check to ZKConfigurationStore:getZkData() and
> retry mechanism
> ----------------------------------------------------------------------------------------
>
> Key: YARN-11924
> URL: https://issues.apache.org/jira/browse/YARN-11924
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ferenc Erdelyi
> Assignee: Ferenc Erdelyi
> Priority: Major
>
> If a "yarn resourcemanager -format-state-store" command is issued while
> one of the RMs is starting and in the INIT state (because of YARN-11551),
> there is a time window in which the /confstore/CONF_STORE path does not
> exist, so the getZkData method returns null and the RM fails. To prevent
> this, add an existence check and a retry mechanism before giving up.
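
For illustration only, the check-and-retry idea described above could look
roughly like the sketch below. This is not the actual patch: the ZkReader
interface is a hypothetical stand-in for the store's ZooKeeper manager, and
the method name getZkDataWithRetry, the maxAttempts and sleepMs parameters,
and the exception type are assumptions made for the example.

```java
final class ZkReadRetrySketch {

  /**
   * Hypothetical stand-in for the ZooKeeper client used by the store.
   * Only the two calls the issue mentions are modelled here.
   */
  public interface ZkReader {
    boolean exists(String path) throws Exception;
    byte[] getData(String path) throws Exception;
  }

  /**
   * Reads the data at path, checking first that the node exists and
   * retrying a bounded number of times before giving up.
   * maxAttempts and sleepMs are illustrative parameters, not real
   * YARN configuration keys.
   */
  public static byte[] getZkDataWithRetry(ZkReader zk, String path,
      int maxAttempts, long sleepMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Only read when the node is present; it may be missing while
      // the state store is being formatted.
      if (zk.exists(path)) {
        byte[] data = zk.getData(path);
        if (data != null) {
          return data;
        }
      }
      // Wait before the next attempt, but not after the last one.
      if (attempt < maxAttempts) {
        Thread.sleep(sleepMs);
      }
    }
    // Fail with a clear message instead of handing null to the caller.
    throw new IllegalStateException("Path " + path
        + " not available after " + maxAttempts + " attempts");
  }
}
```

Note that a node can still disappear between the exists call and the read,
so a real implementation would probably also need to treat a "no node" error
from the read itself as a retryable condition.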