[
https://issues.apache.org/jira/browse/YARN-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288950#comment-15288950
]
Sunil G commented on YARN-4494:
-------------------------------
Hi [~kasha]
As per the problem statement, if we start recovering completed apps
asynchronously, we may not know when that recovery will finish. So if a query
(getApplication/Attempt etc.) arrives during this brief recovery period, we
could recover the queried app immediately on behalf of the client (blocking
the client RPC call meanwhile) and serve the metrics/state etc.
So when there is a request it won't be a lazy recovery; we can recover the app
immediately and serve it.
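A rough sketch of the idea, just to make it concrete. All names here
(CompletedAppRecovery, recoverNow, loadFromStateStore) are hypothetical and
are not existing RM classes or methods; the state-store read is stubbed as a
plain function. The background thread and the client RPC thread share one
claim-or-wait path, so each app is loaded only once.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Hypothetical sketch only; not actual ResourceManager code. */
class CompletedAppRecovery {
  // One future per app id: whoever inserts it first does the load.
  private final Map<String, CompletableFuture<String>> recovered =
      new ConcurrentHashMap<>();
  // Assumed stand-in for reading one completed app from the state store.
  private final Function<String, String> loadFromStateStore;

  CompletedAppRecovery(Function<String, String> loadFromStateStore) {
    this.loadFromStateStore = loadFromStateStore;
  }

  /**
   * Recovers the app on the calling thread if nobody has claimed it yet,
   * otherwise blocks until the thread that claimed it has finished.
   */
  String recoverNow(String appId) {
    CompletableFuture<String> mine = new CompletableFuture<>();
    CompletableFuture<String> existing = recovered.putIfAbsent(appId, mine);
    if (existing != null) {
      return existing.join(); // already recovered, or recovery in flight
    }
    try {
      mine.complete(loadFromStateStore.apply(appId));
    } catch (RuntimeException e) {
      mine.completeExceptionally(e); // waiters see the failure too
    }
    return mine.join();
  }

  /** Background path: walk all completed apps through the same method. */
  void recoverAllInBackground(List<String> appIds, Executor executor) {
    executor.execute(() -> appIds.forEach(this::recoverNow));
  }
}
```

With this shape, a getApplication-style call would simply invoke
recoverNow(appId) and does not have to wait for the apps queued ahead of it in
the background pass.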
bq.If yes, do we recover everything when someone requests all apps? How about
apps that match a specific category?
I was thinking along the same lines earlier. But here we may block the client
call for a long time, until all apps are recovered. There are two options:
1) block the client call till all apps are recovered (this may take too long,
and a timeout may happen), or 2) throw an error message/exception to the
client indicating that recovery is in progress.
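To illustrate the two options, here is a minimal sketch. RecoveryGate and
RecoveryInProgressException are hypothetical names, not existing YARN classes,
and the timeout value would be an assumption to tune. A positive timeout gives
option 1 (bounded blocking); a timeout of 0 gives option 2 (fail fast).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical exception telling the client to retry later. */
class RecoveryInProgressException extends RuntimeException {
  RecoveryInProgressException(String message) {
    super(message);
  }
}

/** Hypothetical gate placed in front of "list all apps" style queries. */
class RecoveryGate {
  private final CountDownLatch done = new CountDownLatch(1);

  /** Called once by the recovery thread after the last completed app. */
  void markRecoveryComplete() {
    done.countDown();
  }

  /**
   * Waits up to timeoutMillis for recovery to finish, then gives up.
   * With timeoutMillis = 0 this does not wait at all.
   */
  void awaitRecovery(long timeoutMillis) {
    try {
      if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
        throw new RecoveryInProgressException(
            "Completed apps are still being recovered");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new RecoveryInProgressException("Interrupted while waiting");
    }
  }
}
```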
Neither of these is a very clean solution. But we have seen some drawbacks of
recovering completed apps (when there are thousands of them). To avoid this
issue, max-completed applications was configured to a lower value.
cc/[~rohithsharma]
[~kasha], please share your thoughts.
> Recover completed apps asynchronously
> -------------------------------------
>
> Key: YARN-4494
> URL: https://issues.apache.org/jira/browse/YARN-4494
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Jun Gong
> Assignee: Jun Gong
>
> With RM HA enabled, when recovering apps, recover completed apps
> asynchronously.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)