[
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913633#comment-13913633
]
Naren Koneru commented on YARN-1577:
------------------------------------
After digging through the details, here is the summary as I understand it
(sorry for any repetition).
- Today, the unmanaged client (llama) sends a request to launch the AM,
waits for the app state to become ACCEPTED, and then registers the AM
using AMRMClientAsync.registerApplicationMaster.
- This register call expects the AM RM token to be set. The token is part of
the application report, and the client gets it by calling
ApplicationClientProtocol.getApplicationReport after the app is accepted.
With the change in YARN-1493, this is broken: the app attempt is now launched
after the application is accepted, so the token may not be set yet when the
client asks for it. The client can therefore hit a race condition, depending
on when it fetches the application report. The temporary hack we made in the
client is to retry a fixed number of times.
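The retry workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual llama client code: `TokenPoller` and `pollForToken` are hypothetical names, and the `Supplier` stands in for whatever call fetches the application report and reads the AM RM token from it (returning null while the token is not yet set).

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of YARN or llama. */
final class TokenPoller {

    /**
     * Calls fetchToken up to maxAttempts times, sleeping sleepMillis between
     * tries, and returns the first non-null result. Returns empty if every
     * try returned null or the thread was interrupted while sleeping.
     */
    static <T> Optional<T> pollForToken(Supplier<T> fetchToken,
                                        int maxAttempts,
                                        long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Assumption: the supplier returns null until the token is set.
            T token = fetchToken.get();
            if (token != null) {
                return Optional.of(token);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

A fixed retry count only narrows the race window: if the attempt takes longer to launch than the total wait, the client still fails, which is why it is a temporary hack.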
One way to solve this could be:
- Change the ApplicationReport (returned by
ApplicationClientProtocol.getApplicationReport) to add an attempt state, so the
client can wait for the attempt state to be launched before proceeding with the
UAM registration.
- However, this would not be backwards compatible since it involves changes to
the unmanaged clients. Since I do not see any documentation for the unmanaged
clients, is this acceptable?
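To make the proposal concrete, here is a minimal sketch of the check a client might perform if the report carried an attempt state. Everything in it is hypothetical: `AttemptGate`, `Report`, the two enums, and their values are illustrative stand-ins, not the real YARN types or the shape an eventual patch would take.

```java
/** Hypothetical model of the proposed check; not the real YARN API. */
final class AttemptGate {

    // Illustrative state names only.
    enum AppState { SUBMITTED, ACCEPTED, RUNNING, FAILED }
    enum AttemptState { NEW, SCHEDULED, LAUNCHED, FAILED }

    /** Stand-in for an application report extended with an attempt state. */
    static final class Report {
        final AppState appState;
        final AttemptState attemptState;

        Report(AppState appState, AttemptState attemptState) {
            this.appState = appState;
            this.attemptState = attemptState;
        }
    }

    /**
     * Returns true only when the app is accepted AND the attempt has been
     * launched, instead of relying on the app state alone.
     */
    static boolean readyToRegister(Report report) {
        return report.appState == AppState.ACCEPTED
                && report.attemptState == AttemptState.LAUNCHED;
    }
}
```

An existing client that checks only the app state never looks at the new field, so it would keep hitting the race until it is updated.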
Is this proposal OK? If not, are there any other suggestions? If it is OK,
I can submit a patch. Please comment.
> Unmanaged AM is broken because of YARN-1493
> -------------------------------------------
>
> Key: YARN-1577
> URL: https://issues.apache.org/jira/browse/YARN-1577
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.3.0
> Reporter: Jian He
> Assignee: Naren Koneru
> Priority: Blocker
>
> Today the unmanaged AM client waits for the app state to be Accepted before
> launching the AM. This is broken since YARN-1493 changed the flow so that the
> attempt starts after the application is Accepted. We may need to introduce an
> attempt state report that the client can query to decide when to launch the
> unmanaged AM.