[
https://issues.apache.org/jira/browse/YARN-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588202#comment-15588202
]
Rohith Sharma K S commented on YARN-5611:
-----------------------------------------
Summary of the discussion with [~vinodkv], [~vvasudev] and [~sunilg] on
supporting an update API for timeouts.
*Current Patch Approach*: the update API is a synchronous call, and its timeout
parameter takes effect from now, i.e. targetTimeoutFromNow (in seconds) for the
given ApplicationTimeoutTypes. For example, if the user passes 30 seconds for
the update, the RM calculates (now + 30 seconds) and stores the result in the
state store.
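A minimal sketch of the relative computation described above. The class and method names (RelativeTimeoutSketch, expiryMillis) are hypothetical and are not part of the patch; the sketch only assumes that the RM adds the requested seconds to its own clock. It also shows that replaying the same request a few seconds later produces a different stored value.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper, not a YARN class: models how the RM would turn a
// relative timeout into the absolute expiry that it stores.
final class RelativeTimeoutSketch {

    // expiry = (RM's "now") + targetTimeoutFromNow, in epoch milliseconds.
    static long expiryMillis(Clock rmClock, long targetTimeoutFromNowSeconds) {
        return rmClock.millis() + targetTimeoutFromNowSeconds * 1000L;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2016-10-19T06:00:00Z");
        // The request is processed at t0.
        long first = expiryMillis(Clock.fixed(t0, ZoneOffset.UTC), 30);
        // The same request is replayed 5 seconds later.
        long replay = expiryMillis(Clock.fixed(t0.plusSeconds(5), ZoneOffset.UTC), 30);
        System.out.println(Instant.ofEpochMilli(first));  // 2016-10-19T06:00:30Z
        System.out.println(Instant.ofEpochMilli(replay)); // 2016-10-19T06:00:35Z
    }
}
```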
*Concern on update API approach*
# A synchronous call is too expensive. If the state store has a problem, other
RPC calls are blocked.
So the synchronous call needs to be changed to an asynchronous call. Note that
updatePriority is also a blocking call, which needs to be solved once the
server side supports this.
*Challenges*
# YarnClient needs to adopt a polling mechanism against the RM to confirm its
update. The first important thing to consider for a polling mechanism is an
*idempotent* API. So, to the user, the *input remains relative*, but the client
needs to convert the relative time to a targeted time and send the request to
the RM with the targeted time. The RM takes the requested targeted timeout
as-is and uses it. For example, if the user inputs a timeout of 30 seconds,
YarnClient changes it to targetedTimeout = (now + 30 seconds) and sends it to
the RM. To the RM, an update request always carries a targeted time.
# Multiple clients update the timeout for the same application. Here, the issue
is how YarnClient gets to know that its update is confirmed, and how long
YarnClient needs to retry/wait before bailing out.
# *More importantly, the timezone issue* if YarnClient converts the time as
explained in the first challenge. Say the RM is running in IST and YarnClient
is running in the PDT timezone. Even though YarnClient converts the user input
to a targeted time and sends it to the RM, since the client and server
timezones are different, the targetedTimeout value is totally different.
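The timezone challenge can be illustrated with a small sketch. It assumes a hypothetical scenario in which the client sends its local wall-clock time and the RM reads that value in its own zone, and compares it with sending epoch milliseconds, which java.time.Instant defines relative to 1970-01-01T00:00:00Z regardless of the machine's timezone. All names here (TimezoneSketch, viaWallClock, viaEpochMillis) are illustrative, not YARN APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical illustration, not YARN code.
final class TimezoneSketch {
    static final ZoneId CLIENT_ZONE = ZoneId.of("America/Los_Angeles"); // PDT in October
    static final ZoneId RM_ZONE = ZoneId.of("Asia/Kolkata");            // IST

    // Zone-dependent: the client sends a local wall-clock value and the RM
    // interprets that same wall-clock value in its own zone.
    static Instant viaWallClock(Instant clientNow, long seconds) {
        LocalDateTime clientLocal =
            LocalDateTime.ofInstant(clientNow.plusSeconds(seconds), CLIENT_ZONE);
        return clientLocal.atZone(RM_ZONE).toInstant();
    }

    // Zone-independent: the client sends epoch milliseconds (UTC-based).
    static long viaEpochMillis(Instant clientNow, long seconds) {
        return clientNow.plusSeconds(seconds).toEpochMilli();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-10-19T06:00:00Z");
        Instant intended = Instant.ofEpochMilli(viaEpochMillis(now, 30));
        Instant misread = viaWallClock(now, 30);
        System.out.println(intended); // 2016-10-19T06:00:30Z
        System.out.println(misread);  // 2016-10-18T17:30:30Z
        System.out.println(Duration.between(misread, intended)); // PT12H30M
    }
}
```

In this scenario the wall-clock variant ends up 12.5 hours in the past from the RM's point of view, while the epoch-based value means the same instant on both machines.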
*Some thoughts on solving the challenges*
# Server (RM)
## Always accept the targeted timeout in UTC format and store it in the state
store in UTC format only. For RM monitoring, convert the UTC timestamp to the
local machine's timezone and use it. This solves the timezone issue on the RM
side.
## Once the RM receives an update request, it should first store the new value
in the state store and then update the monitoring service.
## The RM should check whether the targeted timeout is the same as the requested
time. If yes, return a boolean flag indicating that the update succeeded.
Always keep a copy of the updated timeout values in RMApp and return these
values when somebody queries for the timeout values.
# YarnClient:
## Convert the user's input timeout to a targeted timeout in UTC and send the
update request to the RM.
## Keep sending the update request in a loop until a response of true is
received, indicating update confirmation.
## Bail out and fail the client operation after a configured limit.
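The client-side steps above could look roughly like the following sketch. Everything in it is hypothetical: TimeoutUpdateRpc stands in for whatever RPC the final patch exposes, FakeRm simulates an RM whose state-store write completes only after a few calls, and the attempt limit stands in for the configured bail-out value. Because the request carries an absolute target, resending it is harmless.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed client loop, not YARN code.
final class UpdateRetrySketch {

    // Stand-in for the RM call: true means the stored target equals the request.
    interface TimeoutUpdateRpc {
        boolean updateTimeout(String appId, long targetMillis);
    }

    // Simulated RM: the state-store write "completes" on the Nth call.
    static final class FakeRm implements TimeoutUpdateRpc {
        private final int callsUntilStored;
        private final Map<String, Long> stored = new HashMap<>();
        private int calls;

        FakeRm(int callsUntilStored) {
            this.callsUntilStored = callsUntilStored;
        }

        @Override
        public boolean updateTimeout(String appId, long targetMillis) {
            calls++;
            if (calls >= callsUntilStored) {
                stored.put(appId, targetMillis);
            }
            Long current = stored.get(appId);
            return current != null && current == targetMillis;
        }
    }

    // Resend the same absolute target until confirmed or attempts run out.
    static boolean updateWithRetry(TimeoutUpdateRpc rm, String appId,
                                   long targetMillis, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rm.updateTimeout(appId, targetMillis)) {
                return true; // update confirmed
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false; // bail out; the caller fails the client operation
    }

    public static void main(String[] args) {
        long target = Instant.parse("2016-10-19T06:00:30Z").toEpochMilli();
        System.out.println(updateWithRetry(new FakeRm(3), "application_1", target, 5, 10L));  // true
        System.out.println(updateWithRetry(new FakeRm(10), "application_1", target, 5, 10L)); // false
    }
}
```

This sketch does not address the multiple-client challenge: if another client changes the target between polls, the loop would simply resend and overwrite it.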
Any comments on the approach are welcome.
> Provide an API to update lifetime of an application.
> ----------------------------------------------------
>
> Key: YARN-5611
> URL: https://issues.apache.org/jira/browse/YARN-5611
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-5611.patch, 0002-YARN-5611.patch,
> 0003-YARN-5611.patch, YARN-5611.v0.patch
>
>
> YARN-4205 monitors the lifetime of an application if required.
> Add a client API to update the lifetime of an application.