Devaraj K commented on YARN-3813:

Thanks [~nijel] and [~rohithsharma] for the design proposal.

New auxiliary service: RMAppTimeOutService
Its responsibility is to track running applications, using simple logic:

// if the job is still running and the timeout has elapsed, kill it
&& (currentTime - app.getSubmitTime()) >= timeout
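The quoted condition implies a periodic scan over applications. The following is a minimal sketch of one scan pass, assuming a hypothetical `AppSnapshot` holder and `PollingTimeoutCheck.findExpired` helper (neither is a YARN class). Each pass touches every application, which is why the check frequency matters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of an application's state; not a YARN class.
final class AppSnapshot {
    final String id;
    final long submitTime; // epoch millis
    final boolean running;

    AppSnapshot(String id, long submitTime, boolean running) {
        this.id = id;
        this.submitTime = submitTime;
        this.running = running;
    }
}

// Hypothetical polling check: one full scan over all known applications.
final class PollingTimeoutCheck {
    // Returns the ids of applications that are still running and whose
    // elapsed time since submission has reached the timeout.
    static List<String> findExpired(List<AppSnapshot> apps, long currentTime, long timeout) {
        List<String> expired = new ArrayList<>();
        for (AppSnapshot app : apps) {
            if (app.running && (currentTime - app.submitTime) >= timeout) {
                expired.add(app.id);
            }
        }
        return expired;
    }
}
```

A service built this way would call `findExpired` on a fixed interval and kill each returned application. A timeout can therefore overshoot by up to one interval.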

How frequently are you going to check this condition for each application?

Could we instead have a monitor, something like RMAppTimeOutMonitor, which 
extends AbstractLivelinessMonitor? When an application is submitted to the RM, 
we can register it with RMAppTimeOutMonitor using the user-specified timeout. 
When the timeout is reached, RMAppTimeOutMonitor can trigger an event so that 
further action can be taken.
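The register-then-expire idea can be sketched as follows. This is a self-contained illustration in the spirit of a liveliness monitor, not a subclass of the real Hadoop AbstractLivelinessMonitor. The names `AppTimeoutMonitor`, `register`, `unregister`, `checkExpired`, and the `onTimeout` callback (standing in for dispatching a timeout event) are all hypothetical. Each application is stored with its own deadline, and only applications that were submitted with a timeout are tracked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical monitor: applications are registered with their own deadline,
// and a periodic check fires the timeout callback exactly once per app.
final class AppTimeoutMonitor {
    private final Map<String, Long> deadlines = new HashMap<>();
    private final Consumer<String> onTimeout; // e.g. dispatch a TIMEOUT event

    AppTimeoutMonitor(Consumer<String> onTimeout) {
        this.onTimeout = onTimeout;
    }

    // Called on submission with the user-specified timeout.
    synchronized void register(String appId, long submitTime, long timeoutMs) {
        deadlines.put(appId, submitTime + timeoutMs);
    }

    // Called when the application reaches a final state on its own.
    synchronized void unregister(String appId) {
        deadlines.remove(appId);
    }

    // One monitor pass: remove every expired app and fire its callback.
    void checkExpired(long now) {
        List<String> expired = new ArrayList<>();
        synchronized (this) {
            Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                if (now >= e.getValue()) {
                    expired.add(e.getKey());
                    it.remove();
                }
            }
        }
        // Callbacks run outside the lock so handlers may call back in.
        expired.forEach(onTimeout);
    }
}
```

In a real monitor, `checkExpired` would run on a background thread at a fixed monitor interval. Keeping deadlines in a priority queue ordered by expiry would avoid scanning every entry on each pass.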

bq. Yes, having a separate TIMEOUT event and TIMEOUT state is a good approach and 
another option. Initially we considered adding a new TIMEOUT state, which would 
require very large changes across all the modules.
I feel having a TIMEOUT state for RMAppImpl would be appropriate here. When 
RMAppTimeOutMonitor triggers a timeout event for an application, RMAppImpl can 
move to the TIMEOUT state from any of the non-final states, and during the 
transition it can handle stopping the running attempt and its containers. I 
don't think achieving this will require many changes.
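The proposed transition can be sketched as a small state machine. The `AppState` enum below is a simplified, hypothetical subset and not the real RMAppState. `stopAttemptAndContainers` is a hypothetical hook for the cleanup done during the transition. A timeout event moves any non-final state to TIMEOUT and is ignored once the application is already final.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical application states; not the real RMAppState enum.
enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED, TIMEOUT }

final class AppStateMachine {
    private static final Set<AppState> FINAL_STATES =
        EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED, AppState.TIMEOUT);

    private AppState state = AppState.NEW;
    private final Runnable stopAttemptAndContainers; // hypothetical cleanup hook

    AppStateMachine(Runnable stopAttemptAndContainers) {
        this.stopAttemptAndContainers = stopAttemptAndContainers;
    }

    AppState getState() {
        return state;
    }

    // Normal lifecycle transitions, heavily simplified for the sketch.
    void moveTo(AppState next) {
        state = next;
    }

    // TIMEOUT event: valid from every non-final state, ignored once final.
    // Returns true if the transition happened.
    boolean handleTimeout() {
        if (FINAL_STATES.contains(state)) {
            return false;
        }
        stopAttemptAndContainers.run(); // clean up during the transition
        state = AppState.TIMEOUT;
        return true;
    }
}
```

Ignoring the event in final states covers the race where an application finishes just before its timeout fires.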

> Support Application timeout feature in YARN. 
> ---------------------------------------------
>                 Key: YARN-3813
>                 URL: https://issues.apache.org/jira/browse/YARN-3813
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>            Reporter: nijel
>         Attachments: YARN Application Timeout .pdf
> It would be useful to support an application timeout in YARN. In some use 
> cases, the output of an application is of no interest if the application does 
> not complete within a specific time. 
> *Background:*
> The requirement is to show CDR statistics for the last few minutes, say every 
> 5 minutes. The same job runs continuously on a different dataset each time, 
> so one job is started every 5 minutes. The estimated time for this task is 
> 2 minutes or less. 
> If the application does not complete within the given time, the output is not 
> useful.
> *Proposal*
> So the idea is to support an application timeout, where a timeout parameter is 
> given while submitting the job. 
> Here, the user expects the application to finish (complete or be killed) within 
> the given time.
> One option for us is to move this logic into the application client (which 
> submits the job). 
> But it would be nicer to make this generic logic in YARN, which would also be 
> more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and the prototype patch.
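For comparison, the client-side option from the description could look roughly like the watchdog below. This is a sketch under assumptions. `ClientSideTimeout` and `watch` are hypothetical names. The `killApp` callback stands in for whatever kill call the client would actually make, for example YarnClient's `killApplication`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical client-side watchdog: the submitting client kills its own
// application if it has not completed within the timeout.
final class ClientSideTimeout implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Consumer<String> killApp; // stand-in for the real kill call

    ClientSideTimeout(Consumer<String> killApp) {
        this.killApp = killApp;
    }

    // Arm the watchdog right after submission. Cancel the returned future
    // when the application completes in time.
    ScheduledFuture<?> watch(String appId, long timeoutMs) {
        return scheduler.schedule(() -> killApp.accept(appId), timeoutMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stops the watchdog thread and pending kills
    }
}
```

The weakness is that the timeout is enforced only while this client process stays alive. If the client exits or crashes, nothing kills the application. Keeping the logic in the RM avoids that dependency.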

This message was sent by Atlassian JIRA
