[ 
https://issues.apache.org/jira/browse/YARN-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181302#comment-16181302
 ] 

Konstantinos Karanasos commented on YARN-5887:
----------------------------------------------

Hi [~Hugo Oshiro],
The problem with the {{ContainerLaunchContext}} is that it is created once 
and is not updated during execution.
What you want is for the RM to periodically inform each NM about the progress 
of the applications whose containers are running on that NM.
So I think adding it to the node heartbeat response is the right way. You can 
start by looking at the {{NodeStatusUpdater}} class.

There are multiple ways to calculate job progress. One implementation I did 
internally at some point does exactly what you are suggesting. It is not 
ideal, but it works as a first approximation. 
More involved strategies could look into the DAG structure (for example, the 
next stage might be waiting on a single mapper to finish) or take into account 
estimates of task runtimes from previous executions (if you expect one task to 
run for 2 hours and another for 10 seconds, you can weight them accordingly 
when calculating progress).
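
To make the runtime-weighted idea concrete, here is a minimal sketch. It is 
not YARN code: {{JobProgressEstimator}}, {{TaskEstimate}} and both method names 
are hypothetical, and the count-based variant is only one possible simple 
baseline (the fraction of finished tasks), included for comparison.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types exist in YARN. */
final class JobProgressEstimator {

    /** One task: fraction completed so far, and expected total runtime (e.g. from a previous run). */
    static final class TaskEstimate {
        final double fractionDone;        // 0.0 .. 1.0
        final double expectedRuntimeSec;  // assumed to come from historical executions

        TaskEstimate(double fractionDone, double expectedRuntimeSec) {
            this.fractionDone = fractionDone;
            this.expectedRuntimeSec = expectedRuntimeSec;
        }
    }

    /** Simple baseline: number of finished tasks divided by total number of tasks. */
    static double countBased(List<TaskEstimate> tasks) {
        if (tasks.isEmpty()) {
            return 0.0;
        }
        long finished = tasks.stream().filter(t -> t.fractionDone >= 1.0).count();
        return (double) finished / tasks.size();
    }

    /** Each task contributes to progress in proportion to its expected runtime. */
    static double runtimeWeighted(List<TaskEstimate> tasks) {
        double totalSec = 0.0;
        double doneSec = 0.0;
        for (TaskEstimate t : tasks) {
            // Clamp so that a bad progress report cannot push the result outside [0, 1].
            double fraction = Math.min(1.0, Math.max(0.0, t.fractionDone));
            totalSec += t.expectedRuntimeSec;
            doneSec += fraction * t.expectedRuntimeSec;
        }
        return totalSec == 0.0 ? 0.0 : doneSec / totalSec;
    }

    public static void main(String[] args) {
        // The 2-hour task has not started; the 10-second task has finished.
        List<TaskEstimate> tasks = Arrays.asList(
                new TaskEstimate(0.0, 2 * 60 * 60),
                new TaskEstimate(1.0, 10));
        System.out.println("count-based:      " + countBased(tasks));
        System.out.println("runtime-weighted: " + runtimeWeighted(tasks));
    }
}
```

With the 2-hour and 10-second tasks from the example, the count-based estimate 
reports the job as half done, while the weighted one reports it as barely 
started, which is closer to the remaining work.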

Hope this helps.

> Policies for choosing which opportunistic containers to kill
> ------------------------------------------------------------
>
>                 Key: YARN-5887
>                 URL: https://issues.apache.org/jira/browse/YARN-5887
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Konstantinos Karanasos
>
> When a guaranteed container arrives at an NM but there are no resources to 
> start its execution, opportunistic containers will be killed to make space 
> for the guaranteed container.
> At the moment, we kill opportunistic containers in reverse order of arrival 
> (first the most recently started ones). This is not always the right 
> decision. 
> For example, we might want to minimize the number of containers killed: to 
> start a 6GB container, we could kill one 6GB opportunistic or three 2GB ones. 
> Another example would be to refrain from killing containers of jobs that are 
> very close to completion (we have to pass job completion information to the 
> NM in that case).
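
The 6GB example in the description can be sketched as follows. This is an 
illustration only, not the NM's actual scheduling code: 
{{OpportunisticKillPolicy}}, {{Container}} and the method names are 
hypothetical, only memory is considered, and returning an empty list when the 
request cannot be satisfied is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; these types are not part of YARN. */
final class OpportunisticKillPolicy {

    /** A running opportunistic container, reduced to what the sketch needs. */
    static final class Container {
        final String id;
        final long memoryMb;
        final long startOrder;  // larger value = started more recently

        Container(String id, long memoryMb, long startOrder) {
            this.id = id;
            this.memoryMb = memoryMb;
            this.startOrder = startOrder;
        }
    }

    /** Policy described as the current one: kill the most recently started containers first. */
    static List<Container> newestFirst(List<Container> running, long neededMb) {
        return pick(running, neededMb,
                Comparator.comparingLong((Container c) -> c.startOrder).reversed());
    }

    /** Alternative: kill the largest containers first, which minimizes the number killed. */
    static List<Container> largestFirst(List<Container> running, long neededMb) {
        return pick(running, neededMb,
                Comparator.comparingLong((Container c) -> c.memoryMb).reversed());
    }

    /** Walks the containers in the given order until enough memory would be freed. */
    private static List<Container> pick(List<Container> running, long neededMb,
                                        Comparator<Container> order) {
        List<Container> sorted = new ArrayList<>(running);
        sorted.sort(order);
        List<Container> victims = new ArrayList<>();
        long freedMb = 0;
        for (Container c : sorted) {
            if (freedMb >= neededMb) {
                break;
            }
            victims.add(c);
            freedMb += c.memoryMb;
        }
        // Assumption: if even killing everything is not enough, kill nothing.
        return freedMb >= neededMb ? victims : new ArrayList<Container>();
    }

    public static void main(String[] args) {
        List<Container> running = Arrays.asList(
                new Container("c1", 6144, 1),
                new Container("c2", 2048, 2),
                new Container("c3", 2048, 3),
                new Container("c4", 2048, 4));
        System.out.println("newest-first kills:  " + newestFirst(running, 6144).size());
        System.out.println("largest-first kills: " + largestFirst(running, 6144).size());
    }
}
```

Largest-first minimizes the number of containers killed for a single resource, 
but it can free more memory than needed (for instance, killing an 8GB 
container to fit a 6GB one), so a real policy would likely have to weigh both.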



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
