[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593945#comment-15593945
 ] 

Arun Suresh commented on YARN-4597:
-----------------------------------

[~jianhe], thanks again for taking a look.

bq. I think there might be some behavior change or bug for scheduling 
guaranteed containers when the opportunistic-queue is enabled.
Previously, when launching a container, the NM would not check current vmem 
and cpu usage; it assumed that what the RM allocated could be launched.
Now, the NM checks these limits and won't launch the container if it hits 
them.
Yup, we do a *hasResources* check only at the start of a container and when a 
container is killed. We assumed that the resources requested by a container 
are constant; essentially, we considered only the actual *allocated* 
resources, which we assume will not vary during the lifetime of the 
container... which implies there is no point in checking this at any time 
other than container start and kill.
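
To make that concrete, here's a minimal sketch of that model (all names 
below, like {{StartKillOnlyScheduler}}, are illustrative and not from the 
actual patch):

{code:java}
// Illustrative sketch only: resource checks happen solely at container
// start and exit, since per-container allocations are assumed constant.
public class StartKillOnlyScheduler {
  private final long nodeCapacityMB;
  private long allocatedMB = 0;

  public StartKillOnlyScheduler(long nodeCapacityMB) {
    this.nodeCapacityMB = nodeCapacityMB;
  }

  /** hasResources-style check, done only when a container is about to start. */
  public synchronized boolean tryStart(long requestMB) {
    if (allocatedMB + requestMB > nodeCapacityMB) {
      return false; // leave the container queued
    }
    allocatedMB += requestMB;
    return true;
  }

  /** Called only when a container finishes or is killed. */
  public synchronized void onContainerExit(long requestMB) {
    allocatedMB -= requestMB;
    // A queued container may now fit; the re-check happens here, and only here.
  }
}
{code}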
But as you stated, if we consider container resource *utilization*, based on 
the work [~kasha] is doing in YARN-1011, then yes, we should have a timer 
thread that periodically checks vmem and cpu usage and starts (and kills) 
containers based on that.
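
Something along these lines, purely as a sketch (the sampling hooks and 
listener interface are placeholders, not the actual YARN-1011 API):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical utilization monitor: samples vmem/cpu on a timer and lets the
// scheduler react, instead of checking only at container start/kill.
public class UtilizationMonitorSketch {
  public interface Listener {
    void onUtilizationChanged(float cpuFraction, long vmemBytes);
  }

  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  public void start(final Listener listener) {
    timer.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        // Sampling is stubbed out; a real implementation would read from
        // the NM's containers monitor.
        listener.onUtilizationChanged(sampleCpu(), sampleVmem());
      }
    }, 0, 3, TimeUnit.SECONDS);
  }

  private float sampleCpu() { return 0f; }  // stub
  private long sampleVmem() { return 0L; }  // stub
}
{code}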

bq. the ResourceUtilizationManager looks like it only incorporates some 
utility methods; not sure how we will make this pluggable later.
Following on from my point above, the idea was to have a 
{{ResourceUtilizationManager}} that can provide different implementations of 
{{getCurrentUtilization}}, {{addResource}}, and {{subtractResource}}, which 
the ContainerScheduler uses to calculate the resources to free up. For 
instance, the current default only takes into account the actual resources 
*allocated* to containers... for YARN-1011, we might replace that with the 
resources *utilized* by running containers and provide a different value for 
{{getCurrentUtilization}}. The timer thread I mentioned in the previous 
point, which could be a part of this new ResourceUtilizationManager, can 
send events to the scheduler to re-process queued containers when 
utilization changes.
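
Roughly the shape I have in mind; the three method names are the ones from 
the patch, but the signatures below are simplified, memory-only assumptions:

{code:java}
// Simplified sketch of the pluggable seam (memory-only for brevity).
public interface ResourceUtilizationManagerSketch {
  /** The value the ContainerScheduler compares against node capacity. */
  long getCurrentUtilization();
  /** Account for a container starting. */
  void addResource(long memMB);
  /** Account for a container exiting. */
  void subtractResource(long memMB);
}

/** Default flavor: tracks resources allocated to running containers. */
class AllocationBasedSketch implements ResourceUtilizationManagerSketch {
  private long allocatedMB = 0;
  public synchronized long getCurrentUtilization() { return allocatedMB; }
  public synchronized void addResource(long memMB) { allocatedMB += memMB; }
  public synchronized void subtractResource(long memMB) { allocatedMB -= memMB; }
}
{code}

A YARN-1011 flavor would implement the same interface but return sampled 
utilization, paired with the timer thread sketched earlier.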

bq. The logic to select opportunistic containers: we may kill more 
opportunistic containers than required. e.g...
Good catch. In {{resourcesToFreeUp}}, I need to decrement any opportunistic 
container that is already marked for kill. That logic was there earlier; I 
had removed it while testing something and forgot to put it back :)
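
For illustration, a toy version of that calculation with the decrement 
restored (memory-only; every name here is made up):

{code:java}
import java.util.List;

public class FreeUpSketch {
  /**
   * How much memory still has to be reclaimed to fit a GUARANTEED container.
   * Containers already marked for kill are subtracted first, so we never
   * select (and kill) more opportunistic containers than required.
   */
  static long resourcesToFreeUpMB(long guaranteedNeedMB, long allocatedMB,
      long nodeCapacityMB, List<Long> alreadyMarkedForKillMB) {
    long toFree = (allocatedMB + guaranteedNeedMB) - nodeCapacityMB;
    for (long mb : alreadyMarkedForKillMB) {
      toFree -= mb; // the previously missing decrement
    }
    return Math.max(0, toFree);
  }
}
{code}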

bq. we don't need to synchronize on the currentUtilization object? I don't see 
any other place it's synchronized
Yup, it isn't required. Varun pointed out the same... I thought I had fixed 
it; I think I might have missed 'git add'-ing the change.

w.r.t. adding the new transitions: I was seeing some error messages in a few 
testcases. I will rerun and see if they are required... but in any case, 
having them there should be harmless, right?
 
The rest of your comments make sense... I will address them shortly.


> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
