[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

Arun Suresh (JIRA) Thu, 10 Nov 2016 23:41:03 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656417#comment-15656417
 ]


Arun Suresh commented on YARN-4597:
-----------------------------------

Appreciate the review [~kkaranasos],

1.
bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), 
which are public and are not following..
sendKillEvent is used by the Scheduler (which is in another package) to kill a 
container. Since this patch introduces an external entity that launches and 
kills a container, viz. the Scheduler, I feel it is apt to keep both as public 
methods. I prefer it to 'dispatcher.getEventHandler().handle..'. 

2.
The Container needs to be added to the {{nodeUpdateQueue}} if the container is 
to be move from ACQUIRED to RUNNING state (this is a state transition all 
containers should go thru). Regarding the {{launchedContainers}}, Lets have 
both Opportunistic and Guaranteed containers flow through a common code-path... 
and introduce specific behaviors if required in subsequent patches as and when 
required.

3.
bq. In the OpportunisticContainerAllocatorAMService we are now calling the 
SchedulerNode::allocate, and then we do not update the used resources but we do 
update some other counters, which leads to inconsistencies.
Hmmm... I do see that the numContainers are not decremented correctly when 
release. Thanks... but it looks like it would more likely just impact reporting 
 / UI, nothing functional (Will update the patch). Can you specify which other 
counters specifically ? Like I mentioned in the previous patch.. lets run all 
containers thru as much of the common code path before we add new counters etc.

4.
bq. Maybe as part of a different JIRA, we should at some point extend the 
container.metrics in the ContainerImpl to keep track of the scheduled/queued 
containers.
Yup.. +1 to that.

The rest of your comments make sense... will update patch.

bq. let's stress-test the code in a cluster before committing to make sure 
everything is good
It has been tested on a 3 node cluster and MR Pi jobs (with opportunistic 
containers) and I didn't hit any major issues. We can always open follow-up 
JIRAs for specific performance related issues as and when we find it. Besides, 
stess-testing is not really a precondition to committing a patch.


> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>              Labels: oct16-hard
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, 
> YARN-4597.009.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

Reply via email to