[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656417#comment-15656417 ]
Arun Suresh commented on YARN-4597: ----------------------------------- Appreciate the review [~kkaranasos], 1. bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), which are public and are not following.. sendKillEvent is used by the Scheduler (which is in another package) to kill a container. Since this patch introduces an external entity that launches and kills a container, viz. the Scheduler, I feel it is apt to keep both as public methods. I prefer it to 'dispatcher.getEventHandler().handle..'. 2. The Container needs to be added to the {{nodeUpdateQueue}} if the container is to be move from ACQUIRED to RUNNING state (this is a state transition all containers should go thru). Regarding the {{launchedContainers}}, Lets have both Opportunistic and Guaranteed containers flow through a common code-path... and introduce specific behaviors if required in subsequent patches as and when required. 3. bq. In the OpportunisticContainerAllocatorAMService we are now calling the SchedulerNode::allocate, and then we do not update the used resources but we do update some other counters, which leads to inconsistencies. Hmmm... I do see that the numContainers are not decremented correctly when release. Thanks... but it looks like it would more likely just impact reporting / UI, nothing functional (Will update the patch). Can you specify which other counters specifically ? Like I mentioned in the previous patch.. lets run all containers thru as much of the common code path before we add new counters etc. 4. bq. Maybe as part of a different JIRA, we should at some point extend the container.metrics in the ContainerImpl to keep track of the scheduled/queued containers. Yup.. +1 to that. The rest of your comments make sense... will update patch. bq. let's stress-test the code in a cluster before committing to make sure everything is good It has been tested on a 3 node cluster and MR Pi jobs (with opportunistic containers) and I didn't hit any major issues. We can always open follow-up JIRAs for specific performance related issues as and when we find it. Besides, stess-testing is not really a precondition to committing a patch. > Add SCHEDULE to NM container lifecycle > -------------------------------------- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager > Reporter: Chris Douglas > Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org