[ https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508382#comment-16508382 ]
Haibo Chen commented on YARN-8250:
----------------------------------

My apologies, Arun. I did not mean to suggest in any way that you are trying to 'block'.

{quote}what would 3) accomplish ? Again, point of opportunistic containers is to have it start up as fast as possible {quote}

We want to tune how fast opportunistic containers are launched in order to minimize opportunistic container failures, and therefore task failures. Because node utilization is only updated every few seconds, opportunistic containers can be launched and then killed shortly afterwards.

YARN-1011 offers another option when there are no unallocated resources left. Users can either wait for guaranteed containers to run their tasks at some point in the future, or start running in opportunistic containers early and be automatically promoted to guaranteed containers at that same point. But if users experience more task failures because the opportunistic containers in which their tasks are running are launched fast and killed fast, they'll be less likely to adopt YARN-1011. In that sense, being able to minimize opportunistic container failures is critical.

That said, this behavior is only useful for YARN-1011. I totally agree that when there is no over-allocation and there is capacity when a container completes, there is no point in not starting opportunistic containers right away, as you said.

{quote}If there is capacity at the time a container completes AND there are no G containers waiting to start, why not start the first O container in queue ? {quote}

For that reason, we initially proposed another implementation of the container scheduler. The proposal to change the existing container scheduler, as discussed with [~leftnoteasy], was to explore the possibility of converging on the behaviors and therefore avoiding two container scheduler implementations, to address his previous concerns.
But this does not seem like a good thing to do, given the discussions we've had so far.

{quote}I am guessing the point of the JIRA is to ensure G container startup time is not impacted right ? {quote}

The longer G container startup time is another consequence of too many opportunistic containers being launched in the case of over-allocation. We don't know how much opportunistic containers are actually consuming, so when we need to launch a G container and there are no unallocated resources left, we kill some opportunistic containers and then check, once they finish, whether more need to be killed. We may end up with a few rounds like that in some cases. Also, because the node utilization is stale, many opportunistic containers may get killed unnecessarily. However, this is NOT an issue at all if there is no over-allocation.

Hope that explains the thinking behind our proposal of a different container scheduler.

Can you please elaborate on the following, [~asuresh]? I don't quite understand it.

{quote}Wouldnt a simple approach be: Check if container is opportunistic, and if container is to be killed and if over-allocation is turned on, assume {{sleep-delay-before-sigkill.ms}} == 0 {quote}

> Create another implementation of ContainerScheduler to support NM
> overallocation
> --------------------------------------------------------------------------------
>
>                 Key: YARN-8250
>                 URL: https://issues.apache.org/jira/browse/YARN-8250
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-8250-YARN-1011.00.patch,
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler,
> and future tweaks of the over-allocation strategy based on how many containers
> have been launched are even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches
> guaranteed containers immediately and queues opportunistic containers. It
> relies on a periodic check to launch opportunistic containers.