[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

Konstantinos Karanasos (JIRA) Thu, 10 Nov 2016 19:19:53 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655975#comment-15655975
 ]


Konstantinos Karanasos commented on YARN-4597:
----------------------------------------------

Thanks for working on this, [~asuresh]! I am sending some first comments. I 
have not yet looked at the {{ContainerScheduler}} -- I will do that tomorrow.

- The {{Container}} has two new methods ({{sendLaunchEvent}} and 
{{sendKillEvent}}), which are public and are not following the design of the 
rest of the code that keeps such methods private and calls them through 
transitions in the {{ContainerImpl}}. Let's try to use the existing design if 
possible.

- In {{RMNodeImpl}}:
-- Instead of using the {{launchedContainers}} for both the launched and the 
queued, we might want to split it in two: one for the launched and one for the 
queued containers.
-- I think we should not add opportunistic containers to the 
{{launchContainers}}. If we do, they will be added to the 
{{newlyLaunchedContainers}}, then to the {{nodeUpdateQueue}}, and, if I am not 
wrong, they will be propagated to the schedulers for the guaranteed containers, 
which will create problems. I have to look at it a bit more, but my hunch is 
that we should avoid doing it. Even if it does not affect the resource 
accounting, I don't see any advantage to adding them.

- In the {{OpportunisticContainerAllocatorAMService}} we are now calling the 
{{SchedulerNode::allocate}}, and then we do not update the used resources, but 
we do update some other counters, which leads to inconsistencies. For example, 
when releasing a container, I think at the moment we are not calling the 
release of the {{SchedulerNode}}, which means that the container count will 
become inconsistent.
-- Instead, I suggest to add some counters for opportunistic containers at the 
{{SchedulerNode}}, both for the number of containers and for the resources 
used. In this case, we need to make sure that those resources are released too.

- Maybe as part of a different JIRA, we should at some point extend the 
{{container.metrics}} in the {{ContainerImpl}} to keep track of the 
scheduled/queued containers.

h6. Nits:
- There seem to be two redundant parameters at {{YarnConfiguration}} at the 
moment: {{NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH}} and 
{{NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH}}. If I am not missing 
something, we should keep one of the two.
- {{yarn-default.xml}}: numbed -> number (in a comment)
- {{TestNodeManagerResync}}: I think it is better to use one of the existing 
methods for waiting to get to the RUNNING state.
- In {{Container}}/{{ContainerImpl}} and all the associated classes, I would 
suggest to rename {{isMarkedToKill}} to {{isMarkedForKilling}}. I know it is 
minor, but it is more self-explanatory.

I will send more comments once I check the {{ContainerScheduler}}. 
Also, let's stress-test the code in a cluster before committing to make sure 
everything is good. I can help with that.


> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>              Labels: oct16-hard
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, 
> YARN-4597.009.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

Reply via email to