[
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710178#comment-15710178
]
Haibo Chen edited comment on YARN-1593 at 11/30/16 11:25 PM:
-
Thanks for starting the work on this, [~vvasudev]!
I’d like to understand the proposal better. A few comments/questions on the
proposal. Please correct me as necessary.
It seems like the term "system container" is overloaded in the design doc. In
some places, from an NM’s perspective, my understanding is that a system
container is a special container runtime (relative to the container types we
have today in NM) provided by NM for system services to run their
components/instances. In other places, system containers represent the
components/instances of system services on the worker nodes. In the former
case, we may only need to be concerned with issues such as classpath and
container executors. For ShuffleHandler, for instance, it is an alternative to
the in-process runtime it gets from NM today. The latter case is where we
discuss whether RM or NM does the heavy lifting of managing system containers.
As you mention, no one option suits all use cases. Option 1 suits some, while
option 3 suits others. I wonder if this is because we are conflating two
different types of containers in the proposal - (1) framework-specific services
like MR shuffle, and (2) application-specific services. Framework services are
to be run on all nodes that support the framework (e.g. MR). Since these run on
every node, node-level configs (option 3) would work best. Application services
(e.g. ATS AM-companion-collector), on the other hand, are application specific
and need to run on a subset of cluster nodes; option 1 readily applies to
these. Is this categorization accurate? And, do you see merit in
differentiating between these two?
bq. Allow shuffle to run on the NodeManagers without requiring it to be setup
as an AuxiliaryService
Not sure if I understand this correctly. IMHO, we could let users keep their
current AuxiliaryService configuration, but run those services in containers
behind an AuxiliaryService proxy, as Junping said in the jira description.
bq. Handling container status for system-containers - we will need to add logic
to not act upon the container status of a system-container.
Can you please elaborate more on this? Shouldn’t NM try to relaunch system
containers? Does this mean that RM will take the responsibility of handling
system container failures?
bq. I think discovery is going to be one major piece that needs to be addressed
from the beginning
Agree with Sangjin that the discovery problem needs to be addressed right at
the beginning. For option 3, I think we can add a queryable registry in
AuxiliaryServices when NM launches a proxied AuxiliaryService, assuming that NM
will launch the AuxiliaryServices in the right order and that each
AuxiliaryService knows the services it depends on.
bq. the NodeManager will block container requests until all the
system-containers are running
With global scheduling and resource affinity, NM does not necessarily need to
block container launching. NM can launch system containers asynchronously and
report to the RM upon launch success, and RM can schedule containers on a node
only if the services that those containers depend on have been launched on
that node. But that’s way in the future, I guess.
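To make the idea concrete, here is a rough sketch of the check the RM could
apply, assuming each NM reports the set of system services it has running. The
class and method names are hypothetical and do not exist in YARN today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper, not an existing YARN class. */
final class ServiceAwareNodeFilter {

  /**
   * Returns the nodes on which every required system service is already
   * running, sorted by node name.
   *
   * @param runningServicesByNode services each NM has reported as launched
   * @param requiredServices services the container to be scheduled depends on
   */
  static List<String> eligibleNodes(
      Map<String, Set<String>> runningServicesByNode,
      Set<String> requiredServices) {
    List<String> eligible = new ArrayList<>();
    // TreeMap copy gives a deterministic (sorted) iteration order.
    for (Map.Entry<String, Set<String>> node
        : new TreeMap<>(runningServicesByNode).entrySet()) {
      if (node.getValue().containsAll(requiredServices)) {
        eligible.add(node.getKey());
      }
    }
    return eligible;
  }
}
```

A container with no service dependencies passes an empty required set and stays
schedulable on every node, so NM never has to block container requests.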
bq. We can’t solve the dependency management and affinity/anti-affinity
requirements. (One of cons in option 3)
Not quite sure how option 1 solves the affinity requirement. Can you elaborate
a little more on this? To solve the dependency management issue, one thing
that occurred to me, though I have not thought about it in much detail, is that
we could have RM manage all system services together and construct a DAG of
the system services that need to be launched on each NM. Alternatively, RM can
just decide which services need to be launched on which nodes, with their
dependencies clearly defined, and each NM can then construct the DAG itself and
launch the services in topological order. This, however, does put some burden
on RM.
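As a rough illustration of the NM-side variant, the launch order could be
computed with a standard topological sort (Kahn's algorithm) over the declared
dependencies. This is only a sketch: the class, method, and service names are
hypothetical and not part of YARN.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not an existing YARN class. */
final class SystemServiceOrdering {

  /**
   * @param deps maps each service name to the services it depends on
   * @return an order in which every service comes after its dependencies
   * @throws IllegalArgumentException if the dependencies contain a cycle
   */
  static List<String> launchOrder(Map<String, List<String>> deps) {
    // Number of not-yet-launched dependencies per service.
    Map<String, Integer> unmet = new TreeMap<>();
    // Reverse edges: dependency -> services waiting on it.
    Map<String, List<String>> dependents = new HashMap<>();
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      unmet.merge(e.getKey(), 0, Integer::sum);
      for (String dependency : e.getValue()) {
        unmet.merge(dependency, 0, Integer::sum);
        unmet.merge(e.getKey(), 1, Integer::sum);
        dependents.computeIfAbsent(dependency, k -> new ArrayList<>())
            .add(e.getKey());
      }
    }

    // Start with the services that have no dependencies.
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : unmet.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }

    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String service = ready.poll();
      order.add(service);
      List<String> waiting =
          dependents.getOrDefault(service, Collections.<String>emptyList());
      for (String dependent : waiting) {
        // A dependent becomes launchable once its last dependency is ordered.
        if (unmet.merge(dependent, -1, Integer::sum) == 0) {
          ready.add(dependent);
        }
      }
    }

    if (order.size() != unmet.size()) {
      throw new IllegalArgumentException("Cyclic system service dependencies");
    }
    return order;
  }
}
```

With this split, RM only has to ship the per-node dependency map, and cycle
detection happens on the NM before anything is launched.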