[ https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432920#comment-15432920 ]

Jason Lowe commented on YARN-1503:
----------------------------------
Thanks for driving this, Jian! Seems reasonable overall with just one concern
about placing the localization status in the container status.

The container status is currently sent from the NM to the RM during the NM
heartbeat. I know some cluster setups use container launch contexts that are
quite large due to tons of resources to localize. That means this design could
significantly increase the NM heartbeat request size while the NM is
localizing containers. If a job ran very wide, then every node in the cluster
could be doing a very large localization. In addition, each node could have
dozens of tasks that all need the same large set of resources. The
localization status would then be replicated for all those containers, and the
NM heartbeat would grow even more.

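To put rough numbers on the concern above, here is a back-of-the-envelope
sketch. Every figure in it (containers per node, resources per container,
bytes per resource status entry) is a made-up assumption for illustration,
not a measurement from any cluster, and the class and method names are
hypothetical.

```java
// Back-of-the-envelope estimate only; all inputs are hypothetical.
class HeartbeatSizeEstimate {

    /**
     * Extra bytes in a single NM heartbeat if each localizing container
     * carries one status entry per resource it is localizing.
     */
    static long extraBytesPerHeartbeat(int localizingContainers,
                                       int resourcesPerContainer,
                                       int bytesPerResourceStatus) {
        // The status is replicated per container, so the terms multiply.
        return (long) localizingContainers
                * resourcesPerContainer
                * bytesPerResourceStatus;
    }

    public static void main(String[] args) {
        // Assumed: 30 containers on the node, 500 resources each,
        // roughly 200 bytes per serialized resource status entry.
        long perNode = extraBytesPerHeartbeat(30, 500, 200);
        System.out.println("extra bytes per heartbeat: " + perNode);
    }
}
```

Under those assumed inputs a single node adds about 3 MB to every heartbeat
while it is localizing, and a wide job would repeat that on each node it
touches.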
This makes me wonder if we should do one of the following:
- Omit the localization info for NM heartbeats? The RM currently doesn't care,
so sending it is a waste.
- Provide a separate API to get the gory details of localization? That way
it's "cheap" for someone to ask about container state when they don't care
about localization details. Others that care whether the container is
currently localizing and how far along can call the localization status API.
Instead of a separate API we could extend GetContainerStatusesRequest with a
flag on whether localization details should be returned in the response.
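
A minimal sketch of the flag idea, to make the trade-off concrete. The
classes below (StatusRequest, StatusReply, StatusService) are hypothetical
stand-ins written for this illustration; they are not the real
GetContainerStatusesRequest or ContainerStatus records, and the field name
includeLocalization is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request: container ids plus the proposed opt-in flag.
class StatusRequest {
    final List<String> containerIds;
    final boolean includeLocalization;

    StatusRequest(List<String> containerIds, boolean includeLocalization) {
        this.containerIds = containerIds;
        this.includeLocalization = includeLocalization;
    }
}

// Hypothetical reply: localizationDetails stays empty unless requested.
class StatusReply {
    final String containerId;
    final String state;
    final List<String> localizationDetails;

    StatusReply(String containerId, String state, List<String> details) {
        this.containerId = containerId;
        this.state = state;
        this.localizationDetails = details;
    }
}

// Hypothetical NM-side service that tracks state and per-resource details.
class StatusService {
    private final Map<String, String> states = new HashMap<>();
    private final Map<String, List<String>> localization = new HashMap<>();

    void track(String containerId, String state, List<String> details) {
        states.put(containerId, state);
        localization.put(containerId, details);
    }

    List<StatusReply> getStatuses(StatusRequest request) {
        List<StatusReply> replies = new ArrayList<>();
        for (String id : request.containerIds) {
            if (!states.containsKey(id)) {
                continue; // unknown containers are skipped in this sketch
            }
            // Only attach the large payload when the caller asked for it.
            List<String> details = request.includeLocalization
                    ? localization.get(id)
                    : Collections.<String>emptyList();
            replies.add(new StatusReply(id, states.get(id), details));
        }
        return replies;
    }
}
```

With a shape like this, callers that only want container state pay nothing
for localization, and the NM heartbeat path could simply never set the flag.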
> Continuous resource-localization for YARN containers
> ----------------------------------------------------
>
> Key: YARN-1503
> URL: https://issues.apache.org/jira/browse/YARN-1503
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jian He
> Attachments: Continuous-resource-localization.pdf
>
>
> We have a use case, where additional resources (jars, libraries etc) need to
> be made available to an already running container. Ideally, we'd like this to
> be done via YARN (instead of having potentially multiple containers per node
> download resources on their own).
> The goal is to enable NodeManagers to localize resources while a container
> is running. Today, resource localization is always the first step before
> starting a container. It would be useful if YARN could localize resources
> continuously, even while the container is running.