[
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524603#comment-15524603
]
Chris Douglas commented on YARN-5621:
-------------------------------------
That summary of work seems about right, thanks for putting it together.
You raise excellent points about error handling. Your sketch includes a channel
communicating which resources were (un)successfully linked. The script-driven
approach handles this in v05 by writing a separate bash script and invoking the
CE for each symlink (which, to be fair, isn't exactly "lightweight" when
compared to extending {{ContainerLocalizer}}). In v05, a failure affects only
one resource, but to take your earlier example linking a batch of resources in
the script: how would one handle partial failures? What's the state of the
container and resources when the script invocation fails?
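
To make the partial-failure question concrete, here is a minimal Java sketch of a batch linker that records an outcome per resource instead of a single exit status. All names in it ({{BatchLinkSketch}}, {{LinkResult}}, {{Linker}}) are hypothetical and not part of YARN or any attached patch; it only illustrates the kind of result channel a caller would need in order to reason about a half-linked batch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in YARN.
class BatchLinkSketch {

    /** Outcome of one link attempt; error is null on success. */
    public static final class LinkResult {
        public final Path link;
        public final String error;

        LinkResult(Path link, String error) {
            this.link = link;
            this.error = error;
        }

        public boolean succeeded() {
            return error == null;
        }
    }

    /** Hypothetical hook so the link operation can be replaced, e.g. in tests. */
    public interface Linker {
        void link(Path link, Path target) throws IOException;
    }

    /**
     * Attempts every link in the batch and keeps going after a failure,
     * so the caller learns exactly which resources were linked.
     */
    public static Map<Path, LinkResult> linkAll(Map<Path, Path> linkToTarget, Linker linker) {
        Map<Path, LinkResult> results = new LinkedHashMap<>();
        for (Map.Entry<Path, Path> e : linkToTarget.entrySet()) {
            Path link = e.getKey();
            try {
                linker.link(link, e.getValue());
                results.put(link, new LinkResult(link, null));
            } catch (IOException | RuntimeException ex) {
                // Record the failure for this resource only; do not abort the batch.
                results.put(link, new LinkResult(link, ex.toString()));
            }
        }
        return results;
    }

    /** Convenience overload that creates real symlinks on the local filesystem. */
    public static Map<Path, LinkResult> linkAll(Map<Path, Path> linkToTarget) {
        return linkAll(linkToTarget, (link, target) -> Files.createSymbolicLink(link, target));
    }
}
```

A script that links the whole batch and returns one exit code gives the caller none of this detail, which is why the state of the container after a failed invocation is unclear.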
On the CL proposal: either the CI initiates the symlink request to the
{{ResourceLocalizationService}} after download, or the two operations are
contained within that service. The complexity is comparable. The 2-phase
protocol you sketch (CI initiates download, then link) adds a window in which
the CL could be shut down before it receives the {{LINK}} commands (causing two
CL launches), but even a short timeout would likely cover that.
A single message annotating the resource (download+symlink) could add states to
{{LocalizedResource}} if it were to notify starting containers directly
(current code) or hand off to the RLS for symlink. In this case, the protocol to
the {{ContainerImpl}} is simpler (resending/retry is idempotent b/c it doesn't
care if the download or symlink failed). Both {{FetchSuccessTransition}} and
{{LocalizedResourceTransition}} would need to send
{{LocalizerResourceRequestEvent}} for running containers to symlink. A failed
symlink would look like a failed download to the CI. Start container is
unaffected.
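
As a rough illustration of the single-message variant, the Java sketch below models a resource whose request covers both download and symlink. It is an assumption-laden toy, not the real {{LocalizedResource}} state machine: the {{LINKING}} and {{LINKED}} states, the event names, and the string notifications are all invented for this example, and it collapses per-container links into one resource for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: states and events are invented, not YARN's.
class LinkedResourceSketch {

    enum State { INIT, DOWNLOADING, LINKING, LINKED, FAILED }

    enum Event { REQUEST, FETCH_SUCCESS, FETCH_FAILURE, LINK_SUCCESS, LINK_FAILURE }

    private State state = State.INIT;
    // Stand-in for events sent to the container: one entry per notification.
    private final List<String> notifications = new ArrayList<>();

    State getState() {
        return state;
    }

    List<String> getNotifications() {
        return notifications;
    }

    void handle(Event event) {
        switch (state) {
            case INIT:
            case FAILED:
                // A (re)sent request starts the whole download+symlink sequence.
                if (event == Event.REQUEST) {
                    state = State.DOWNLOADING;
                }
                break;
            case DOWNLOADING:
                if (event == Event.FETCH_SUCCESS) {
                    state = State.LINKING;
                } else if (event == Event.FETCH_FAILURE) {
                    fail();
                }
                // A duplicate REQUEST while in flight is ignored.
                break;
            case LINKING:
                if (event == Event.LINK_SUCCESS) {
                    state = State.LINKED;
                    notifications.add("RESOURCE_READY");
                } else if (event == Event.LINK_FAILURE) {
                    fail();
                }
                break;
            case LINKED:
                // A resent request for a finished resource is answered again.
                if (event == Event.REQUEST) {
                    notifications.add("RESOURCE_READY");
                }
                break;
            default:
                break;
        }
    }

    // Download and symlink failures are reported identically to the container.
    private void fail() {
        state = State.FAILED;
        notifications.add("RESOURCE_FAILED");
    }
}
```

The point of the sketch is that the container side sees only "ready" or "failed" and can resend the same request in either case, without knowing which step went wrong.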
For the CL itself... sure, {{ResourceLocalizationSpec}} needs another field
for symlinks. This side is pretty straightforward, right?
> Support LinuxContainerExecutor to create symlinks for continuously localized
> resources
> --------------------------------------------------------------------------------------
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch,
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, a new symlink needs to be created for the
> localized resource. This is the change for the LinuxContainerExecutor to
> create the symlinks.