[
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596014#comment-16596014
]
Wangda Tan commented on YARN-8569:
----------------------------------
[~eyang],
{quote}Unless malicious user already hacked into yarn user account and populate
data as yarn user, there is no easy parameter hacking to container-executor to
trigger exploits.
{quote}
There have been many debates before about whether the yarn user should be
treated as root or not. We have seen container-executor issues that allowed the
yarn user to manipulate other users' directories, or to escalate directly to
root. All of those issues became CVEs.
{quote}This is the reason that this solution is invented to lower the bar of
writing clustering software for Hadoop.
{quote}
It would help if you could share some real-world examples.
From YARN's design standpoint, ideally all NM/RM logic should be as generic as
possible, and all service-related concerns should be handled by the service
framework, i.e. the API server or the ServiceMaster. I really don't like the
idea of adding a service-specific API to the NM API.
If you do think updating the service spec json file is important, another
approach could be:
1) The ServiceMaster mounts a local directory (under the container's local dir)
when launching the docker container (for example: ./service-info -> /service/sys/fs/)
2) The ServiceMaster requests re-localization of the new service spec json file
into the ./service-info folder.
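Under that approach, the container-side consumer could be as simple as polling
the re-localized spec. A minimal sketch, assuming the /service/sys/fs/ mount
point from the example above; the service.json file name and the polling logic
are illustrative assumptions, not part of any patch:
{code}
#!/usr/bin/env python
# Illustrative sketch: poll the service spec that the ServiceMaster
# re-localizes into the directory mounted at /service/sys/fs/.
import json
import os
import time

SPEC_PATH = "/service/sys/fs/service.json"  # assumed file name inside the mount

def watch_spec(interval=5):
    last_mtime = None
    while True:
        try:
            mtime = os.path.getmtime(SPEC_PATH)
        except OSError:
            mtime = None  # spec not localized yet
        if mtime is not None and mtime != last_mtime:
            last_mtime = mtime
            with open(SPEC_PATH) as f:
                spec = json.load(f)
            # React to the updated spec, e.g. refresh the list of peer hosts.
            print("service spec updated:", spec.get("name"))
        time.sleep(interval)

if __name__ == "__main__":
    watch_spec()
{code}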
> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-8569.001.patch, YARN-8569.002.patch
>
>
> Some programs require container hostnames to be known for the application to run.
> For example, distributed tensorflow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN
> services launch_command. In addition, the dynamic parameters do not work
> with the YARN flex command. This is the classic pain point for application
> developers attempting to automate passing system environment settings as
> parameters to the end user application.
> It would be great if the YARN Docker integration could provide a simple option
> to expose the hostnames of the yarn service via a mounted file. The file
> content gets updated when a flex command is performed. This allows application
> developers to consume system environment settings via a standard interface.
> It is like /proc/devices on Linux, but for Hadoop. This may involve updating a
> file in the distributed cache, and allowing the file to be mounted via
> container-executor.
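To illustrate the intent of the quoted description, here is a minimal sketch of
how an application could consume such a mounted file to build the distributed
tensorflow arguments from the earlier example. The /service/sys/fs/hosts.json
path and its JSON layout are purely hypothetical; the actual file name, format,
and mount point would be defined by the patch:
{code}
#!/usr/bin/env python
# Illustrative sketch: derive --ps_hosts/--worker_hosts from a file mounted
# into the container, instead of hard-coding them in the launch_command.
import json
import subprocess

# Hypothetical mounted file, e.g.
# {"ps": ["ps0.example.com:2222", ...], "worker": ["worker0.example.com:2222", ...]}
with open("/service/sys/fs/hosts.json") as f:
    hosts = json.load(f)

subprocess.check_call([
    "python", "trainer.py",
    "--ps_hosts=" + ",".join(hosts["ps"]),
    "--worker_hosts=" + ",".join(hosts["worker"]),
    "--job_name=ps",
    "--task_index=0",
])
{code}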