[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595986#comment-16595986 ]

Eric Yang commented on YARN-8569:
---------------------------------

{quote}
1) Frequently syncing the service spec could cause cluster performance issues: 
a service with one thousand component instances can cause several thousand DFS 
calls for any service state change. Even if the feature is enabled on demand, a 
non-malicious user can still cause a big performance impact on a cluster.
{quote}

There are no DFS calls in patch 2.  The spec information is transferred from 
the AM's memory to the node manager's local directory using lightweight HTTP 
REST calls.  Ten years ago, I would have agreed that one thousand HTTP requests 
copying spec information to all node managers could choke older generations of 
network equipment from around 2005.  However, networks have advanced a lot in 
the past 10 years.  10 Gigabit Ethernet has been shown to reach 
[one million IOPS|https://phys.org/news/2010-01-intel-million-iops-gigabit-ethernet.html].  
Making 1000 REST calls with a 4 KB payload each can be completed in 1-5 
milliseconds and transfers only about 4 MB in total.  Based on my local testing, 
the synchronization happens in fractions of a millisecond.  The described 
performance impact is hard to trigger because Docker containers take time to 
launch and reach a stable state.  Spec information is distributed only after all 
information has been reported to the AM and the AM has reached a stable state.  
The distribution is so small that the machines stay essentially idle throughout 
the operation.
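As a rough check of the figures above, this sketch computes the total payload of 1000 calls and the ideal time to push it over one 10 Gigabit link. It assumes decimal units (4 KB = 4000 bytes), a single link out of the AM, and no protocol overhead or round-trip latency, so real timings will be somewhat higher.

```python
def total_payload_mb(calls: int, payload_bytes: int) -> float:
    """Total bytes sent by all calls, in decimal megabytes."""
    return calls * payload_bytes / 1e6


def transfer_time_ms(calls: int, payload_bytes: int, link_gbps: float) -> float:
    """Ideal time in milliseconds to send calls * payload_bytes over one link.

    Assumption: line rate only; ignores TCP/HTTP overhead and latency.
    """
    total_bits = calls * payload_bytes * 8
    return total_bits / (link_gbps * 1e9) * 1e3


print(f"total payload: {total_payload_mb(1000, 4000):.1f} MB")              # 4.0 MB
print(f"ideal time at 10 Gbit/s: {transfer_time_ms(1000, 4000, 10.0):.1f} ms")  # 3.2 ms
```

Under these assumptions the line-rate cost alone is about 3 ms, inside the 1-5 ms range quoted above.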

{quote}
2) The changes in this patch are very involved, reaching all the way from the 
native service to C code. It seems that local_dir is passed in the user's 
command; is it possible that a malicious user could use c-e to update the 
service spec of other users, potentially causing security issues? I haven't 
checked the details much yet; I just want to point out the possibility.
{quote}

A malicious user cannot use container-executor to update the service spec, 
because container-executor checks that the source directory is owned by the 
node manager user before it proceeds with the update.  This means all data 
comes from the yarn user.  Unless a malicious user has already compromised the 
yarn user account and populated data as that user, there is no simple parameter 
manipulation of container-executor that triggers an exploit.  local_dir is 
passed the same way as during container launch, and container-executor checks 
that local_dir is owned by the yarn user.  This allows us to use local_dir from 
yarn-site.xml instead of hard-coding local directories in 
container-executor.cfg.  Hence, this patch to container-executor adds no new 
opportunity for exploits.
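The ownership check described above can be illustrated with a short Python sketch. The real check lives in container-executor's C code, which is not reproduced here; the function name `is_dir_owned_by` is hypothetical, and `expected_uid` stands in for the yarn user's UID.

```python
import os
import stat
import tempfile


def is_dir_owned_by(path: str, expected_uid: int) -> bool:
    """Return True only if `path` is a real directory owned by `expected_uid`.

    os.lstat() is used so that a symlink planted by another user is rejected
    instead of being followed to a directory it does not own.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False
    return stat.S_ISDIR(st.st_mode) and st.st_uid == expected_uid


if __name__ == "__main__":
    # Stand-in for a node manager local_dir; here it is owned by the current user.
    with tempfile.TemporaryDirectory() as local_dir:
        print(is_dir_owned_by(local_dir, os.getuid()))      # True
        print(is_dir_owned_by(local_dir, os.getuid() + 1))  # False
```

The point of the design is that the caller-supplied path is never trusted on its own; only its ownership decides whether the update proceeds.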

{quote}
Instead of doing this, I think it would be very useful to write 
container-specific information, such as resources, to the spec or even to ENV. 
That requires far fewer changes and would address 70-80% of the issues.
{quote}

The YARN service application master already updates container information in 
the spec file on HDFS; that is outside the scope of this JIRA.  The limitation 
is that this information on HDFS remains inaccessible to containers that have 
no knowledge of the Hadoop API.  This is why the information is exposed as a 
file mounted into the container, so the container does not need to carry any 
custom code to process cluster information.
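On the container side, consuming such a mounted file needs nothing beyond standard file and JSON handling. A minimal sketch, assuming the file is JSON with a `components` list whose entries carry a `name` and a `containers` list with `hostname` fields; this layout, and the helper name `hostnames_by_component`, are assumptions for illustration, not taken from the patch.

```python
import json
from typing import Dict, List


def hostnames_by_component(spec_path: str) -> Dict[str, List[str]]:
    """Map each component name to the hostnames of its containers.

    Assumed layout (hypothetical):
    {"components": [{"name": "ps", "containers": [{"hostname": "ps0.example.com"}]}]}
    Containers without a hostname yet are skipped; file order is preserved.
    """
    with open(spec_path, encoding="utf-8") as f:
        spec = json.load(f)
    result: Dict[str, List[str]] = {}
    for component in spec.get("components", []):
        result[component["name"]] = [
            c["hostname"] for c in component.get("containers", []) if "hostname" in c
        ]
    return result
```

An application would call it with whatever path the file is mounted at inside the container; no Hadoop client libraries are involved.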
 
The ENV approach was identified as a non-starter because there is no way to 
update environment variables once a Docker container has started running.  
After a flex command, or when a container restarts on another node, environment 
variables would hold outdated information, because they are immutable and 
cannot follow cluster changes.  This is why this solution was devised: to lower 
the bar for writing clustering software for Hadoop.
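Because a mounted file can be rewritten after the container starts, an application can simply re-read it, which environment variables do not allow. A minimal polling sketch, assuming JSON content; `SpecWatcher` is a hypothetical helper, and a real reader would also need to cope with reading a partially written file.

```python
import hashlib
import json
from typing import Any, Optional


class SpecWatcher:
    """Re-read a mounted spec file and report when its content changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._last_digest: Optional[str] = None

    def poll(self) -> Optional[Any]:
        """Return the parsed spec if it changed since the last poll, else None.

        Compares a content hash rather than mtime, so two quick rewrites
        within the same timestamp tick are still detected.
        """
        with open(self.path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self._last_digest:
            return None
        self._last_digest = digest
        return json.loads(data)
```

An application could call `poll()` periodically (for example every few seconds) and reconfigure itself whenever it returns a new spec, such as after a flex.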

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569.001.patch, YARN-8569.002.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed tensorflow requires a launch_command like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=1
> {code}
> This is cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is a classic pain point for application 
> developers who try to pass system environment settings as parameters to 
> end-user applications.
> It would be great if the YARN Docker integration could provide a simple option 
> to expose the hostnames of a YARN service via a mounted file.  The file content 
> gets updated when a flex command is performed.  This allows application 
> developers to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in the distributed cache and allowing the file to be mounted 
> via container-executor.
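With hostnames available from a mounted file, the per-host launch commands in the description could be generated rather than hard-coded. A sketch, assuming a component-to-hostnames mapping like the one a container might parse from such a file, with component names `ps` and `worker` and the fixed port 2222 from the example; `trainer_args` is a hypothetical helper.

```python
from typing import Dict, List


def trainer_args(hosts: Dict[str, List[str]], job_name: str, my_host: str,
                 port: int = 2222) -> List[str]:
    """Build the trainer.py flags for one container from a component->hostnames map."""
    ps_hosts = ",".join(f"{h}:{port}" for h in hosts.get("ps", []))
    worker_hosts = ",".join(f"{h}:{port}" for h in hosts.get("worker", []))
    # The task index is this host's position within its own component's list.
    task_index = hosts[job_name].index(my_host)
    return [
        f"--ps_hosts={ps_hosts}",
        f"--worker_hosts={worker_hosts}",
        f"--job_name={job_name}",
        f"--task_index={task_index}",
    ]


hosts = {
    "ps": ["ps0.example.com", "ps1.example.com"],
    "worker": ["worker0.example.com", "worker1.example.com"],
}
print(" ".join(["python", "trainer.py"] + trainer_args(hosts, "worker", "worker1.example.com")))
```

Each container would pass its own hostname (for example from `socket.getfqdn()`), and the same code could regenerate the flags after a flex rewrites the file.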



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
