[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586769#comment-16586769
 ] 

Eric Yang commented on YARN-8569:
---------------------------------

This first patch demonstrates what the interface looks like.  Here is the JSON 
I used to test launching the app:
{code}
{
  "name": "sleeper-service",
  "version": "1.0",
  "components" :
  [
    {
      "name": "ping",
      "number_of_containers": 2,
      "artifact": {
        "id": "hadoop/centos:latest",
        "type": "DOCKER"
      },
      "launch_command": "sleep,10000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "restart_policy": "NEVER",
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true",
          "YARN_CONTAINER_RUNTIME_YARN_SYSFS":"true"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}

The patch places a copy of service.json in HDFS and distributes it as a 
localized resource to the nodes that are going to start the containers.  The 
other copy, [appname].json, is not used because its content changes so 
frequently that localization fails with an IOException.
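
For illustration only, here is a minimal sketch of how a process inside the container might read the localized file.  The mount path in SPEC_PATH and both function names are assumptions made for this example and are not taken from the patch; the only fields read are the ones shown in the testing JSON above.

```python
import json

# Hypothetical location of the localized service.json inside the container.
# This path is an assumption for the example, not something the patch defines here.
SPEC_PATH = "/hadoop/yarn/sysfs/service.json"


def load_spec(path=SPEC_PATH):
    """Read and parse the localized service spec."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_components(spec):
    """Map each component name to its requested number of containers."""
    return {
        component["name"]: component.get("number_of_containers", 0)
        for component in spec.get("components", [])
    }
```

With the testing JSON above, summarize_components would report two requested containers for the "ping" component.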

Let me know if this is the direction we want to pursue.  If all looks good, I 
will add live updates to this file whenever the application completes a 
transition between states such as STARTED, FLEXING, or STABLE.

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569.001.patch
>
>
> Some programs require container hostnames to be known before the application 
> can run.  For example, distributed TensorFlow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=1
> {code}
> This is cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is a classic pain point for application 
> developers who try to pass system environment settings as parameters to the 
> end user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the YARN service via a mounted file.  The 
> file content would be updated whenever a flex command is performed.  This 
> would allow application developers to consume system environment settings 
> via a standard interface.  It is like /proc/devices for Linux, but for 
> Hadoop.  This may involve updating a file in the distributed cache and 
> allowing the file to be mounted via container-executor.
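
As a rough sketch of what consuming such a mounted file could look like, the snippet below derives the trainer.py arguments from the description out of a parsed service document.  It assumes each component carries a "containers" list whose entries have a "hostname" field, that the components are named "ps" and "worker", and that every task listens on port 2222 as in the example; the function names are hypothetical, and none of this is confirmed by the issue or the patch.

```python
def hosts_by_component(spec, port=2222):
    """Collect host:port endpoints per component.

    Assumes (not confirmed by the issue) that each component has a
    "containers" list whose entries expose a "hostname" field.
    """
    endpoints = {}
    for component in spec.get("components", []):
        endpoints[component["name"]] = [
            "{}:{}".format(container["hostname"], port)
            for container in component.get("containers", [])
            if container.get("hostname")
        ]
    return endpoints


def trainer_args(spec, job_name, task_index, port=2222):
    """Build the argument list for trainer.py from the service document."""
    hosts = hosts_by_component(spec, port)
    return [
        "--ps_hosts=" + ",".join(hosts.get("ps", [])),
        "--worker_hosts=" + ",".join(hosts.get("worker", [])),
        "--job_name={}".format(job_name),
        "--task_index={}".format(task_index),
    ]
```

Because the arguments are recomputed from the file each time, a container started after a flex operation would pick up the new host list without changes to launch_command.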



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
