[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-----------------------------
    Description: 
Description:

*Goals:*
 - Allow infra engineers and data scientists to run *unmodified* TensorFlow jobs on 
YARN.
 - Give jobs easy access to data and models in HDFS and other storage systems.
 - Launch services that serve TensorFlow/MXNet models.
 - Run distributed TensorFlow jobs with simple configs.
 - Run user-specified Docker images.
 - Allow users to specify GPUs and other resources.
 - Launch TensorBoard when the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006).
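
As a rough illustration of the per-role DNS naming goal, the sketch below expands a pattern shaped like tensorboard.$user.$domain:6006. The function name, the user "alice", and the domain "example.com" are hypothetical; the issue only gives the pattern itself and does not say how names are generated.

```python
from string import Template

# Hypothetical pattern modelled on the example in the goals list:
# tensorboard.$user.$domain:6006
ROLE_DNS_PATTERN = Template("$role.$user.$domain:$port")


def role_endpoint(role, user, domain, port):
    """Build a host:port string for one role of a job (illustrative only)."""
    return ROLE_DNS_PATTERN.substitute(
        role=role, user=user, domain=domain, port=port
    )


if __name__ == "__main__":
    # "alice" and "example.com" are placeholder values, not from the issue.
    print(role_endpoint("tensorboard", "alice", "example.com", 6006))
```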

*Why this name?*
 - Because a submarine is the only vehicle that lets humans explore deep places. 
B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved with a patched YARN, which 
differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read the ClusterSpec from the environment.
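
For context on what reading a ClusterSpec from the environment can look like, here is a minimal sketch. It assumes the cluster layout is exported in the TF_CONFIG environment variable as JSON with "cluster" and "task" keys, which is the convention TensorFlow's distributed tooling uses; the issue itself only says "env", so the variable name, the helper function, and the host names are assumptions.

```python
import json
import os


def read_cluster_spec(environ=None):
    """Return (cluster, task_type, task_index) parsed from TF_CONFIG.

    Assumption: the launcher exports TF_CONFIG as a JSON object such as
    {"cluster": {"ps": [...], "worker": [...]},
     "task": {"type": "worker", "index": 0}}.
    Missing keys yield an empty cluster and None for the task fields.
    """
    environ = os.environ if environ is None else environ
    config = json.loads(environ.get("TF_CONFIG", "{}"))
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return cluster, task.get("type"), task.get("index")


if __name__ == "__main__":
    # Placeholder hosts; a real launcher would fill these in per container.
    example_env = {
        "TF_CONFIG": json.dumps({
            "cluster": {
                "ps": ["ps0.example.com:2222"],
                "worker": ["w0.example.com:2222", "w1.example.com:2222"],
            },
            "task": {"type": "worker", "index": 1},
        })
    }
    cluster, task_type, task_index = read_cluster_spec(example_env)
    # Address this process would listen on, according to the spec.
    print(cluster[task_type][task_index])
```

The returned dictionary has the shape that tf.train.ClusterSpec accepts, so an unmodified training script can build its cluster from it without hard-coded host lists.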

*References:*
 - TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
 - Allow infra engineers and data scientists to run *unmodified* TensorFlow jobs on 
YARN.
 - Give jobs easy access to data and models in HDFS and other storage systems.
 - Launch services that serve TensorFlow/MXNet models.
 - Run distributed TensorFlow jobs with simple configs.
 - Run user-specified Docker images.
 - Allow users to specify GPUs and other resources.
 - Launch TensorBoard when the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
 - Because a submarine is the only vehicle that can take humans to deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved with a patched YARN, which 
differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read the ClusterSpec from the environment.

*References:*
 - TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8135
>                 URL: https://issues.apache.org/jira/browse/YARN-8135
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major
>         Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>


