Wangda Tan created YARN-8135:
--------------------------------

             Summary: Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
                 Key: YARN-8135
                 URL: https://issues.apache.org/jira/browse/YARN-8135
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Wangda Tan
            Assignee: Wangda Tan
         Attachments: image-2018-04-09-14-35-16-778.png

Description:

*Goals:*
 - Allow infra engineers and data scientists to run *unmodified* TensorFlow jobs on YARN.
 - Allow jobs to easily access data and models in HDFS and other storage systems.
 - Launch services to serve TensorFlow/MXNet models.
 - Support running distributed TensorFlow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPUs and other resources.
 - Support launching TensorBoard when the user requests it.
 - Support customized DNS names for roles (e.g. tensorboard.$user.$domain:6006).
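To illustrate the role-naming goal, the sketch below expands the tensorboard.$user.$domain:6006 pattern into host:port addresses for every task of a distributed job. It is a hypothetical scheme, not part of this proposal: the function name role_addresses, the "<role>-<index>" form for roles with several instances, and the single shared port are all assumptions made for illustration.

```python
from typing import Dict, List


def role_addresses(roles: Dict[str, int], user: str, domain: str,
                   port: int) -> Dict[str, List[str]]:
    """Build host:port addresses for each role instance (hypothetical scheme).

    A role with one instance is named "<role>.<user>.<domain>", mirroring
    tensorboard.$user.$domain:6006. A role with several instances gets an
    index suffix: "<role>-<index>.<user>.<domain>".
    """
    addresses: Dict[str, List[str]] = {}
    for role, count in roles.items():
        if count < 1:
            raise ValueError(f"role {role!r} needs at least one instance")
        if count == 1:
            hosts = [f"{role}.{user}.{domain}"]
        else:
            hosts = [f"{role}-{i}.{user}.{domain}" for i in range(count)]
        # Assumption: every role listens on the same port.
        addresses[role] = [f"{host}:{port}" for host in hosts]
    return addresses


if __name__ == "__main__":
    # Two workers and one parameter server for the (example) user "alice".
    print(role_addresses({"worker": 2, "ps": 1}, "alice", "example.com", 2222))
```

Keeping the name a pure function of role, index, user and domain means a launcher could publish the addresses to every task before any container starts, which is one way a "simple config" could cover distributed jobs.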

*Why this name?*
 - Because a submarine is the only vehicle that can take humans to deep places. B-)

*Comparison with other projects:*

!image-2018-04-09-14-35-16-778.png!

*Notes:*

* GPU isolation in the XLearning project is achieved with a patched YARN, which differs from the community's GPU isolation solution.

** XLearning needs a few modifications to read the ClusterSpec from environment variables.
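For context on the ClusterSpec note: distributed TensorFlow jobs commonly receive their cluster layout through the TF_CONFIG environment variable, a JSON object whose "cluster" key maps each role to a list of host:port addresses and whose "task" key gives this process's "type" and "index". The minimal sketch below parses it with only the standard library. It assumes the launcher exports TF_CONFIG in that form; the helper name read_cluster_spec is hypothetical, and the code does not describe how XLearning itself passes the spec.

```python
import json
import os
from typing import Dict, List, Mapping, Optional, Tuple


def read_cluster_spec(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, List[str]], str, int]:
    """Return (cluster, task_type, task_index) parsed from TF_CONFIG.

    `cluster` maps each role (e.g. "worker", "ps") to its host:port list;
    a training script could hand it to tf.train.ClusterSpec(cluster).
    """
    # Read from the real process environment unless a mapping is supplied.
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG")
    if raw is None:
        raise KeyError("TF_CONFIG is not set; the launcher must export it")
    config = json.loads(raw)
    cluster = {role: list(addrs) for role, addrs in config["cluster"].items()}
    task = config["task"]
    return cluster, task["type"], int(task["index"])


if __name__ == "__main__":
    example = {
        "TF_CONFIG": json.dumps({
            "cluster": {"worker": ["w0:2222", "w1:2222"], "ps": ["ps0:2222"]},
            "task": {"type": "worker", "index": 1},
        })
    }
    print(read_cluster_spec(example))
```

Passing the environment as an optional mapping keeps the helper testable without mutating os.environ; in a real container it would simply be called with no arguments.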

*References:*

- TensorFlowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
