[
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551490#comment-16551490
]
Wangda Tan commented on YARN-8135:
----------------------------------
Discussed with many folks; thanks for the input from:
[~sunilg], [~jhung], [[email protected]], [~erwaman], [~yanboliang],
[~zhz], [~vinodkv], Xun Liu, [[email protected]] and many others.
I just posted an initial patch to YARN-8561 to get early feedback. I tested the
patch on a 3.1.0 cluster, where it runs fine. Please let me know your thoughts.
I'm going on vacation next week, so please expect some delays in my
responses.
> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning
> training / serving jobs on Hadoop
> -------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
> Attachments: YARN-8135.poc.001.patch
>
>
> Description:
> *Goals:*
> - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs
> on YARN.
> - Allow jobs to easily access data/models in HDFS and other storage systems.
> - Launch services to serve TensorFlow/MXNet models.
> - Support running distributed TensorFlow jobs with simple configs.
> - Support running user-specified Docker images.
> - Support specifying GPU and other resources.
> - Support launching TensorBoard if the user requests it.
> - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
> - Because a submarine is the only vehicle that lets humans explore deep
> places. B-)
> h3. {color:#FF0000}Please refer to the ongoing design doc and add your
> thoughts:
> {color:#333333}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}