[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-11-08 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

h3. {color:#ff}Please refer to on-going design doc, and add your thoughts: 
[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}

 

*{color:#33}See Also:{color}*
 * {color:#33}Zeppelin integration with Submarine design: 
[https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit#heading=h.4jov859x47qe]{color}

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

h3. {color:#FF}Please refer to on-going design doc, and add your thoughts: 
{color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> h3. {color:#ff}Please refer to on-going design doc, and add your 
> thoughts: 
> [https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}
>  
> *{color:#33}See Also:{color}*
>  * {color:#33}Zeppelin integration with Submarine design: 
> [https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit#heading=h.4jov859x47qe]{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-07-20 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: (was: YARN-8135.poc.001.patch)

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> h3. {color:#FF}Please refer to on-going design doc, and add your 
> thoughts: 
> {color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: YARN-8135.poc.001.patch

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8135.poc.001.patch
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> h3. {color:#FF}Please refer to on-going design doc, and add your 
> thoughts: 
> {color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

h3. {color:#FF}Please refer to on-going design doc, and add your thoughts: 
{color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

Please refer to on-going design doc, and add your thoughts: 
[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> h3. {color:#FF}Please refer to on-going design doc, and add your 
> thoughts: 
> {color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

Please refer to on-going design doc, and add your thoughts: 
[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Please refer to on-going design doc, and add your thoughts: 
> [https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: (was: image-2018-04-09-14-44-41-101.png)

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: (was: image-2018-04-09-14-35-16-778.png)

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can take human to deep places. B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can take human to deep places. B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can take human to deep places. B-)

Compare to other projects:

!image-2018-04-09-14-35-16-778.png!

*Notes:*

* GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

** XLearning needs few modification to read ClusterSpec from env.

*References:*

- TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): 
https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can take human to deep places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: image-2018-04-09-14-44-41-101.png

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can take human to deep places. B-)
> Compare to other projects:
> !image-2018-04-09-14-35-16-778.png!
> *Notes:*
> * GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> ** XLearning needs few modification to read ClusterSpec from env.
> *References:*
> - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
> - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
> - Spark Deep Learning (Databricks): 
> https://github.com/databricks/spark-deep-learning
> - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
> - Kubeflow (Google): https://github.com/kubeflow/kubeflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org