[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Description:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

h3. {color:#ff}Please refer to the ongoing design doc and add your thoughts: [https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}

*{color:#33}See Also:{color}*
* {color:#33}Zeppelin integration with Submarine design: [https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit#heading=h.4jov859x47qe]{color}

  was:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

h3. {color:#FF}Please refer to the ongoing design doc and add your thoughts: {color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
> -----------------------------
>
>                 Key: YARN-8135
>                 URL: https://issues.apache.org/jira/browse/YARN-8135
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Attachment:     (was: YARN-8135.poc.001.patch)
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Attachment: YARN-8135.poc.001.patch
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Description:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

h3. {color:#FF}Please refer to the ongoing design doc and add your thoughts: {color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}

  was:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

Please refer to the ongoing design doc and add your thoughts: [https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Description:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

Please refer to the ongoing design doc and add your thoughts: [https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]

  was:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved by a patched YARN, which differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read ClusterSpec from env.

*References:*
- TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
- TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
- Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning]
- XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
- Kubeflow (Google): [https://github.com/kubeflow/kubeflow]
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Attachment:     (was: image-2018-04-09-14-44-41-101.png)
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Attachment:     (was: image-2018-04-09-14-35-16-778.png)
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Description:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can let humans explore deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved by a patched YARN, which differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read ClusterSpec from env.

*References:*
- TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
- TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
- Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning]
- XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
- Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can take humans to deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved by a patched YARN, which differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read ClusterSpec from env.

*References:*
- TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
- TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
- Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning]
- XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
- Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

>         Attachments: image-2018-04-09-14-35-16-778.png, image-2018-04-09-14-44-41-101.png
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Description:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can take humans to deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved by a patched YARN, which differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read ClusterSpec from env.

*References:*
- TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
- TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
- Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning]
- XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
- Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
- Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on YARN.
- Allow jobs to easily access data/models in HDFS and other storage systems.
- Launch services to serve TensorFlow/MXNet models.
- Support running distributed TensorFlow jobs with simple configs.
- Support running user-specified Docker images.
- Support specifying GPUs and other resources.
- Support launching TensorBoard if the user requests it.
- Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

*Why this name?*
- Because Submarine is the only vehicle that can take humans to deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-35-16-778.png!

*Notes:*
* GPU isolation in the XLearning project is achieved by a patched YARN, which differs from the community’s GPU isolation solution.
** XLearning needs a few modifications to read ClusterSpec from env.

*References:*
- TensorFlowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow

>         Attachments: image-2018-04-09-14-35-16-778.png, image-2018-04-09-14-44-41-101.png
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8135:
-----------------------------
    Attachment: image-2018-04-09-14-44-41-101.png

>         Attachments: image-2018-04-09-14-35-16-778.png, image-2018-04-09-14-44-41-101.png