[
https://issues.apache.org/jira/browse/SUBMARINE-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821136#comment-16821136
]
Szilard Nemeth commented on SUBMARINE-14:
-----------------------------------------
[~sunilg], [~leftnoteasy], [~rreti]: Just a quick update on where I stand with
the implementation:
I have a 3-machine cluster running Ubuntu 16.04.
These are the steps I started to take to test PyTorch on this cluster:
# Set up Docker: [https://docs.docker.com/install/linux/docker-ce/ubuntu/]
# Pull the PyTorch Dockerfile: [https://github.com/pytorch/pytorch/blob/master/docker/pytorch/Dockerfile]
# Build the Docker image: [https://www.howtoforge.com/tutorial/how-to-create-docker-images-with-dockerfile/]
{code:java}
git clone https://github.com/pytorch/pytorch.git
cd pytorch
# Build from the repository root so the -f path below resolves
docker build -t pytorch -f docker/pytorch/Dockerfile .
{code}
# ALTERNATIVELY: Pull existing image with command:
{code:java}
docker pull pytorch/pytorch{code}
# Start up a simple PyTorch example: [https://github.com/pytorch/examples/tree/master/time_sequence_prediction]
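As a rough sketch of the last step, the snippet below assembles a docker run command line that would start the time_sequence_prediction example inside the pulled image. The host path of the pytorch/examples clone and the mount point inside the container are hypothetical, and the image may lack some of the example's Python dependencies, in which case they would need to be installed first. The snippet only prints the command; it does not run Docker.

```python
import shlex

IMAGE = "pytorch/pytorch"             # image from the pull step above
EXAMPLES_DIR = "/home/user/examples"  # hypothetical host path of a pytorch/examples clone


def docker_run_argv(image, host_dir, commands):
    """Build the argv for a throwaway container that runs `commands` in order."""
    mount = f"{host_dir}:/workspace/examples"  # assumed mount point in the container
    workdir = "/workspace/examples/time_sequence_prediction"
    script = " && ".join(commands)
    return ["docker", "run", "--rm", "-v", mount, "-w", workdir,
            image, "bash", "-c", script]


argv = docker_run_argv(
    IMAGE,
    EXAMPLES_DIR,
    ["python generate_sine_wave.py", "python train.py"],
)

# Print a copy-pasteable command line with shell quoting applied.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argv as a list keeps the quoting of the multi-command script in one place, and the same list could later be passed to subprocess.run if the test should be automated.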
*After all these steps are complete and a PyTorch job is running fine with the
Docker image, I have the following steps in mind:*
1. Find out which launch command and additional environment variables are
required for PyTorch integration
2. Define a service spec file (JSON format, for YARN Native Service to consume)
3. Start a simple sleep job with YARN Native Service as a test run of NS
4. Generate a service spec JSON with the existing test code (TensorFlow) to see
what I should generate for PyTorch (as the two frameworks are similar in
nature)
5. Check that TF works fine with the YAML parser, to test that everything is
fine with my patch
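For steps 2 and 3, a minimal sketch of what a sleep-job service spec could look like, generated from Python. The field names follow the YARN Service spec format as I understand it from the upstream documentation; the service name, container count, and resource values are illustrative assumptions, not values from my cluster.

```python
import json


def sleeper_service_spec(name="sleeper-service", containers=1, seconds=900000):
    """Return a dict describing a one-component service that only sleeps."""
    return {
        "name": name,
        "version": "1.0.0",
        "components": [
            {
                "name": "sleeper",
                "number_of_containers": containers,
                "launch_command": f"sleep {seconds}",
                # Illustrative resource request: 1 vcore, 256 MB
                "resource": {"cpus": 1, "memory": "256"},
            }
        ],
    }


spec = sleeper_service_spec()

# Emit the JSON that would be saved as the service spec file.
print(json.dumps(spec, indent=2))
```

Saved to a file such as sleeper.json (a hypothetical name), the spec could then be launched with "yarn app -launch sleeper-service sleeper.json", assuming YARN Native Service is enabled on the cluster.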
PyTorch setup seems quite trivial for a single-node approach, so this should be
ready soon if I don't hit any roadblocks with YARN Native Service.
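To illustrate the single-node case, here is a hedged sketch of a service spec with one container running the Docker image. The launch command, component name, resource sizes, and the empty env map are placeholders only: the real launch command and environment variables are exactly what still has to be worked out, and this is not what the patch generates.

```python
import json


def pytorch_service_spec(image="pytorch/pytorch",
                         launch_command="python train.py"):
    """Return a single-container service spec that runs a Docker image.

    `launch_command` is a placeholder; the actual command is still unknown.
    """
    return {
        "name": "pytorch-service",   # hypothetical service name
        "version": "1.0.0",
        "components": [
            {
                "name": "worker",    # hypothetical component name
                "number_of_containers": 1,
                "artifact": {"id": image, "type": "DOCKER"},
                "launch_command": launch_command,
                "resource": {"cpus": 1, "memory": "2048"},  # illustrative sizes
                # Environment variables for the framework would go here.
                "configuration": {"env": {}},
            }
        ],
    }


pytorch_spec = pytorch_service_spec()
print(json.dumps(pytorch_spec, indent=2))
```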
[~sunilg], [~leftnoteasy]: Is there a tutorial out there that describes how to
use Native Services, or should I fall back to the upstream documentation?
Thanks!
> [Umbrella] Support using Submarine to submit Pytorch job
> --------------------------------------------------------
>
> Key: SUBMARINE-14
> URL: https://issues.apache.org/jira/browse/SUBMARINE-14
> Project: Hadoop Submarine
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Szilard Nemeth
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)