[ 
https://issues.apache.org/jira/browse/SUBMARINE-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821136#comment-16821136
 ] 

Szilard Nemeth commented on SUBMARINE-14:
-----------------------------------------

[~sunilg], [~leftnoteasy], [~rreti]: Just a quick update on where I stand with 
the implementation: 

I have a 3-machine cluster running Ubuntu 16.04.

These are the steps I started to take to test PyTorch on this cluster: 
 # Set up Docker: [https://docs.docker.com/install/linux/docker-ce/ubuntu/]
 # Pull the PyTorch Dockerfile: [https://github.com/pytorch/pytorch/blob/master/docker/pytorch/Dockerfile]
 # Build the Docker image: [https://www.howtoforge.com/tutorial/how-to-create-docker-images-with-dockerfile/]
{code:java}
git clone https://github.com/pytorch/pytorch.git
cd pytorch
docker build -t pytorch -f docker/pytorch/Dockerfile .
{code}

 # ALTERNATIVELY: Pull the existing image with this command: 

{code:java}
docker pull pytorch/pytorch
{code}
 

 # Start up a simple PyTorch example: [https://github.com/pytorch/examples/tree/master/time_sequence_prediction]
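
To make the last step concrete, here is a rough sketch of how the example could be started inside the pulled image. It only builds the {{docker run}} command line and does not execute it. The mount point {{/workspace/examples}}, the host path and the helper name {{docker_run_command}} are my own assumptions for illustration, and the image may still need extra Python packages (for example for plotting) before the example runs cleanly.

```python
import shlex


def docker_run_command(examples_dir, image="pytorch/pytorch"):
    """Build the argv for running time_sequence_prediction in a container.

    examples_dir is a local checkout of https://github.com/pytorch/examples.
    The container paths below are hypothetical choices, not requirements.
    """
    workdir = "/workspace/examples/time_sequence_prediction"
    return [
        "docker", "run", "--rm",
        # Mount the local checkout into the container.
        "-v", f"{examples_dir}:/workspace/examples",
        # Run from the example's directory.
        "-w", workdir,
        image,
        # Generate the training data first, then train.
        "sh", "-c", "python generate_sine_wave.py && python train.py",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command; the host path is a placeholder.
    print(shlex.join(docker_run_command("/home/user/examples")))
```

The printed command can be pasted into a shell on one of the cluster nodes, or the list can be handed to {{subprocess.run}} directly.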

 

*Once all these steps are complete and a PyTorch job runs fine with the 
Docker image, I have the following steps in mind:* 
1. Find out which launch command and additional environment variables are 
required for PyTorch integration
2. Define a service spec file (JSON format, for YARN Native Service to consume)
3. Start a simple sleep job with YARN Native Service as a test run of NS
4. Generate a service spec JSON with test code (TensorFlow) to check 
what I should generate for PyTorch (since the two frameworks are similar in 
nature)
5. Check that TF works fine with the YAML parser, i.e. test that everything 
is fine with my patch 
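
As a rough illustration of steps 2 and 3, the sketch below generates a service spec in the shape I understand YARN Native Service to expect (a service with a list of components, each with a launch command and a resource request). The helper name {{build_service_spec}}, the service and component names, the resource sizes and the PyTorch launch command are all placeholders I made up; the real launch command and environment variables are exactly what step 1 still has to find out.

```python
import json


def build_service_spec(service_name, component_name, launch_command,
                       containers=1, cpus=1, memory_mb=256,
                       docker_image=None, env=None):
    """Return a dict shaped like a YARN Native Service spec (a sketch)."""
    component = {
        "name": component_name,
        "number_of_containers": containers,
        "launch_command": launch_command,
        # Memory is given in MB, written as a string in the spec.
        "resource": {"cpus": cpus, "memory": str(memory_mb)},
    }
    if docker_image:
        # Run the component from a Docker image instead of a plain process.
        component["artifact"] = {"id": docker_image, "type": "DOCKER"}
    if env:
        # Additional environment variables for the container.
        component["configuration"] = {"env": dict(env)}
    return {
        "name": service_name,
        "version": "1.0.0",
        "components": [component],
    }


if __name__ == "__main__":
    # Step 3: a simple sleep job to test-run Native Service.
    sleeper = build_service_spec("sleeper-service", "sleeper",
                                 "sleep 900000", containers=2)
    print(json.dumps(sleeper, indent=2))

    # Step 2: a hypothetical single-node PyTorch spec (placeholder command).
    pytorch = build_service_spec("pytorch-service", "worker",
                                 "python train.py", memory_mb=4096,
                                 docker_image="pytorch/pytorch")
    print(json.dumps(pytorch, indent=2))
```

If I read the upstream documentation correctly, a spec saved to a file can then be launched with {{yarn app -launch <service-name> <path-to-spec>}}, but I still need to verify that on the cluster.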
 
PyTorch setup seems quite trivial for a single-node approach, so this should be 
ready soon if I don't hit any roadblocks with YARN Native Service. 
[~sunilg], [~leftnoteasy]: Is there a tutorial out there that describes how to 
use Native Services, or should I fall back to the upstream documentation?
 
Thanks!

> [Umbrella] Support using Submarine to submit Pytorch job
> --------------------------------------------------------
>
>                 Key: SUBMARINE-14
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-14
>             Project: Hadoop Submarine
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Szilard Nemeth
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
