mathewjacob1002 opened a new pull request, #42067:
URL: https://github.com/apache/spark/pull/42067

   ### What changes were proposed in this pull request?
   Makes the DeepspeedTorchDistributor run() method use the _run() function as 
its backbone.
   ### Why are the changes needed?
   They let the user easily run distributed training of a function with 
DeepSpeed.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The user can now pass a function as the train_object when calling 
DeepspeedTorchDistributor.run(). The function must be picklable and must 
contain all of its necessary imports in its own body. An example use case can 
be found in the Python file linked in the JIRA ticket.
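
   As a rough illustration of the constraints on a function passed as the train_object, the sketch below keeps every import inside the function body. The name `train_fn`, its parameters, and the toy gradient-descent loop are hypothetical stand-ins and are not taken from this PR. A real training function would import torch and deepspeed inside its body instead. The hand-off to the distributor is shown only as a comment, and forwarding extra arguments through run() is an assumption.

```python
def train_fn(learning_rate, num_steps=3):
    # Imports live inside the function so that it carries everything it
    # needs once it is pickled and shipped to a worker. In real training
    # these would be imports such as torch and deepspeed (assumption).
    import math

    # Toy stand-in for a training loop: minimize f(w) = (w - 2)^2
    # with plain gradient descent.
    w = 0.0
    for _ in range(num_steps):
        grad = 2.0 * (w - 2.0)
        w -= learning_rate * grad

    # Return the distance from the optimum as a simple "loss".
    return math.fabs(w - 2.0)


# Local sanity check before handing the function to a distributor.
final_error = train_fn(0.5, num_steps=1)

# Hypothetical hand-off, not executed here. Constructor arguments are
# elided because this PR description does not list them, and passing
# extra positional/keyword arguments through run() is an assumption:
#   distributor = DeepspeedTorchDistributor(...)
#   distributor.run(train_fn, 0.5, num_steps=1)
```

   Running the function locally first is a cheap way to catch missing imports before a distributed launch.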
   
   
   ### How was this patch tested?
   Tested with the notebook/file linked in the JIRA ticket. Formal e2e tests 
will come in a future PR.
   
   ### Next Steps/Timeline
   
   - [ ] Add more e2e tests for both running a regular PyTorch file and running 
a function for training
   - [ ] Write more documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

