mathewjacob1002 opened a new pull request, #41770: URL: https://github.com/apache/spark/pull/41770
### What changes were proposed in this pull request?

Implemented a distributed learning class for DeepSpeed workloads that launches them with the `torch.distributed.run` command. Also added tests for some of its functions. Tests for the distributed workloads in `create_torchrun_command` still need to be added.

### Why are the changes needed?

DeepSpeed workloads require special launch commands. This class makes it easier to run DeepSpeed applications without ever needing to touch the terminal. If a user needs the `torch.distributed.run` launcher, this class lets them use it. Its API and workflow closely follow the `TorchDistributor` class: you create an instance and then invoke `distributor.run(...)`.
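For illustration only, here is a minimal sketch of how a launcher command for `torch.distributed.run` might be assembled for a single-node run. The helper name `build_torchrun_command` and its parameters are hypothetical and are not taken from this PR; the actual `create_torchrun_command` may differ in both signature and behavior. The sketch only builds the argument list and does not execute it.

```python
import sys
from typing import List, Sequence


def build_torchrun_command(
    train_script: str,
    script_args: Sequence[str] = (),
    num_processes: int = 1,
    local_mode: bool = True,
) -> List[str]:
    """Hypothetical helper (an assumption, not the PR's implementation).

    Builds the argument list for a single-node torch.distributed.run launch.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if not local_mode:
        # Multi-node launches need rendezvous settings that this sketch omits.
        raise NotImplementedError("this sketch only covers single-node launches")
    return [
        sys.executable,                       # current Python interpreter
        "-m",
        "torch.distributed.run",              # the launcher module named in the PR
        "--standalone",                       # single-node mode, no external rendezvous
        "--nnodes=1",
        f"--nproc_per_node={num_processes}",  # one worker process per requested slot
        train_script,
        *[str(arg) for arg in script_args],   # arguments forwarded to the training script
    ]


# Assemble (but do not run) a two-process launch of a placeholder script.
command = build_torchrun_command("train.py", ["--epochs", "3"], num_processes=2)
print(" ".join(command[1:]))
```

A distributor class along the lines described above could pass such a list to `subprocess.run` from inside its `run(...)` method, so the user never has to type the launcher command by hand.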
