[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072309036 ## python/pyspark/ml/torch/distributor.py: ## @@ -325,8 +329,15 @@ def _create_torchrun_command( torchrun_args = ["--standalone", "--nnodes=1"]

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072300627 ## python/pyspark/ml/torch/distributor.py: ## @@ -325,8 +329,15 @@ def _create_torchrun_command( torchrun_args = ["--standalone", "--nnodes=1"]

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072294810 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072290872 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072289202 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072285042 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072267585 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072266412 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +