lu-wang-dl commented on code in PR #41973:
URL: https://github.com/apache/spark/pull/41973#discussion_r1264298751


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -514,11 +514,54 @@ def _execute_command(
                 f"Command {cmd} failed with return code {task.returncode}. "
                 f"The {last_n_msg} included below: {task_output}"
             )
+    @staticmethod
+    def _get_output_from_framework_wrapper(framework_wrapper: Optional[Callable], input_params: Dict, train_object: Union[Callable, str], run_pytorch_file_fn: Optional[Callable], *args, **kwargs) -> Optional[Any]:
+        """
+        This function is meant to get the output from framework wrapper function by passing in the correct arguments,
+        depending on the type of train_object.
+
+        Parameters
+        ----------
+        framework_wrapper: Optional[Callable]
+            Function pointer that will be invoked. Can either be the function that runs distributed training on
+            files if train_object is a string. Otherwise, it will be the function that runs distributed training
+            for functions if the train_object is a Callable
+        input_params: Dict
+            A dictionary that maps parameter to arguments for the command to be created.
+        train_object: Union[Callable, str]
+            Either a function to be used for distributed training, or a string representing the path of a
+            file to be run in a distributed fashion.
+        run_pytorch_file_fn: Optional[Callable]
+            The function that will be used to run distributed training of a file; mainly used for the
+            distributed training using a function.
+        *args: Any
+            Extra arguments to be used by framework wrapper.
+        **kwargs: Any
+            Extra keyword args to be used. Not currently supported but kept for future improvement.
+
+        Returns
+        -------
+        Optional[Any]
+            Returns the result of the framework_wrapper

Review Comment:
   Do we expect `framework_wrapper` to return anything?



##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -514,11 +514,54 @@ def _execute_command(
                 f"Command {cmd} failed with return code {task.returncode}. "
                 f"The {last_n_msg} included below: {task_output}"
             )
+    @staticmethod
+    def _get_output_from_framework_wrapper(framework_wrapper: Optional[Callable], input_params: Dict, train_object: Union[Callable, str], run_pytorch_file_fn: Optional[Callable], *args, **kwargs) -> Optional[Any]:
+        """
+        This function is meant to get the output from framework wrapper function by passing in the correct arguments,
+        depending on the type of train_object.
+
+        Parameters
+        ----------
+        framework_wrapper: Optional[Callable]
+            Function pointer that will be invoked. Can either be the function that runs distributed training on
+            files if train_object is a string. Otherwise, it will be the function that runs distributed training
+            for functions if the train_object is a Callable
+        input_params: Dict
+            A dictionary that maps parameter to arguments for the command to be created.
+        train_object: Union[Callable, str]

Review Comment:
   I cannot tell the difference between `train_object` and `framework_wrapper` from the comments.



##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -514,11 +514,54 @@ def _execute_command(
                 f"Command {cmd} failed with return code {task.returncode}. "
                 f"The {last_n_msg} included below: {task_output}"
             )
+    @staticmethod
+    def _get_output_from_framework_wrapper(framework_wrapper: Optional[Callable], input_params: Dict, train_object: Union[Callable, str], run_pytorch_file_fn: Optional[Callable], *args, **kwargs) -> Optional[Any]:
+        """
+        This function is meant to get the output from framework wrapper function by passing in the correct arguments,
+        depending on the type of train_object.
+
+        Parameters
+        ----------
+        framework_wrapper: Optional[Callable]
+            Function pointer that will be invoked. Can either be the function that runs distributed training on

Review Comment:
   Is this a user-provided function?



##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -514,11 +514,54 @@ def _execute_command(
                 f"Command {cmd} failed with return code {task.returncode}. "
                 f"The {last_n_msg} included below: {task_output}"
             )
+    @staticmethod
+    def _get_output_from_framework_wrapper(framework_wrapper: Optional[Callable], input_params: Dict, train_object: Union[Callable, str], run_pytorch_file_fn: Optional[Callable], *args, **kwargs) -> Optional[Any]:
+        """
+        This function is meant to get the output from framework wrapper function by passing in the correct arguments,
+        depending on the type of train_object.
+
+        Parameters
+        ----------
+        framework_wrapper: Optional[Callable]
+            Function pointer that will be invoked. Can either be the function that runs distributed training on
+            files if train_object is a string. Otherwise, it will be the function that runs distributed training
+            for functions if the train_object is a Callable
+        input_params: Dict
+            A dictionary that maps parameter to arguments for the command to be created.
+        train_object: Union[Callable, str]
+            Either a function to be used for distributed training, or a string representing the path of a
+            file to be run in a distributed fashion.
+        run_pytorch_file_fn: Optional[Callable]
+            The function that will be used to run distributed training of a file; mainly used for the
+            distributed training using a function.
+        *args: Any
+            Extra arguments to be used by framework wrapper.
+        **kwargs: Any
+            Extra keyword args to be used. Not currently supported but kept for future improvement.
+
+        Returns
+        -------
+        Optional[Any]
+            Returns the result of the framework_wrapper
+        """
+        if not framework_wrapper:
+            raise RuntimeError("In the _get_output_from_framework_wrapper function, found a framework wrapper that is not set. framework_wrapper must always be a valid function pointer!")

Review Comment:
   You don't need to put the function name in the message; users can see it in the stack trace.
   ```suggestion
               raise RuntimeError("`framework_wrapper` is not set. ...")
   ```
   Then, if possible, you can add a hint on how the user could resolve the issue.
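   For reference, here is one way the guard and the type-based dispatch could read. This is a hypothetical sketch only: the actual call signatures that `framework_wrapper` and `run_pytorch_file_fn` take in this PR may differ, and the error-message wording is an illustration of the suggestion above, not final text.

   ```python
   from typing import Any, Callable, Dict, Optional, Union


   def _get_output_from_framework_wrapper(
       framework_wrapper: Optional[Callable],
       input_params: Dict,
       train_object: Union[Callable, str],
       run_pytorch_file_fn: Optional[Callable],
       *args: Any,
       **kwargs: Any,
   ) -> Optional[Any]:
       # Short, actionable message; the function name is omitted because
       # the stack trace already shows where the error was raised.
       if framework_wrapper is None:
           raise RuntimeError(
               "`framework_wrapper` is not set. Pass the wrapper that runs "
               "distributed training for a file path or for a function."
           )
       if isinstance(train_object, str):
           # train_object is a path to a training file, so the file-runner
           # callable is forwarded as well (assumed signature).
           return framework_wrapper(
               input_params, train_object, run_pytorch_file_fn, *args, **kwargs
           )
       # train_object is a user-provided training function.
       return framework_wrapper(input_params, train_object, *args, **kwargs)
   ```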



##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -514,11 +514,54 @@ def _execute_command(
                 f"Command {cmd} failed with return code {task.returncode}. "
                 f"The {last_n_msg} included below: {task_output}"
             )
+    @staticmethod
+    def _get_output_from_framework_wrapper(framework_wrapper: Optional[Callable], input_params: Dict, train_object: Union[Callable, str], run_pytorch_file_fn: Optional[Callable], *args, **kwargs) -> Optional[Any]:
+        """
+        This function is meant to get the output from framework wrapper function by passing in the correct arguments,
+        depending on the type of train_object.
+
+        Parameters
+        ----------
+        framework_wrapper: Optional[Callable]
+            Function pointer that will be invoked. Can either be the function that runs distributed training on

Review Comment:
   Could we add a comment to indicate which one comes from user input?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

