[GitHub] [spark] harupy commented on a diff in pull request #42887: [SPARK-45130][CONNECT][ML][PYTHON] Avoid Spark connect ML model to change input pandas dataframe

via GitHub Wed, 13 Sep 2023 02:57:50 -0700


harupy commented on code in PR #42887:
URL: https://github.com/apache/spark/pull/42887#discussion_r1324276184



##########
python/pyspark/ml/connect/base.py:
##########
@@ -154,8 +154,9 @@ def transform(
         The dataset can be either pandas dataframe or spark dataframe,
         if it is a spark DataFrame, the result of transformation is a new spark DataFrame
         that contains all existing columns and output columns with names.
-        if it is a pandas DataFrame, the input pandas dataframe is appended with output
-        columns in place.
+        if it is a pandas DataFrame, the input pandas dataframe is intact, and the output
+        dataframe shallow copies all existing columns from input dataframe and appends
+        output columns.

Review Comment:
   ```suggestion
           If it is a pandas DataFrame, the result of transformation is a shallow copy
           of the input pandas dataframe with output columns with names.
   ```
   
   to be consistent?
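The pandas behavior the docstring describes (input left intact, output is a shallow copy with the output columns appended) can be illustrated with plain pandas. This is a minimal sketch, not the code from PR #42887: `transform_pandas` is a hypothetical helper, and multiplying the input column by 2.0 stands in for a real model prediction. It assumes `DataFrame.copy(deep=False)` is used to get a new DataFrame object without deep-copying the existing column data.

```python
import pandas as pd


def transform_pandas(df: pd.DataFrame, input_col: str, output_col: str) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with an output column appended.

    The caller's DataFrame is not modified.
    """
    # Shallow copy: a new DataFrame object whose existing columns are not deep-copied.
    result = df.copy(deep=False)
    # Placeholder for a model prediction (assumption, not Spark's actual logic).
    # Assigning a new column on the copy does not add it to the input DataFrame.
    result[output_col] = df[input_col] * 2.0
    return result


df = pd.DataFrame({"x": [1.0, 2.0]})
out = transform_pandas(df, "x", "prediction")
print(list(df.columns))   # input still has only the original column
print(list(out.columns))  # output has the original column plus "prediction"
```

Because the copy is shallow, existing columns are not duplicated up front; only the appended output column needs new storage.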



