ueshin opened a new pull request #35062: URL: https://github.com/apache/spark/pull/35062
### What changes were proposed in this pull request?

Makes `DataFrame.transform` accept extra parameters to pass through to the function.

### Why are the changes needed?

Currently, when a function that takes parameters besides the DataFrame is passed to `DataFrame.transform`, a `lambda` must be used to bind them:

```py
>>> from pyspark.sql.functions import col
>>> df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
>>> def add_n(input_df, n):
...     return input_df.select([(col(col_name) + n).alias(col_name)
...                             for col_name in input_df.columns])
>>> df.transform(lambda input_df: add_n(input_df, 1)).transform(lambda input_df: add_n(input_df, n=10)).show()
+---+-----+
|int|float|
+---+-----+
| 12| 12.0|
| 13| 13.0|
+---+-----+
```

Letting `DataFrame.transform` take the parameters directly is more convenient.

### Does this PR introduce _any_ user-facing change?

Yes, `DataFrame.transform` can now take the parameters for the function:

```py
>>> df.transform(add_n, 1).transform(add_n, n=10).show()
+---+-----+
|int|float|
+---+-----+
| 12| 12.0|
| 13| 13.0|
+---+-----+
```

### How was this patch tested?

Added the corresponding doctests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
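The change boils down to forwarding `*args`/`**kwargs` from `transform` to the user function. A minimal, self-contained sketch of that pattern, using a toy `Frame` class instead of the real `pyspark.sql.DataFrame` (the `Frame` class and `add_n` helper here are illustrative assumptions, not the PR's actual code):

```python
class Frame:
    """Toy stand-in for a DataFrame, just to illustrate the forwarding pattern."""

    def __init__(self, values):
        self.values = values

    def transform(self, func, *args, **kwargs):
        # Forward any extra positional/keyword arguments to `func`,
        # so callers no longer need a wrapping lambda.
        result = func(self, *args, **kwargs)
        assert isinstance(result, Frame), "func should return a Frame"
        return result


def add_n(frame, n):
    # Analogue of the PR's doctest helper: add n to every value.
    return Frame([v + n for v in frame.values])


# Equivalent of df.transform(add_n, 1).transform(add_n, n=10)
out = Frame([1, 2]).transform(add_n, 1).transform(add_n, n=10)
print(out.values)  # [12, 13]
```

The same chaining works whether the extra argument is passed positionally or by keyword, which matches the new doctest usage shown above.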
