ueshin opened a new pull request #35062: URL: https://github.com/apache/spark/pull/35062
### What changes were proposed in this pull request?

Makes `DataFrame.transform` accept extra parameters to pass through to the function.

### Why are the changes needed?

Currently, when a function that takes parameters besides the DataFrame is passed to `DataFrame.transform`, a `lambda` must be used to bind them:

```py
>>> from pyspark.sql.functions import col
>>> df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
>>> def add_n(input_df, n):
...     return input_df.select([(col(col_name) + n).alias(col_name)
...                             for col_name in input_df.columns])
>>> df.transform(lambda input_df: add_n(input_df, 1)).transform(lambda input_df: add_n(input_df, n=10)).show()
+---+-----+
|int|float|
+---+-----+
| 12| 12.0|
| 13| 13.0|
+---+-----+
```

Letting `DataFrame.transform` take the parameters directly is more convenient.

### Does this PR introduce _any_ user-facing change?

Yes, `DataFrame.transform` can now take the parameters for the function:

```py
>>> df.transform(add_n, 1).transform(add_n, n=10).show()
+---+-----+
|int|float|
+---+-----+
| 12| 12.0|
| 13| 13.0|
+---+-----+
```

### How was this patch tested?

Added the corresponding doctests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
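The change boils down to forwarding `*args`/`**kwargs` from `transform` to the user function. A minimal, self-contained sketch of that pattern, using a toy `Frame` class instead of the real `pyspark.sql.DataFrame` (the `Frame` class and `add_n` helper here are illustrative assumptions, not the PR's actual code):

```python
class Frame:
    """Toy stand-in for a DataFrame, just to illustrate the forwarding pattern."""

    def __init__(self, values):
        self.values = values

    def transform(self, func, *args, **kwargs):
        # Forward any extra positional/keyword arguments to `func`,
        # so callers no longer need a wrapping lambda.
        result = func(self, *args, **kwargs)
        assert isinstance(result, Frame), "func should return a Frame"
        return result


def add_n(frame, n):
    # Analogue of the PR's doctest helper: add n to every value.
    return Frame([v + n for v in frame.values])


# Equivalent of df.transform(add_n, 1).transform(add_n, n=10)
out = Frame([1, 2]).transform(add_n, 1).transform(add_n, n=10)
print(out.values)  # [12, 13]
```

The same chaining works whether the extra argument is passed positionally or by keyword, which matches the new doctest usage shown above.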
