MrPowers commented on pull request #23877: URL: https://github.com/apache/spark/pull/23877#issuecomment-649065369
Thanks for getting this merged in! Looks like [my blog post](https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55) motivated [this feature request](https://issues.apache.org/jira/browse/SPARK-26449). Can we provide an example that shows how to structure transformations that take arguments? How does this look?

```python
from pyspark.sql.functions import col, lit

df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])

def with_funny(word):
    # Returns a DataFrame => DataFrame function that transform() can call
    def inner(df):
        return df.withColumn("funny", lit(word))
    return inner

def cast_all_to_int(input_df):
    return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])

df.transform(cast_all_to_int).transform(with_funny("bumfuzzle")).show()
```

```
+---+-----+---------+
|int|float|    funny|
+---+-----+---------+
|  1|    1|bumfuzzle|
|  2|    2|bumfuzzle|
+---+-----+---------+
```

@HyukjinKwon - do you think this is the best way to structure the `with_funny` code? [This blog post](https://mungingdata.com/pyspark/chaining-dataframe-transformations/) explains how to structure this transformation with `functools.partial` and `cytoolz` as well. Looking forward to figuring out the best way to write PySpark transformations and sharing it with the community!
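For comparison, here is a minimal sketch of the `functools.partial` approach mentioned above. It assumes the same `spark` session and `df` as the example, and relies only on `transform` passing the DataFrame as the function's single argument:

```python
from functools import partial

from pyspark.sql.functions import lit

# The transformation takes the DataFrame as its first parameter and the
# extra argument after it, so no inner closure is needed.
def with_funny(df, word):
    return df.withColumn("funny", lit(word))

# partial() binds word up front, leaving a DataFrame => DataFrame
# function for transform() to invoke.
df.transform(partial(with_funny, word="bumfuzzle")).show()
```

This keeps `with_funny` a plain, flat function at the cost of wrapping it in `partial` at every call site, whereas the closure version reads more naturally in a chain.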
