MrPowers commented on pull request #23877:
URL: https://github.com/apache/spark/pull/23877#issuecomment-649065369


   Thanks for getting this merged in!  Looks like [my blog post](https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55) motivated [this feature request](https://issues.apache.org/jira/browse/SPARK-26449).
   
   Can we provide an example that shows how to structure transformations that take arguments?  How does this look?
   
   ```python
   from pyspark.sql.functions import col, lit

   df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])

   # A transformation that takes arguments: wrap it in a closure so the
   # inner function has the df -> df signature that transform() expects.
   def with_funny(word):
       def inner(df):
           return df.withColumn("funny", lit(word))
       return inner

   # A transformation with no extra arguments can be passed directly.
   def cast_all_to_int(input_df):
       return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])

   df.transform(cast_all_to_int).transform(with_funny("bumfuzzle")).show()
   ```
   
   ```
   +---+-----+---------+
   |int|float|    funny|
   +---+-----+---------+
   |  1|    1|bumfuzzle|
   |  2|    2|bumfuzzle|
   +---+-----+---------+
   ```
   
   @HyukjinKwon - do you think this is the best way to structure the `with_funny` code?  [This blog post](https://mungingdata.com/pyspark/chaining-dataframe-transformations/) explains how to structure this transformation with `functools.partial` and `cytoolz` as well.  Looking forward to figuring out the best way to write PySpark transformations and sharing it with the community!
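   
   For comparison, here's a minimal sketch of those two alternatives (assuming the same `spark` session, the `lit` import, and the `cast_all_to_int` function from above; `with_funny` is rewritten to take the DataFrame as an explicit parameter):
   
   ```python
   from functools import partial

   # functools.partial approach: the DataFrame is an explicit parameter,
   # so the word argument is bound with partial rather than captured in
   # a closure.
   def with_funny(df, word):
       return df.withColumn("funny", lit(word))

   df.transform(cast_all_to_int).transform(partial(with_funny, word="bumfuzzle")).show()
   ```
   
   ```python
   from cytoolz import curry

   # cytoolz.curry approach: calling with only word returns a function
   # still waiting for df, which is exactly the shape transform()
   # expects. The DataFrame goes last for that reason.
   @curry
   def with_funny(word, df):
       return df.withColumn("funny", lit(word))

   df.transform(cast_all_to_int).transform(with_funny("bumfuzzle")).show()
   ```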

