Yikun opened a new pull request #32276: URL: https://github.com/apache/spark/pull/32276
### What changes were proposed in this pull request?

This PR adds support for adding multiple columns at once via `pyspark.sql.DataFrame.withColumn`.

### Why are the changes needed?

The Scala-side `withColumn` can already add columns in one pass [1]:

[1]: https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2396

but a PySpark user can only use `withColumn` to add a single column, or to replace an existing column of the same name. For example, to add multiple columns, the user has to chain `withColumn` calls:

```Python
self.df.withColumn("key1", col("key1")).withColumn("key2", col("key2")).withColumn("key3", col("key3"))
```

After this patch, the user can pass list arguments to `withColumn` and add all the columns in one pass:

```Python
self.df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), col("key3")])
```

### Does this PR introduce _any_ user-facing change?

Yes. The accepted input types of `withColumn` are extended, so PySpark users can add multiple columns directly.

### How was this patch tested?

- Added a new test for adding multiple columns; passed.
- Existing tests passed.
