Yikun opened a new pull request #32431:
URL: https://github.com/apache/spark/pull/32431


   ### What changes were proposed in this pull request?
   This PR adds support for adding multiple columns at once to the Spark 
Scala/Java/Python APIs.
   - Expose `withColumns` as public API in Scala/Java
   - Add `with_columns` in PySpark
   
   Adding multiple columns has also been discussed in past JIRA issues 
([SPARK-12225](https://issues.apache.org/jira/browse/SPARK-12225), 
[SPARK-26224](https://issues.apache.org/jira/browse/SPARK-26224)) and on the 
[mailing list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Multiple-columns-adding-replacing-support-in-PySpark-DataFrame-API-td31164.html).
   
   ### Why are the changes needed?
   A private method, `withColumns`, can already add multiple columns in one pass [1]:
   
https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2402
   
   However, it is not exposed as a public API in Scala/Java, and PySpark 
users can only use `withColumn` to add one column or to replace an existing 
column that has the same name.
   
   For example, a PySpark user who wants to add multiple columns has to 
call `withColumn` repeatedly:
   ```Python
   df.withColumn("key1", col("key1")).withColumn("key2", col("key2")).withColumn("key3", col("key3"))
   ```
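
The chained calls above can be tidied with a small helper that folds a mapping of names to columns over `withColumn`. This is only a sketch of a possible workaround, not part of this PR: the name `add_columns` is hypothetical, and it still issues one `withColumn` call per column, so it does not replace the one-pass `withColumns` path.

```python
from functools import reduce


def add_columns(df, new_columns):
    """Call `withColumn` once per (name, column) pair, in insertion order.

    `df` is assumed to be any object with a DataFrame-style
    `withColumn(name, col)` method that returns a new object.
    `new_columns` maps column names to column expressions.
    """
    return reduce(
        # item is a (name, column) tuple taken from the mapping
        lambda acc, item: acc.withColumn(item[0], item[1]),
        new_columns.items(),
        df,  # starting value; returned unchanged if the mapping is empty
    )
```

With a PySpark DataFrame this would be called as `add_columns(df, {"key1": col("key1"), "key2": col("key2"), "key3": col("key3")})`.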
   After this patch, the user can call `withColumn` with a list of column 
names and a list of columns to add them all in one pass:
   ```Python
   df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), col("key3")])
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, this PR exposes `withColumns` as a public API in Scala/Java and adds 
the `with_columns` API in PySpark.
   
   
   ### How was this patch tested?
   - Added new tests for adding multiple columns; they pass
   - Existing tests pass

