Yikun opened a new pull request #32276: URL: https://github.com/apache/spark/pull/32276
### What changes were proposed in this pull request?

This PR adds support for adding multiple columns at once via `pyspark.sql.DataFrame.withColumn`.

### Why are the changes needed?

The Scala-side `withColumn` can already add columns in one pass [1]:

[1]: https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2396

but a PySpark user can only use `withColumn` to add a single column, or to replace an existing column of the same name. For example, to add multiple columns, the user has to chain `withColumn` calls:

```Python
self.df.withColumn("key1", col("key1")).withColumn("key2", col("key2")).withColumn("key3", col("key3"))
```

After this patch, the user can pass list arguments to `withColumn` and add all the columns in one pass:

```Python
self.df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), col("key3")])
```

### Does this PR introduce _any_ user-facing change?

Yes. The accepted input types of `withColumn` are extended, so PySpark users can add multiple columns directly.

### How was this patch tested?

- Added a new test for adding multiple columns; passed.
- Existing tests passed.
