MrPowers commented on pull request #29935: URL: https://github.com/apache/spark/pull/29935#issuecomment-761705061
@zero323 - Adding `CalendarIntervalType` to PySpark is a great idea. [CalendarIntervalType](https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/types/CalendarIntervalType.html) is already in the Scala API and allows for some awesome functionality. Here's the Spark 3.0.1 behavior with Scala:

```scala
import java.sql.Date
import org.apache.spark.sql.functions._

val df = Seq(
  (Date.valueOf("2021-01-23"), Date.valueOf("2021-01-21"))
).toDF("date1", "date2")

df.withColumn("new_datediff", $"date1" - $"date2").show()
//+----------+----------+------------+
//|     date1|     date2|new_datediff|
//+----------+----------+------------+
//|2021-01-23|2021-01-21|      2 days|
//+----------+----------+------------+

df.withColumn("new_datediff", $"date1" - $"date2").printSchema()
//root
// |-- date1: date (nullable = true)
// |-- date2: date (nullable = true)
// |-- new_datediff: interval (nullable = true)
```

Getting this functionality in PySpark would be a huge win. Let me know if there is anything I can do to help you move this PR forward @zero323!
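
For anyone following along, here's a rough, untested sketch of what the equivalent might look like from PySpark. A caveat on what's assumed here: `show()` and `printSchema()` render strings on the JVM side, so they should already work today; if I understand correctly, it's Python-side schema access (`df.schema`) and `collect()` that need a `CalendarIntervalType` in `pyspark.sql.types`, which is what this PR adds. The sketch also assumes the Spark 3.0/3.1 semantics where subtracting two dates yields a calendar interval.

```python
import datetime

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Two DateType columns, mirroring the Scala example above.
df = spark.createDataFrame(
    [(datetime.date(2021, 1, 23), datetime.date(2021, 1, 21))],
    ["date1", "date2"],
)

# Column arithmetic is delegated to the JVM, so the subtraction itself
# already produces an interval column; show() and printSchema() print
# JVM-rendered strings and should match the Scala output.
result = df.withColumn("new_datediff", df["date1"] - df["date2"])
result.show()
result.printSchema()

# Hypothetical once this PR lands: Python-side schema introspection
# should stop raising on the interval field, because pyspark.sql.types
# would know how to parse it, e.g. something like:
# result.schema  # StructField("new_datediff", CalendarIntervalType(), True)
```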
