HyukjinKwon opened a new pull request #34631:
URL: https://github.com/apache/spark/pull/34631


   ### What changes were proposed in this pull request?
   
   This PR proposes to support `DayTimeIntervalType` in pandas UDF and Arrow 
optimization. 
   
   - Change the Arrow-side mapping for `DayTimeIntervalType` from Arrow's `IntervalType` to `DurationType` (the migration guide was updated for the Arrow developer APIs).
   - Add a type mapping for the other code paths: `numpy.timedelta64` <> `pyarrow.duration("us")` <> `DayTimeIntervalType` (see the sketch after this list).
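   
   A minimal sketch of the second mapping (the `numpy.timedelta64` <> `pyarrow.duration("us")` round trip) on the Python side, assuming only `pandas` and a reasonably recent `pyarrow` are installed; no Spark is involved:
   
   ```python
   import pandas as pd
   import pyarrow as pa
   
   # pandas stores timedeltas as numpy.timedelta64[ns] under the hood.
   s = pd.Series([pd.Timedelta(microseconds=123)])
   
   # DayTimeIntervalType has microsecond precision, so the Arrow side
   # uses duration("us") rather than an interval type.
   arr = pa.Array.from_pandas(s, type=pa.duration("us"))
   print(arr.type)         # duration[us]
   print(arr.to_pandas())  # converted back to timedelta64[ns]
   ```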
   
   This PR is dependent on #34614
   
   ### Why are the changes needed?
   
   For the rationale behind changing the mapping from Arrow's `Interval` type to the `Duration` type for `DayTimeIntervalType`, please refer to https://github.com/apache/spark/pull/32340#discussion_r750909587.
   
   `DayTimeIntervalType` is already mapped to the concept of a duration rather than a calendar interval: it is matched to `pyarrow.duration("us")`, `datetime.timedelta`, and `java.time.Duration`.
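   
   For context, a hedged sketch of how that existing mapping to `datetime.timedelta` already surfaces on the plain Python side (assuming an active `spark` session with the changes from the dependent PR applied; the interval literal uses the ANSI syntax available since Spark 3.2):
   
   ```python
   >>> import datetime
   >>> df = spark.sql("SELECT INTERVAL '10 00:00:00' DAY TO SECOND AS d")
   >>> df.dtypes
   [('d', 'interval day to second')]
   >>> df.first().d == datetime.timedelta(days=10)
   True
   ```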
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, after this change, users can use `DayTimeIntervalType` in `SparkSession.createDataFrame(pandas_df)`, `DataFrame.toPandas`, and pandas UDFs:
   
   ```python
   >>> import datetime
   >>> import pandas as pd
   >>> from pyspark.sql.functions import pandas_udf
   >>>
   >>> @pandas_udf("interval day to second")
   ... def noop(s: pd.Series) -> pd.Series:
   ...     assert s.iloc[0] == datetime.timedelta(microseconds=123)
   ...     return s
   ...
   >>> df = spark.createDataFrame(pd.DataFrame({"a": [pd.Timedelta(microseconds=123)]}))
   >>> df.toPandas()
                          a
   0 0 days 00:00:00.000123
   ```
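   
   For completeness, applying the UDF above exercises the pandas UDF (Arrow) path end to end; since `noop` is a pass-through, the result should match `df.toPandas()` (a hedged sketch; output formatting may vary slightly across pandas versions):
   
   ```python
   >>> df.select(noop("a").alias("a")).toPandas()
                          a
   0 0 days 00:00:00.000123
   ```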
   
   ### How was this patch tested?
   
   Manually tested, and unittests were added.

