HyukjinKwon opened a new pull request #34631:
URL: https://github.com/apache/spark/pull/34631
### What changes were proposed in this pull request?
This PR proposes to support `DayTimeIntervalType` in pandas UDF and Arrow
optimization.
- Change the mapping of Arrow's `IntervalType` to `DurationType` for
`DayTimeIntervalType` (migration guide updated for Arrow developer APIs).
- Add a type mapping for other code paths: `numpy.timedelta64` <>
`pyarrow.duration("us")` <> `DayTimeIntervalType`
This PR is dependent on #34614
### Why are the changes needed?
For changing the mapping of Arrow's `Interval` type to `Duration` type for
`DayTimeIntervalType`, please refer to
https://github.com/apache/spark/pull/32340#discussion_r750909587.
`DayTimeIntervalType` is already mapped to the concept of a duration rather
than a calendar-based interval: it is matched to `pyarrow.duration("us")`,
`datetime.timedelta`, and `java.time.Duration`.
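To illustrate the "duration" concept above, here is a minimal standard-library sketch. It assumes microsecond precision, as implied by `pyarrow.duration("us")`. The helper names `timedelta_to_micros` and `micros_to_timedelta` are hypothetical and are not part of this PR or of Spark's API. The sketch only shows that a `datetime.timedelta` round-trips losslessly through a plain microsecond count, which does not hold for calendar-style intervals:

```python
import datetime


def timedelta_to_micros(td: datetime.timedelta) -> int:
    """Hypothetical helper: express a timedelta as a signed microsecond count."""
    # Integer arithmetic on the normalized fields avoids the float rounding
    # that td.total_seconds() can introduce for large values.
    return (td.days * 86_400 + td.seconds) * 1_000_000 + td.microseconds


def micros_to_timedelta(micros: int) -> datetime.timedelta:
    """Hypothetical helper: rebuild a timedelta from a microsecond count."""
    return datetime.timedelta(microseconds=micros)


td = datetime.timedelta(microseconds=123)
micros = timedelta_to_micros(td)
assert micros == 123
assert micros_to_timedelta(micros) == td
```

Because a duration is a fixed number of microseconds, no calendar context (month length, daylight saving shifts) is needed to compare or convert values.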
### Does this PR introduce _any_ user-facing change?
Yes. After this change, users can use `DayTimeIntervalType` in
`SparkSession.createDataFrame(pandas_df)`, `DataFrame.toPandas`, and pandas
UDFs:
```python
>>> import datetime
>>> import pandas as pd
>>> from pyspark.sql.functions import pandas_udf
>>>
>>> @pandas_udf("interval day to second")
... def noop(s: pd.Series) -> pd.Series:
...     assert s.iloc[0] == datetime.timedelta(microseconds=123)
...     return s
...
>>> df = spark.createDataFrame(
...     pd.DataFrame({"a": [pd.Timedelta(microseconds=123)]}))
>>> df.toPandas()
                       a
0 0 days 00:00:00.000123
```
### How was this patch tested?
Manually tested, and unit tests were added.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]