HyukjinKwon opened a new pull request #33876:
URL: https://github.com/apache/spark/pull/33876
### What changes were proposed in this pull request?
This PR proposes to implement `TimestampNTZType` support in PySpark's
`SparkSession.createDataFrame`, `DataFrame.toPandas`, Python UDFs, and pandas
UDFs with and without Arrow.
This PR depends on #33875.
### Why are the changes needed?
To complete `TimestampNTZType` support.
### Does this PR introduce _any_ user-facing change?
Yes.
- Users can now use `TimestampNTZType` in
`SparkSession.createDataFrame`, `DataFrame.toPandas`, Python UDFs, and pandas
UDFs with and without Arrow.
- If `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`,
`SparkSession.createDataFrame` infers a `datetime` without a timezone as
`TimestampNTZType`. A `datetime` with a timezone is inferred as
`TimestampType`.
- If `TimestampType` and `TimestampNTZType` conflict while merging inferred
schemas, `TimestampType` takes precedence.
- If the type is `TimestampNTZType`, values are treated internally as UTC
(same as the JVM) and are not localized externally.
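
The rules in the list above can be illustrated with a small standalone sketch. It uses only the Python standard library and is not the code from this PR: the helper names (`infer_timestamp_type`, `merge_timestamp_types`, `ntz_to_micros`, `micros_to_ntz`) and the string type tags are hypothetical, and the microseconds-since-epoch representation is an assumption made for illustration.

```python
import calendar
import datetime

# Hypothetical string tags standing in for the Spark types; not PySpark classes.
TIMESTAMP = "TimestampType"
TIMESTAMP_NTZ = "TimestampNTZType"


def infer_timestamp_type(value, timestamp_type_conf):
    """Pick a type tag for one datetime value.

    `timestamp_type_conf` mimics the value of `spark.sql.timestampType`.
    """
    # Only a timezone-less datetime under TIMESTAMP_NTZ becomes NTZ.
    if timestamp_type_conf == "TIMESTAMP_NTZ" and value.tzinfo is None:
        return TIMESTAMP_NTZ
    return TIMESTAMP


def merge_timestamp_types(left, right):
    """Resolve a conflict between two inferred tags; TimestampType wins."""
    if left == right:
        return left
    return TIMESTAMP


def ntz_to_micros(value):
    """Convert a timezone-less datetime to microseconds, reading it as UTC."""
    # calendar.timegm interprets the fields as UTC, so the local timezone
    # of the machine never affects the result.
    return calendar.timegm(value.timetuple()) * 1_000_000 + value.microsecond


def micros_to_ntz(micros):
    """Convert microseconds back to a timezone-less datetime, no localization."""
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(microseconds=micros)
```

Under these assumptions, a value such as `datetime.datetime(2021, 8, 30, 12, 0)` round-trips through `ntz_to_micros` and `micros_to_ntz` unchanged regardless of the local timezone, which is the behavior the last bullet describes.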
### How was this patch tested?
Tested manually, and unit tests were added.
Closes #33517
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]