HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859
##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
"The timestamp subtraction returns an integer in seconds, "
"whereas pandas returns 'timedelta64[ns]'."
)
- if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+ if isinstance(right, IndexOpsMixin) and isinstance(
+ right.spark.data_type, (TimestampType, TimestampNTZType)
+ ):
warnings.warn(msg, UserWarning)
return left.astype("long") - right.astype("long")
Review comment:
Do you suggest something like `(right - left).astype("int")`? This won't
work because:
1. Intervals can't be cast to long. Supporting this natively would
require an internal implementation in PySpark.
2. Here in the pandas context, `TimestampNTZType` is treated as a Unix
timestamp in UTC, and `TimestampType` is treated as a local (session) time.
But in Spark SQL, `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ -
TIMESTAMP_LTZ` will assume both are in the local session time zone or in an
unknown time zone. For example:
```scala
scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
```
This should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND`
(I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
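To illustrate point 2 without Spark, here is a minimal sketch in plain
Python. It assumes the session time zone is KST, modeled as a fixed UTC+9
offset, and the variable names are hypothetical. It only mimics the two
interpretations described above; it is not how pandas-on-Spark implements
`sub`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is KST, modeled as a fixed UTC+9 offset.
SESSION_TZ = timezone(timedelta(hours=9))

# The same wall-clock literal used in the SQL example.
wall_clock = datetime(1970, 1, 1, 0, 0, 0)

# TimestampType-like reading: the literal is a local (session) time.
ltz_epoch = int(wall_clock.replace(tzinfo=SESSION_TZ).timestamp())

# TimestampNTZType-like reading, as described above: the literal is treated as UTC.
ntz_epoch = int(wall_clock.replace(tzinfo=timezone.utc).timestamp())

# Subtracting epoch seconds keeps the offset between the two readings.
diff_seconds = ltz_epoch - ntz_epoch
print(ltz_epoch, ntz_epoch, diff_seconds)
```

Under these assumptions the two readings are nine hours apart in epoch
seconds (the sign follows the operand order), whereas the SQL subtraction
reports a zero interval.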
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]