HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859
##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
"The timestamp subtraction returns an integer in seconds, "
"whereas pandas returns 'timedelta64[ns]'."
)
- if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+ if isinstance(right, IndexOpsMixin) and isinstance(
+ right.spark.data_type, (TimestampType, TimestampNTZType)
+ ):
warnings.warn(msg, UserWarning)
return left.astype("long") - right.astype("long")
Review comment:
Are you suggesting something like `(right - left).astype("int")`? That won't
work because:
1. Intervals can't be converted to longs. Supporting this natively requires
an internal implementation in PySpark.
2. `TimestampNTZ` is treated as a Unix timestamp in UTC, but
`TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` assumes the
`TIMESTAMP_NTZ` value is in the local session time zone. For example:
```scala
scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
```
should return something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in
KST), but it returns `INTERVAL '0 00:00:00' DAY TO SECOND`.
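To make the two readings of the same NTZ wall-clock value concrete, here is a plain-Python sketch (no Spark involved). It assumes the session time zone is KST, modeled as a fixed UTC+9 offset, and `epoch_seconds` is a hypothetical helper written only for this illustration. It shows that reading the NTZ value in the session time zone makes the difference with the LTZ literal zero (the output observed above), while the UTC reading and the session-time-zone reading differ by the full nine-hour offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: session time zone is KST, modeled as a fixed UTC+9 offset.
SESSION_TZ = timezone(timedelta(hours=9), "KST")


def epoch_seconds(wall_clock: datetime, tz: timezone) -> int:
    """Hypothetical helper: read a naive wall-clock value in `tz`, return Unix seconds."""
    return int(wall_clock.replace(tzinfo=tz).timestamp())


wall_clock = datetime(1970, 1, 1, 0, 0, 0)

# TIMESTAMP (LTZ) literal: the wall clock is read in the session time zone.
ltz = epoch_seconds(wall_clock, SESSION_TZ)

# TIMESTAMP_NTZ read as a Unix timestamp in UTC.
ntz_as_utc = epoch_seconds(wall_clock, timezone.utc)

# TIMESTAMP_NTZ read in the session time zone, as in mixed LTZ/NTZ subtraction.
ntz_as_session = epoch_seconds(wall_clock, SESSION_TZ)

print(timedelta(seconds=ltz - ntz_as_session))         # 0:00:00
print(timedelta(seconds=ntz_as_utc - ntz_as_session))  # 9:00:00
```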
##########
Review comment:
Yeah. So this workaround has to be removed once intervals are implemented in
PySpark. At the moment, we can neither remove it nor let Spark SQL decide via
implicit casting. That is not because Spark SQL lacks an implicit cast between
NTZ and LTZ, but because PySpark doesn't have an interval implementation.
Or are you suggesting that we implement the type coercion between NTZ and LTZ
in this PR and use something like `(right - left).astype("int")` here?
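For contrast, a plain-Python sketch (no Spark) of the trade-off: the current workaround reduces both operands to whole epoch seconds and returns an integer plus a warning, while an interval-style result, which is what pandas returns, is a `timedelta`. `sub_timestamps_as_seconds` is a hypothetical stand-in for the `left.astype("long") - right.astype("long")` path, not a PySpark API, and the flooring to whole seconds is an assumption of this sketch.

```python
import math
import warnings
from datetime import datetime, timedelta, timezone


def sub_timestamps_as_seconds(left: datetime, right: datetime) -> int:
    """Hypothetical stand-in for the workaround (not a PySpark API).

    Both operands are reduced to whole Unix epoch seconds and subtracted,
    so the result is a plain integer rather than an interval.
    """
    warnings.warn(
        "The timestamp subtraction returns an integer in seconds, "
        "whereas pandas returns 'timedelta64[ns]'.",
        UserWarning,
    )
    return math.floor(left.timestamp()) - math.floor(right.timestamp())


# Timezone-aware values so the result does not depend on the machine's local zone.
left = datetime(2021, 9, 7, 12, 0, 30, tzinfo=timezone.utc)
right = datetime(2021, 9, 7, 12, 0, 0, tzinfo=timezone.utc)

seconds = sub_timestamps_as_seconds(left, right)  # integer, emits a UserWarning
interval = left - right                           # interval-style result

print(seconds)                   # 30
print(interval)                  # 0:00:30
print(interval.total_seconds())  # 30.0
```

Once interval support exists, the integer path (and its warning) can give way to the `timedelta`-like result directly.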