[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

GitBox Tue, 07 Sep 2021 17:42:21 -0700


HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859




##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Do you suggest something like `(right - left).astype("int")`? This won't work because:
   1. An interval can't be cast to long. Supporting this natively would require an internal implementation in PySpark.
   2. Here in the pandas context, `TimestampNTZType` is treated as a Unix timestamp in UTC, and `TimestampType` is treated as a local (session) time. But in Spark SQL, `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` will assume both are in the local session timezone or an unknown timezone. For example:
       ```scala
       scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
       ```
       should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
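
       To illustrate point 2 without a Spark session, here is a rough sketch in plain Python. It is not the PySpark implementation: the fixed UTC+9 offset standing in for a KST session timezone and all variable names are assumptions made for this example. It reads the same wall-clock value once as session-local time and once as UTC, then compares epoch-second subtraction with direct wall-clock subtraction.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+9 offset stands in for a KST session time zone.
KST = timezone(timedelta(hours=9))

# The same wall-clock literal used on both sides of the subtraction.
wall_clock = datetime(1970, 1, 1, 0, 0, 0)

# TimestampType-like reading: the wall clock is in the session time zone.
ltz_epoch = int(wall_clock.replace(tzinfo=KST).timestamp())

# TimestampNTZType-like reading: the wall clock is treated as UTC.
ntz_epoch = int(wall_clock.replace(tzinfo=timezone.utc).timestamp())

# Subtracting epoch seconds keeps the session offset (nine hours in magnitude).
epoch_diff_seconds = ltz_epoch - ntz_epoch

# Subtracting the wall-clock values directly loses the offset entirely.
wall_clock_diff = wall_clock - wall_clock

print(epoch_diff_seconds, wall_clock_diff)
```

       The two results differ by the session offset, which is why the long-based subtraction and the interval-based subtraction are not interchangeable here.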
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


