Re: [PR] [SPARK-57574][PANDAS] Support the TIME data type in pandas API on Spark [spark]

via GitHub Sat, 20 Jun 2026 21:50:54 -0700


MaxGekk commented on code in PR #56635:
URL: https://github.com/apache/spark/pull/56635#discussion_r3447921598



##########
python/pyspark/pandas/tests/data_type_ops/test_time_ops.py:
##########
@@ -0,0 +1,179 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import datetime
+
+import pandas as pd
+
+from pyspark import pandas as ps
+from pyspark.testing.pandasutils import PandasOnSparkTestCase
+from pyspark.pandas.tests.data_type_ops.testing_utils import OpsTestBase
+
+
+class TimeOpsTestsMixin:

Review Comment:
   This suite covers arithmetic rejection and the four ordering comparisons, but is missing cases that the peer `DateOpsTestsMixin` has:
   - `test_eq` / `test_ne`: eq/ne are inherited and reachable for `TimeType` but never exercised here.
   - `test_isnull`.
   - `test_from_to_pandas`: nothing asserts the Spark→pandas round-trip of actual TIME values (the new `TimeType → object` mapping); the comparison tests only assert boolean results.
   - The peer comparison tests also assert that a pandas-Series RHS raises `TypeError` (e.g. `self.assertRaises(TypeError, lambda: psdf["this"] == pdf["this"])`); this is worth adding here too.
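   For reference, the pandas-side expectations for the missing `test_eq` / `test_ne`, `test_isnull`, and `test_from_to_pandas` cases can be sketched with pandas alone. This assumes the new `TimeType → object` mapping yields object-dtype columns of `datetime.time` values; the second column `"that"` is hypothetical. In the actual suite each expected value would be compared against the pandas-on-Spark result from `ps.from_pandas(pdf)` via `self.assert_eq`, which this sketch omits so it runs without a Spark session.

```python
import datetime

import pandas as pd

# Object-dtype columns of datetime.time values: the pandas-side shape the
# new TimeType -> object mapping is assumed to produce. Column "that" is a
# hypothetical second operand; only "this" appears in the review comment.
pdf = pd.DataFrame(
    {
        "this": [datetime.time(1, 2, 3), datetime.time(4, 5, 6), None],
        "that": [datetime.time(1, 2, 3), datetime.time(7, 8, 9), None],
    }
)

# Expected values a test_eq / test_ne would compare pandas-on-Spark against.
# pandas treats a missing value as unequal to everything, itself included.
expected_eq = pdf["this"] == pdf["that"]  # [True, False, False]
expected_ne = pdf["this"] != pdf["that"]  # [False, True, True]

# Expected value for a test_isnull.
expected_isnull = pdf["this"].isnull()  # [False, False, True]

# A test_from_to_pandas would check that non-null values come back as
# datetime.time objects in an object-dtype column.
pandas_side_ok = pdf["this"].dtype == object and all(
    isinstance(v, datetime.time) for v in pdf["this"].dropna()
)
```

   The all-`None` row is worth keeping in the real test data: pandas returns `False` for `==` and `True` for `!=` there, so it checks that null handling on the Spark side matches.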



##########
python/pyspark/pandas/data_type_ops/time_ops.py:
##########
@@ -0,0 +1,77 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Any, Union
+
+import numpy as np
+from pandas.api.types import CategoricalDtype
+
+from pyspark.sql import Column as PySparkColumn
+from pyspark.sql.types import BooleanType, StringType
+from pyspark.pandas._typing import Dtype, IndexOpsLike, SeriesOrIndex
+from pyspark.pandas.base import column_op
+from pyspark.pandas.data_type_ops.base import (
+    DataTypeOps,
+    _as_categorical_type,
+    _as_other_type,
+    _as_string_type,
+    _sanitize_list_like,
+)
+from pyspark.pandas.typedef import pandas_on_spark_type
+
+
+class TimeOps(DataTypeOps):
+    """
+    The class for binary operations of pandas-on-Spark objects with spark type: TimeType.
+    """
+
+    @property
+    def pretty_name(self) -> str:
+        return "times"
+
+    def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
+        _sanitize_list_like(right)
+        return column_op(PySparkColumn.__lt__)(left, right)
+
+    def le(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
+        _sanitize_list_like(right)
+        return column_op(PySparkColumn.__le__)(left, right)
+
+    def ge(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
+        _sanitize_list_like(right)
+        return column_op(PySparkColumn.__ge__)(left, right)
+
+    def gt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
+        _sanitize_list_like(right)
+        return column_op(PySparkColumn.__gt__)(left, right)
+
+    def astype(self, index_ops: IndexOpsLike, dtype: Union[str, type, Dtype]) -> IndexOpsLike:

Review Comment:
   Apart from the thin comparison wrappers, `astype` is the only custom logic in `TimeOps` (categorical / bool / string / other branches), but the suite has no `test_astype`. `test_date_ops.py:190` tests `astype(str)`, `astype(bool)`, and a categorical cast; please mirror it.
   
   The string branch is the one to watch: `null_str=str(None)` plus Spark `CAST(TIME AS STRING)` is exactly where pandas-vs-Spark formatting can diverge for sub-second precision (pandas `str(time(.., 500000))` → `"...:00.500000"` vs Spark `"...:00.5"`). A `test_astype` would confirm or refute this.
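   The suspected divergence can be illustrated without a Spark session. Assuming pandas renders each non-null object value with Python's `str()`, the sketch below compares that rendering of a `datetime.time` against `hypothetical_trimmed_time_str`, a made-up helper that only models the reviewer's `"...:00.5"` example. Whether Spark's `CAST(TIME AS STRING)` really trims trailing zeros is exactly what a `test_astype` should establish. The last line records the pandas-side expectation for the `astype(bool)` branch.

```python
import datetime


def hypothetical_trimmed_time_str(t: datetime.time) -> str:
    """Hypothetical formatter modeling the reviewer's Spark example
    ("...:00.5"): fractional seconds with trailing zeros dropped.
    This is an assumption for illustration, NOT Spark's implementation."""
    base = f"{t.hour:02d}:{t.minute:02d}:{t.second:02d}"
    if t.microsecond == 0:
        return base
    frac = f"{t.microsecond:06d}".rstrip("0")
    return f"{base}.{frac}"


sub_second = datetime.time(12, 0, 0, 500000)
whole_second = datetime.time(12, 0, 0)

# Python's str() always prints six fractional digits when microseconds != 0.
python_sub = str(sub_second)  # "12:00:00.500000"
trimmed_sub = hypothetical_trimmed_time_str(sub_second)  # "12:00:00.5"

# Without a fractional part the two renderings agree, so a test_astype
# using only whole-second values would not catch the divergence.
python_whole = str(whole_second)  # "12:00:00"
trimmed_whole = hypothetical_trimmed_time_str(whole_second)  # "12:00:00"

# The null placeholder produced by null_str=str(None).
null_placeholder = str(None)  # "None"

# For the bool branch: since Python 3.5 every datetime.time is truthy,
# midnight included, so a non-null time never casts to False in pandas.
midnight_truthy = bool(datetime.time(0, 0))  # True
```

   Under this model, including at least one sub-second value such as `datetime.time(12, 0, 0, 500000)` in the test data is what would make `astype(str)` exercise the risky path.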



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
