zhengruifeng commented on code in PR #53678:
URL: https://github.com/apache/spark/pull/53678#discussion_r2660905321
##########
python/pyspark/sql/pandas/types.py:
##########
@@ -115,21 +111,15 @@ def to_arrow_type(
         arrow_type = pa.float64()
     elif type(dt) == DecimalType:
         arrow_type = pa.decimal128(dt.precision, dt.scale)
-    elif type(dt) == StringType and prefers_large_types:
-        arrow_type = pa.large_string()
     elif type(dt) == StringType:
-        arrow_type = pa.string()
-    elif type(dt) == BinaryType and prefers_large_types:
-        arrow_type = pa.large_binary()
+        arrow_type = pa.large_string() if prefers_large_types else pa.string()
     elif type(dt) == BinaryType:
-        arrow_type = pa.binary()
+        arrow_type = pa.large_binary() if prefers_large_types else pa.binary()
     elif type(dt) == DateType:
         arrow_type = pa.date32()
-    elif type(dt) == TimestampType and timestamp_utc:
-        # Timestamps should be in UTC, JVM Arrow timestamps require a timezone to be read
Review Comment:
this comment doesn't seem true: pyarrow timestamps always store UTC time as the underlying values, but the timezone in the type metadata is not required to be UTC:
```
In [28]: import datetime
In [29]: from zoneinfo import ZoneInfo
In [30]: import pyarrow as pa
In [31]: import pyarrow.compute
In [32]: ts1 = datetime.datetime(2026, 1, 5, 15, 0, 1, tzinfo=ZoneInfo('Asia/Singapore'))
In [33]: ts2 = datetime.datetime(2026, 1, 5, 16, 0, 1, tzinfo=ZoneInfo('Asia/Tokyo'))
In [34]: s1 = pa.scalar(ts1)
In [35]: s2 = pa.scalar(ts2)
In [36]: s1 == s2  # unequal because the scalars carry different tz metadata in their types
Out[36]: False
In [37]: s1.value  # ... but the underlying UTC microsecond values are identical
Out[37]: 1767596401000000
In [38]: s2.value
Out[38]: 1767596401000000
In [39]: s3 = pa.compute.cast(s1, pa.timestamp('us', tz="UTC"))
In [40]: s3.value  # casting to a UTC-tagged type leaves the stored value unchanged
Out[40]: 1767596401000000
```
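For reference, the same point can be checked at the array level. This is a minimal sketch of my own (not from this PR), assuming only pyarrow and Python 3.9+ for `zoneinfo`:
```
import datetime
from zoneinfo import ZoneInfo

import pyarrow as pa

# The same instant as in the session above, expressed in a non-UTC zone.
ts = datetime.datetime(2026, 1, 5, 15, 0, 1, tzinfo=ZoneInfo("Asia/Singapore"))

# One array with a non-UTC tz in its type, one with UTC.
arr_sg = pa.array([ts], type=pa.timestamp("us", tz="Asia/Singapore"))
arr_utc = pa.array([ts], type=pa.timestamp("us", tz="UTC"))

# Casting to int64 exposes the raw storage: both arrays hold the same
# UTC-based microsecond count, so the tz string only affects interpretation.
assert arr_sg.cast(pa.int64()).to_pylist() == [1767596401000000]
assert arr_utc.cast(pa.int64()).to_pylist() == [1767596401000000]
```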