ianmcook commented on code in PR #46529:
URL: https://github.com/apache/spark/pull/46529#discussion_r1608878348
##########
python/pyspark/sql/pandas/types.py:
##########
@@ -232,6 +312,124 @@ def _get_local_timezone() -> str:
return os.environ.get("TZ", "dateutil/:")
+def _check_arrow_table_timestamps_localize(
+    table: "pa.Table", schema: StructType, truncate: bool = True, timezone: Optional[str] = None
+) -> "pa.Table":
+    """
+    Convert timestamps in a PyArrow Table to timezone-naive in the specified timezone if the
+    corresponding Spark data type in the specified Spark schema is TimestampType, and
+    optionally truncate nanosecond timestamps to microseconds.
+
+    Parameters
+    ----------
+    table : :class:`pyarrow.Table`
+    schema : :class:`StructType`
+        The Spark schema corresponding to the schema of the Arrow Table.
+    truncate : bool, default True
+        Whether to truncate nanosecond timestamps to microseconds.
+    timezone : str, optional
+        The timezone to convert from. Required if the schema contains a timestamp type.
+
+    Returns
+    -------
+    :class:`pyarrow.Table`
+    """
+    import pyarrow as pa
+
+    assert len(table.schema) == len(schema.fields)
+
+    return pa.Table.from_arrays(
+        [
+            _check_arrow_array_timestamps_localize(a, f.dataType, truncate, timezone)
+            for a, f in zip(table.columns, schema.fields)
+        ],
+        schema=table.schema,
+    )
+
+
+def _check_arrow_array_timestamps_localize(
Review Comment:
I am sorry if it is hard to follow what this function does. It looks for
Arrow Timestamp columns that will be converted to Spark TIMESTAMP_LTZ columns.
It localizes them (`pc.assume_timezone`) and optionally truncates them to
microseconds (`pc.floor_temporal`). If it finds a chunked array, list array,
map array, struct array, or dictionary array, then it recurses into the child
arrays.
cc @jorisvandenbossche in case you want to take a look
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]