BryanCutler commented on a change in pull request #33980:
URL: https://github.com/apache/spark/pull/33980#discussion_r718964803



##########
File path: python/pyspark/sql/pandas/types.py
##########
@@ -296,7 +337,34 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):
         return s
 
 
-def _check_series_convert_timestamps_local_tz(s, timezone):
+def __handle_array_of_timestamps(series, to_tz,  from_tz=None):
+    """
+
+    :param series: Pandas series
+    :param to_tz: to timezone
+    :param from_tz: from time zone
+    :return: return series respecting timezone
+    """
+    from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
+    import pandas as pd
+    from pandas import Series
+    data_after_conversion = []
+    for data in series:

Review comment:
       So this iterates over each timestamp array in the series, applies the 
conversions to each one, and then builds a new series back from a list? That 
seems pretty inefficient, and it looks like a lot of specialized conversion 
logic just for arrays of timestamps.
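
For comparison, a rough sketch of a vectorized alternative: flatten the arrays, convert all timestamps in one pass, then regroup by the original row. This is not code from the PR. `convert_timestamp_arrays` is a hypothetical helper, and it assumes a unique index and naive timestamps. Empty or null rows would need extra handling.

```python
import pandas as pd


def convert_timestamp_arrays(series, to_tz, from_tz="UTC"):
    """Hypothetical helper (not part of PySpark): convert a Series whose
    elements are lists of naive timestamps from from_tz to to_tz."""
    # Flatten to one timestamp per row; the original index label is
    # repeated for every element that came from the same row.
    flat = pd.to_datetime(series.explode())
    # One vectorized conversion over all timestamps at once, then drop
    # the timezone to get naive wall-clock times in to_tz.
    converted = (
        flat.dt.tz_localize(from_tz)
        .dt.tz_convert(to_tz)
        .dt.tz_localize(None)
    )
    # Regroup by the original index to rebuild one list per input row.
    # Assumption: the index is unique, otherwise rows would be merged.
    return converted.groupby(level=0).agg(list)


arrays = pd.Series([
    [pd.Timestamp("2021-01-01 12:00:00"), pd.Timestamp("2021-01-01 13:00:00")],
    [pd.Timestamp("2021-06-01 12:00:00")],
])
result = convert_timestamp_arrays(arrays, "America/New_York")
```

The per-row Python loop is replaced by a single `explode` and a single `groupby`, so the timezone arithmetic runs once over the flattened values instead of once per array.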

##########
File path: python/pyspark/sql/pandas/types.py
##########
@@ -253,14 +278,29 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # >>> str(tz.localize(t))
         # '2015-11-01 01:30:00-05:00'
         tz = timezone or _get_local_timezone()
-        return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
+        data = s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
+        return __modified_series(data, is_array)
     elif is_datetime64tz_dtype(s.dtype):
-        return s.dt.tz_convert('UTC')
+        data = s.dt.tz_convert('UTC')
+        return __modified_series(data, is_array)
     else:
-        return s
+        return __modified_series(s, is_array)
+
+
+def __modified_series(data, is_array):
+    """
+    :param data: Converted data
+    :param is_array: If the input data type is type of array
+    :return: If input type is array ,then return series with array of data
+    else return series as it is.
+    """
+    from pandas import Series
+    if is_array:
+        return Series([data])

Review comment:
       Sorry, I don't quite understand what this function is doing. It looks 
like it's making a Series with a single element that holds the whole array?
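
A small illustration of the behavior in question, assuming `data` is an already converted pandas Series as in the hunk above. The variable names here are only for the example.

```python
import pandas as pd

# Stand-in for the converted timestamps produced earlier in the function.
data = pd.Series(pd.to_datetime(["2021-01-01 12:00:00", "2021-01-01 13:00:00"]))

# Wrapping the Series in a list yields a one-row Series whose single
# element is the entire converted Series.
wrapped = pd.Series([data])

print(len(data))     # number of timestamps
print(len(wrapped))  # number of rows after wrapping
```

So if the input had several array rows, they would all end up inside that one element rather than as separate rows.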




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


