Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166826644
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
         tz = timezone or 'tzlocal()'
-        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+        """
+        tz_localize with ambiguous=False has the same behavior as pytz.localize
+        >>> import datetime
+        >>> import pandas as pd
+        >>> import pytz
+        >>>
+        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+        >>> ts = pd.Series([t])
+        >>> tz = pytz.timezone('America/New_York')
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=False)
+        >>> 0    2015-11-01 01:23:24-05:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=True)
+        >>> 0    2015-11-01 01:23:24-04:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> str(tz.localize(t))
+        >>> '2015-11-01 01:23:24-05:00'
--- End diff ---
Hm .. this one seems a bit weird. Shouldn't it be `... '2015-11-01 01:23:24-05:00'`?
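
For context, here is a minimal, runnable sketch (illustrative only, not part of the diff) of the fall-back ambiguity the quoted docstring describes, using the same timestamp and timezone:

```python
# Illustrative sketch (not from the PR): 2015-11-01 01:23:24 occurs twice in
# America/New_York because clocks fall back from EDT (-04:00) to EST (-05:00).
import datetime

import pandas as pd
import pytz

t = datetime.datetime(2015, 11, 1, 1, 23, 24)
ts = pd.Series([t])
tz = pytz.timezone('America/New_York')

# ambiguous=False resolves the repeated hour to standard time (EST, -05:00).
print(ts.dt.tz_localize(tz, ambiguous=False))
# ambiguous=True resolves it to daylight-saving time (EDT, -04:00).
print(ts.dt.tz_localize(tz, ambiguous=True))

# pytz's localize defaults to is_dst=False, so it matches ambiguous=False here.
print(str(tz.localize(t)))                # 2015-11-01 01:23:24-05:00
print(str(tz.localize(t, is_dst=True)))   # 2015-11-01 01:23:24-04:00
```

In other words, passing `ambiguous=False` to `tz_localize` resolves the repeated hour the same way pytz's default `is_dst=False` does, which is the equivalence the docstring is claiming.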