[ https://issues.apache.org/jira/browse/ARROW-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-1680:
--------------------------------
    Fix Version/s: 0.8.0

> [Python] Timestamp unit change not done in from_pandas() conversion
> -------------------------------------------------------------------
>
>                 Key: ARROW-1680
>                 URL: https://issues.apache.org/jira/browse/ARROW-1680
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Bryan Cutler
>             Fix For: 0.8.0
>
>
> Calling {{Array.from_pandas}} with a pandas.Series of timestamps that
> have 'ns' unit while specifying a type with 'us' unit to coerce to
> causes problems. When the series has timestamps with a timezone, the
> unit is ignored. When the series does not have a timezone, the unit is
> applied but causes an OverflowError when printing.
> {noformat}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> from datetime import datetime
> >>> s = pd.Series([datetime.now()])
> >>> s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
> >>> arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', tz='America/New_York'))
> >>> arr.type
> TimestampType(timestamp[ns, tz=America/New_York])
> >>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
> >>> arr.type
> TimestampType(timestamp[us])
> >>> print(arr)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
>     values = array_format(self, window=10)
>   File "pyarrow/formatting.py", line 28, in array_format
>     values.append(value_format(x, 0))
>   File "pyarrow/formatting.py", line 49, in value_format
>     return repr(x)
>   File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
>     return repr(self.as_py())
>   File "pyarrow/scalar.pxi", line 240, in pyarrow.lib.TimestampValue.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:21600)
>     return converter(value, tzinfo=tzinfo)
>   File "pyarrow/scalar.pxi", line 204, in pyarrow.lib.lambda5 (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7295)
>     TimeUnit_MICRO: lambda x, tzinfo: pd.Timestamp(
>   File "pandas/_libs/tslib.pyx", line 402, in pandas._libs.tslib.Timestamp.__new__ (pandas/_libs/tslib.c:10051)
>   File "pandas/_libs/tslib.pyx", line 1467, in pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:27665)
> OverflowError: Python int too large to convert to C long
> {noformat}
> A workaround is to manually change the values with astype:
> {noformat}
> >>> arr = pa.Array.from_pandas(s.values.astype('datetime64[us]'))
> >>> arr.type
> TimestampType(timestamp[us])
> >>> print(arr)
> <pyarrow.lib.TimestampArray object at 0x7f6a67e0a3c0>
> [
>   Timestamp('2017-10-17 11:04:44.308233')
> ]
> >>>
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
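
One possible reading of the OverflowError in the report, not confirmed by the issue itself, is that the stored int64 nanosecond values were relabelled as microseconds without being divided by 1000, so scaling them back to nanoseconds for display exceeds the int64 range. The sketch below only illustrates that arithmetic with the standard library. The helper names (`epoch_ns`, `ns_to_us`, `fits_in_int64_ns`) are hypothetical and are not part of pyarrow or pandas, and the timestamp from the workaround output is assumed to be UTC for the purpose of the example.

```python
from datetime import datetime, timezone

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def epoch_ns(dt):
    """Nanoseconds since the Unix epoch for a timezone-aware datetime."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def ns_to_us(value_ns):
    """Rescale the stored integer when the unit changes from ns to us."""
    return value_ns // 1_000


def fits_in_int64_ns(value, unit_in_ns):
    """True if `value`, expressed in nanoseconds, still fits in int64."""
    return -INT64_MAX - 1 <= value * unit_in_ns <= INT64_MAX


# Timestamp taken from the workaround output in the report (assumed UTC here).
ts = datetime(2017, 10, 17, 11, 4, 44, 308233, tzinfo=timezone.utc)
raw_ns = epoch_ns(ts)

# Relabelled as microseconds without rescaling: converting back to ns overflows.
relabelled_ok = fits_in_int64_ns(raw_ns, 1_000)

# Rescaled to microseconds first: the round trip stays within int64.
rescaled_ok = fits_in_int64_ns(ns_to_us(raw_ns), 1_000)

print(relabelled_ok, rescaled_ok)
```

Under this assumption, the `astype('datetime64[us]')` workaround succeeds because NumPy rescales the underlying integers before pyarrow sees them, so the unit label and the stored values agree.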