Joe Muruganandam created ARROW-5359:
---------------------------------------

             Summary: timestamp_as_object support for pa.Table.to_pandas in pyarrow
                 Key: ARROW-5359
                 URL: https://issues.apache.org/jira/browse/ARROW-5359
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.13.0
         Environment: Ubuntu
            Reporter: Joe Muruganandam


Creating this ticket for an issue reported on GitHub ([https://github.com/apache/arrow/issues/4284]).
h2. pyarrow (Issue with timestamp conversion from arrow to pandas)

pyarrow's Table.to_pandas has a date_as_object option but no equivalent option for timestamps. When a timestamp column in an Arrow table is converted to pandas, the target type is pd.Timestamp, which cannot represent times after 2262-04-11 23:47:16.854775807. As a result, in the scenario below the dates are converted to incorrect values without any error being raised. Adding a timestamp_as_object option to pa.Table.to_pandas would address this scenario.
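As a rough, standard-library-only illustration (not pandas or pyarrow code): the upper bound quoted above is exactly what you get by counting the largest int64 value, 2**63 - 1, in nanoseconds from the Unix epoch. The sketch also shows, under the assumption of a timezone-naive timestamp[us] column given as a plain list of int64 microsecond counts, what an object-based conversion could look like: each value is mapped straight to a datetime.datetime without ever passing through nanoseconds. The helper names ns_upper_bound and micros_to_datetimes are hypothetical.

```python
from datetime import datetime, timedelta

# Naive Unix epoch; assumes timezone-naive timestamps.
EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1


def ns_upper_bound() -> str:
    """Latest instant representable as an int64 count of nanoseconds since the epoch."""
    seconds, frac_ns = divmod(INT64_MAX, 10**9)
    whole = EPOCH + timedelta(seconds=seconds)
    # datetime only has microsecond precision, so append the nanosecond digits by hand.
    return whole.isoformat(sep=" ", timespec="seconds") + f".{frac_ns:09d}"


def micros_to_datetimes(values):
    """Hypothetical helper: convert int64 microsecond counts to datetime objects.

    The values never go through a nanosecond representation, so dates after
    2262 survive unchanged (datetime itself is limited to years 1..9999).
    """
    return [EPOCH + timedelta(microseconds=v) for v in values]


if __name__ == "__main__":
    print(ns_upper_bound())  # 2262-04-11 23:47:16.854775807
    print(micros_to_datetimes([32535172800000000, 35690846400000000]))
```

The two microsecond values used here are the ones Arrow stores for the test dates in the session below; converting them at microsecond resolution gives back 3000-12-31 12:00 and 3100-12-31 12:00.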

# Python 3.6.8
>>> import pandas as pd
>>> import pyarrow as pa
>>> pd.__version__
'0.24.1'
>>> pa.__version__
'0.13.0'
>>> import datetime
>>> df = pd.DataFrame({"test_date": [datetime.datetime(3000, 12, 31, 12, 0), datetime.datetime(3100, 12, 31, 12, 0)]})
>>> df
            test_date
0 3000-12-31 12:00:00
1 3100-12-31 12:00:00
>>> pa_table = pa.Table.from_pandas(df)
>>> pa_table[0]
Column name='test_date' type=TimestampType(timestamp[us])
[
[
32535172800000000,
35690846400000000
]
]
>>> pa_table.to_pandas()
                      test_date
0 1831-11-22 12:50:52.580896768
1 1931-11-22 12:50:52.580896768
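The incorrect values are consistent with the stored microsecond counts being multiplied by 1000 into int64 nanoseconds without an overflow check. The standard-library sketch below assumes exactly that (two's-complement wraparound on the scale-up; this is an inference from the numbers, not a reading of the pyarrow source) and reproduces both values printed by to_pandas(). wrap_int64 and overflowed_ns_string are hypothetical helper names.

```python
from datetime import datetime, timedelta

# Naive Unix epoch; assumes timezone-naive timestamps.
EPOCH = datetime(1970, 1, 1)


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to the two's-complement int64 it would overflow to."""
    return (value + 2**63) % 2**64 - 2**63


def overflowed_ns_string(micros: int) -> str:
    """Scale microseconds to nanoseconds with int64 wraparound and format the result."""
    ns = wrap_int64(micros * 1000)  # assumed unchecked multiplication by 1000
    seconds, frac_ns = divmod(ns, 10**9)  # floor division keeps frac_ns in [0, 10**9)
    whole = EPOCH + timedelta(seconds=seconds)
    # datetime only has microsecond precision, so append the nanosecond digits by hand.
    return whole.isoformat(sep=" ", timespec="seconds") + f".{frac_ns:09d}"


for micros in (32535172800000000, 35690846400000000):
    print(overflowed_ns_string(micros))
# 1831-11-22 12:50:52.580896768
# 1931-11-22 12:50:52.580896768
```

Because the two input dates are exactly 36524 days apart, the wrapped values differ by the same whole number of days, which is why both wrong results share the time of day 12:50:52.580896768.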


