Dave Challis created ARROW-2429:
-----------------------------------

             Summary: [Python] Timestamp unit in schema changes when writing to Parquet file then reading back
                 Key: ARROW-2429
                 URL: https://issues.apache.org/jira/browse/ARROW-2429
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Mac OS High Sierra
PyArrow 0.9.0 (py36_1)
Python
            Reporter: Dave Challis
When creating an Arrow table from a Pandas DataFrame, the table schema contains a field of type `timestamp[ns]`. When serialising that table to a Parquet file and then immediately reading it back, the schema of the table read instead contains a field with type `timestamp[us]`.

{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])   # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0])  # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)