[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2026:
    Fix Version/s: 0.12.0

> [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
> --
>
>                 Key: ARROW-2026
>                 URL: https://issues.apache.org/jira/browse/ARROW-2026
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: OS: Mac OS X 10.13.2
>                      Python: 3.6.4
>                      PyArrow: 0.8.0
>            Reporter: Diego Argueta
>            Priority: Major
>              Labels: parquet, redshift, timestamps
>             Fix For: 0.12.0
>
>
> When writing to a Parquet file, if `use_deprecated_int96_timestamps` is True,
> timestamps are only written as 96-bit integers if the timestamp has
> nanosecond resolution. This is a problem because Amazon Redshift timestamps
> only have microsecond resolution but require them to be stored in 96-bit
> format in Parquet files.
> I'd expect the use_deprecated_int96_timestamps flag to cause _all_ timestamps
> to be written as 96 bits, regardless of resolution. If this is a deliberate
> design decision, it'd be immensely helpful if it were explicitly documented
> as part of the argument.
>
> To reproduce:
>
> 1. Create a table with a timestamp having microsecond or millisecond
> resolution, and save it to a Parquet file. Be sure to set
> `use_deprecated_int96_timestamps` to True.
>
> {code:java}
> import datetime
> import pyarrow
> from pyarrow import parquet
>
> schema = pyarrow.schema([
>     pyarrow.field('last_updated', pyarrow.timestamp('us')),
> ])
> data = [
>     pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('us')),
> ]
> table = pyarrow.Table.from_arrays(data, ['last_updated'])
> with open('test_file.parquet', 'wb') as fdesc:
>     parquet.write_table(table, fdesc,
>                         use_deprecated_int96_timestamps=True)
> {code}
>
> 2. Inspect the file. I used parquet-tools:
>
> {noformat}
> dak@tux ~ $ parquet-tools meta test_file.parquet
> file:         file:/Users/dak/test_file.parquet
> creator:      parquet-cpp version 1.3.2-SNAPSHOT
> file schema:  schema
>
> last_updated: OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1
> row group 1:  RC:1 TS:76 OFFSET:4
>
> last_updated: INT64 SNAPPY DO:4 FPO:28 SZ:76/72/0.95 VC:1 ENC:PLAIN,PLAIN_DICTIONARY,RLE{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
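For readers unfamiliar with the 96-bit format the report asks for: the legacy Parquet INT96 timestamp is conventionally laid out as 12 little-endian bytes, an 8-byte count of nanoseconds within the day followed by a 4-byte Julian day number. The sketch below is only an illustration of that layout using the standard library, not PyArrow's or parquet-cpp's actual writer code, and the helper names (`micros_to_int96`, `int96_to_micros`, `datetime_to_micros`) are made up for this example. It shows that a microsecond-resolution value fits in the layout without loss, which is why writing such columns as INT96 is a reasonable expectation.

```python
import datetime
import struct

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_MICRO = 1000
MICROS_PER_DAY = 86_400 * 1_000_000


def micros_to_int96(micros_since_epoch):
    """Pack microseconds since the Unix epoch into 12 INT96-style bytes."""
    # divmod floors, so pre-1970 values still get a non-negative time of day.
    days, micros_of_day = divmod(micros_since_epoch, MICROS_PER_DAY)
    nanos_of_day = micros_of_day * NANOS_PER_MICRO
    julian_day = days + JULIAN_DAY_OF_UNIX_EPOCH
    # Little-endian: 8-byte nanoseconds within the day, then 4-byte Julian day.
    return struct.pack('<qi', nanos_of_day, julian_day)


def int96_to_micros(raw):
    """Unpack 12 INT96-style bytes back into microseconds since the epoch."""
    nanos_of_day, julian_day = struct.unpack('<qi', raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days * MICROS_PER_DAY + nanos_of_day // NANOS_PER_MICRO


def datetime_to_micros(value):
    """Convert a naive datetime (treated as UTC) to integer microseconds."""
    delta = value - datetime.datetime(1970, 1, 1)
    return delta // datetime.timedelta(microseconds=1)


if __name__ == '__main__':
    stamp = datetime.datetime(2018, 1, 24, 12, 34, 56, 789012)
    micros = datetime_to_micros(stamp)
    raw = micros_to_int96(micros)
    # The round trip preserves the microsecond value exactly.
    assert int96_to_micros(raw) == micros
    print(len(raw), raw.hex())
```

The round-trip check in the `__main__` section is the point of the sketch: nothing about microsecond resolution prevents the value from being stored in the 96-bit layout.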
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2026:
    Fix Version/s:     (was: 0.12.0)
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2026:
    Labels: parquet redshift timestamps  (was: redshift timestamps)
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2026:
    Fix Version/s:     (was: 0.11.0)
                   0.12.0
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-2026:
    Fix Version/s:     (was: 0.10.0)
                   0.11.0
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2026:
    Fix Version/s:     (was: 0.9.0)
                   0.10.0
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn updated ARROW-2026:
    Fix Version/s: 0.9.0
[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diego Argueta updated ARROW-2026:
    Summary: [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True  (was: [Python] Timestamps saved as int64 even if use_deprecated_int96_timestamps=True)