[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-11-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2026:

Fix Version/s: 0.12.0

> [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-2026
>                 URL: https://issues.apache.org/jira/browse/ARROW-2026
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: OS: Mac OS X 10.13.2
>                      Python: 3.6.4
>                      PyArrow: 0.8.0
>            Reporter: Diego Argueta
>            Priority: Major
>              Labels: parquet, redshift, timestamps
>             Fix For: 0.12.0
>
>
> When writing to a Parquet file with `use_deprecated_int96_timestamps` set to 
> True, timestamps are written as 96-bit integers only if they have nanosecond 
> resolution. This is a problem because Amazon Redshift timestamps have only 
> microsecond resolution, yet Redshift requires them to be stored in 96-bit 
> format in Parquet files.
>
> I'd expect the `use_deprecated_int96_timestamps` flag to cause _all_ timestamps 
> to be written as 96 bits, regardless of resolution. If this is a deliberate 
> design decision, it'd be immensely helpful if it were explicitly documented 
> in the argument's description.
>  
> To reproduce:
>  
> 1. Create a table with a timestamp having microsecond or millisecond 
> resolution, and save it to a Parquet file. Be sure to set 
> `use_deprecated_int96_timestamps` to True.
>  
> {code:python}
> import datetime
> import pyarrow
> from pyarrow import parquet
>
> # Schema with a single microsecond-resolution timestamp column.
> schema = pyarrow.schema([
>     pyarrow.field('last_updated', pyarrow.timestamp('us')),
> ])
> data = [
>     pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('us')),
> ]
> table = pyarrow.Table.from_arrays(data, ['last_updated'])
>
> # Request INT96 output; the column is nevertheless written as INT64.
> with open('test_file.parquet', 'wb') as fdesc:
>     parquet.write_table(table, fdesc,
>                         use_deprecated_int96_timestamps=True)
> {code}
>  
> 2. Inspect the file. I used parquet-tools:
>  
> {noformat}
> dak@tux ~ $ parquet-tools meta test_file.parquet
> file:         file:/Users/dak/test_file.parquet
> creator:      parquet-cpp version 1.3.2-SNAPSHOT
> file schema:  schema
> 
> last_updated: OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1
> row group 1:  RC:1 TS:76 OFFSET:4
> 
> last_updated:  INT64 SNAPPY DO:4 FPO:28 SZ:76/72/0.95 VC:1 ENC:PLAIN,PLAIN_DICTIONARY,RLE
> {noformat}
>  

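For context on why the resolution need not matter for the 96-bit format: as commonly implemented (for example by Impala and parquet-cpp), an INT96 timestamp occupies 12 bytes, with 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. A microsecond value can therefore be widened without loss. The sketch below uses only the standard library; the function names are hypothetical, and this illustrates the byte layout only, not PyArrow's actual code path.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
MICROS_PER_DAY = 86_400 * 1_000_000


def micros_to_int96(micros_since_epoch):
    """Encode microseconds since the Unix epoch as a 12-byte INT96 timestamp."""
    # divmod floors, so pre-1970 values still get a non-negative time of day.
    days, micros_of_day = divmod(micros_since_epoch, MICROS_PER_DAY)
    nanos_of_day = micros_of_day * 1000
    # '<qi' = little-endian int64 followed by int32, no padding (12 bytes).
    return struct.pack('<qi', nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def int96_to_micros(raw):
    """Decode a 12-byte INT96 timestamp back to microseconds since the epoch."""
    nanos_of_day, julian_day = struct.unpack('<qi', raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days * MICROS_PER_DAY + nanos_of_day // 1000


if __name__ == '__main__':
    value = 1_516_795_200_123_456  # an arbitrary microsecond timestamp
    encoded = micros_to_int96(value)
    print(len(encoded), encoded.hex())
    print(int96_to_micros(encoded) == value)
```

Because the time-of-day field is stored in nanoseconds, the microsecond value survives the round trip unchanged.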


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-11-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2026:

Fix Version/s: (was: 0.12.0)



[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-11-13 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2026:

Labels: parquet redshift timestamps  (was: redshift timestamps)



[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-09-15 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2026:

Fix Version/s: 0.12.0 (was: 0.11.0)



[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-06-29 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2026:

Fix Version/s: 0.11.0 (was: 0.10.0)



[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-03-11 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2026:

Fix Version/s: 0.10.0 (was: 0.9.0)



[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-01-24 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2026:

Fix Version/s: 0.9.0



[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-01-24 Thread Diego Argueta (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Argueta updated ARROW-2026:

Summary: [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True
  (was: [Python] Timestamps saved as int64 even if use_deprecated_int96_timestamps=True)
