[jira] [Updated] (ARROW-8657) [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

2020-05-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8657:

Fix Version/s: 1.0.0

> [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when 
> using version='2.0'
> -
>
> Key: ARROW-8657
> URL: https://issues.apache.org/jira/browse/ARROW-8657
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.17.0
> Reporter: Pierre Belzile
> Assignee: Micah Kornfield
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0, 0.17.1
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> With the recent release of 0.17, the ParquetVersion is used to define both 
> the logical type interpretation of fields and the selection of the DataPage 
> format.
> As a result, all Parquet files that were created with ParquetVersion::V2 to 
> get features such as unsigned int32s, timestamps with nanosecond resolution, 
> etc. are not forward compatible (cannot be read with 0.16.0). That's TBs of 
> data in my case.
> Those two concerns should be separated. Given that DataPageV2 pages were 
> not written prior to 0.17, and in order to allow reading existing files, the 
> existing version property should continue to operate as in 0.16 and inform 
> the logical type mapping.
> Some consideration should be given to issuing a 0.17.1 release.
>  
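
The separation requested above can be sketched in a few lines of Python. This
is only an illustrative model, not the Arrow or Parquet API: WriterSettings,
coupled_settings, separated_settings and readable_by_v1_page_reader are
hypothetical names, and the assumption that the older reader only handles V1
data pages is taken from the report rather than verified here.

```python
from dataclasses import dataclass


# Hypothetical model of the two writer concerns named in the report.
@dataclass(frozen=True)
class WriterSettings:
    format_version: str = "1.0"     # informs the logical type mapping
    data_page_version: str = "1.0"  # selects the DataPage format


def coupled_settings(version: str) -> WriterSettings:
    """Behaviour as reported for 0.17.0: one property drives both concerns."""
    return WriterSettings(format_version=version, data_page_version=version)


def separated_settings(version: str, data_page_version: str = "1.0") -> WriterSettings:
    """Requested behaviour: the version property only informs logical types."""
    return WriterSettings(format_version=version, data_page_version=data_page_version)


def readable_by_v1_page_reader(settings: WriterSettings) -> bool:
    """Assumption: the older reader can only decode V1 data pages."""
    return settings.data_page_version == "1.0"


if __name__ == "__main__":
    print(readable_by_v1_page_reader(coupled_settings("2.0")))
    print(readable_by_v1_page_reader(separated_settings("2.0")))
```

With the two settings decoupled, a writer can keep asking for version 2 logical
types while leaving the data page format at its previous default, which is what
the report asks for.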



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8657) [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

2020-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8657:
--
Labels: pull-request-available  (was: )

> [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when 
> using version='2.0'
> -
>
> Key: ARROW-8657
> URL: https://issues.apache.org/jira/browse/ARROW-8657
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.17.0
> Reporter: Pierre Belzile
> Assignee: Micah Kornfield
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.17.1
>
> Time Spent: 10m
> Remaining Estimate: 0h





[jira] [Updated] (ARROW-8657) [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

2020-04-30 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8657:

Description: 
With the recent release of 0.17, the ParquetVersion is used to define both the 
logical type interpretation of fields and the selection of the DataPage format.

As a result, all Parquet files that were created with ParquetVersion::V2 to get 
features such as unsigned int32s, timestamps with nanosecond resolution, etc. 
are not forward compatible (cannot be read with 0.16.0). That's TBs of data in 
my case.

Those two concerns should be separated. Given that DataPageV2 pages were 
not written prior to 0.17, and in order to allow reading existing files, the 
existing version property should continue to operate as in 0.16 and inform the 
logical type mapping.

Some consideration should be given to issuing a 0.17.1 release.

 

  was:
With the recent release of 0.17, the ParquetVersion is used to define both the 
logical type interpretation of fields and the selection of the DataPage format.

As a result, all Parquet files that were created with ParquetVersion::V2 to get 
features such as unsigned int32s, timestamps with nanosecond resolution, etc. 
are now unreadable. That's TBs of data in my case.

Those two concerns should be separated. Given that DataPageV2 pages were 
not written prior to 0.17, and in order to allow reading existing files, the 
existing version property should continue to operate as in 0.16 and inform the 
logical type mapping.

Some consideration should be given to issuing a 0.17.1 release.

 


> [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when 
> using version='2.0'
> -
>
> Key: ARROW-8657
> URL: https://issues.apache.org/jira/browse/ARROW-8657
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.17.0
> Reporter: Pierre Belzile
> Priority: Major
> Fix For: 0.17.1





[jira] [Updated] (ARROW-8657) [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

2020-04-30 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8657:

Fix Version/s: 0.17.1

> [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when 
> using version='2.0'
> -
>
> Key: ARROW-8657
> URL: https://issues.apache.org/jira/browse/ARROW-8657
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.17.0
> Reporter: Pierre Belzile
> Priority: Major
> Fix For: 0.17.1





[jira] [Updated] (ARROW-8657) [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

2020-04-30 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8657:

Summary: [Python][C++][Parquet] Forward compatibility issue from 0.16 to 
0.17 when using version='2.0'  (was: Distinguish parquet version 2 logical type 
vs DataPageV2)

> [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when 
> using version='2.0'
> -
>
> Key: ARROW-8657
> URL: https://issues.apache.org/jira/browse/ARROW-8657
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.17.0
> Reporter: Pierre Belzile
> Priority: Major


