[jira] [Created] (PARQUET-1620) Schema creation from another schema will not be possible - deprecated

2019-07-10 Thread Werner Daehn (JIRA)
Werner Daehn created PARQUET-1620: - Summary: Schema creation from another schema will not be possible - deprecated Key: PARQUET-1620 URL: https://issues.apache.org/jira/browse/PARQUET-1620 Project:

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread TP Boudreau
Sorry for the quick self-reply, but after brief reflection I think two changes to my alternative proposal are required: 1. The proposed new field should be a parameter to the TimestampType, not FileMetaData -- file level adds unnecessary complication / opportunities for mischief. 2. Although

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread TP Boudreau
Hi Zoltan, Thank you for the helpful clarification of the community's understanding of the TIMESTAMP annotation. The core of the problem (IMHO) is that there is no way to distinguish in the new LogicalType TimestampType between the case where UTC-normalization has been directly reported (via a user

[jira] [Updated] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2019-07-10 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1222: --- Description: Currently parquet-format specifies the sort order for floating point numbers

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Wes McKinney
Correct On Wed, Jul 10, 2019 at 9:21 AM Zoltan Ivanfi wrote: > > Hi Wes, > > Do you mean that the new logical types have already been released in 0.14.0 > and a 0.14.1 is needed ASAP to fix this regression? > > Thanks, > > Zoltan > > On Wed, Jul 10, 2019 at 4:13 PM Wes McKinney wrote: > > > hi

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Zoltan Ivanfi
Hi Wes, Do you mean that the new logical types have already been released in 0.14.0 and a 0.14.1 is needed ASAP to fix this regression? Thanks, Zoltan On Wed, Jul 10, 2019 at 4:13 PM Wes McKinney wrote: > hi Zoltan -- given the raging fire that is 0.14.0 as a result of these > issues and

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Wes McKinney
hi Zoltan -- given the raging fire that is 0.14.0 as a result of these issues and others, we need to make a new release within the next 7-10 days. We can point you to nightly Python builds to make testing easier for you so you don't have to build the project yourself. - Wes On Wed, Jul 10, 2019

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Zoltan Ivanfi
Hi, Oh, and one more thing: Before releasing the next Arrow version incorporating the new logical types, we should definitely test that their behaviour matches that of parquet-mr. When is the next release planned to come out? Br, Zoltan On Wed, Jul 10, 2019 at 3:57 PM Zoltan Ivanfi wrote: >

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Zoltan Ivanfi
Hi Wes, Yes, I agree that we should do that, but then we have a problem of what to do in the other direction, i.e. when we use the new logical types API to read a TIMESTAMP_MILLIS or TIMESTAMP_MICROS, how should we set the UTC normalized flag? Tim has started a discussion about that, suggesting

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Wes McKinney
Thanks for the comments. So in summary I think that we need to set the TIMESTAMP_* converted types to maintain forward compatibility and stay consistent with what we were doing in the C++ library prior to the introduction of the LogicalType metadata. On Wed, Jul 10, 2019 at 8:20 AM Zoltan Ivanfi

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Zoltan Ivanfi
Hi Tim, In my opinion the specification of the older timestamp types only allowed UTC-normalized storage, since these types were defined as the number of milli/microseconds elapsed since the Unix epoch. This clearly defines the meaning of the numeric value 0 as 0 seconds after the Unix epoch,

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-07-10 Thread 俊杰陈
I see, will resume this next week. Thanks. On Wed, Jul 10, 2019 at 5:26 PM Zoltan Ivanfi wrote: > > Hi Junjie, > > Since there are ongoing improvements addressing review comments, I would > hold off with the vote for a few more days until the specification settles. > > Br, > > Zoltan > > On

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-10 Thread Zoltan Ivanfi
Hi Wes, Both of the semantics are deterministic in one aspect and nondeterministic in another. Timestamps of instant semantics will always refer to the same instant, but their user-facing representation (how they get displayed) depends on the user's time zone. Timestamps of local semantics always
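
The distinction drawn in the message above can be illustrated with a small sketch. This is not code from parquet-mr or Arrow; the function names render_instant and render_local and the fixed UTC offsets are hypothetical, chosen only for illustration. It assumes a stored 64-bit value counting microseconds, and that local semantics means the value encodes wall-clock fields with no time zone attached.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware (UTC) datetime and as a naive (zone-less) datetime.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_NAIVE = datetime(1970, 1, 1)
FMT = "%Y-%m-%d %H:%M:%S"


def render_instant(micros: int, viewer_tz: timezone) -> str:
    """Instant semantics (hypothetical helper): the value is an offset from
    the Unix epoch in UTC, so the displayed text depends on the viewer's zone."""
    instant = EPOCH_UTC + timedelta(microseconds=micros)
    return instant.astimezone(viewer_tz).strftime(FMT)


def render_local(micros: int) -> str:
    """Local semantics (hypothetical helper): the value encodes wall-clock
    fields, so every viewer sees the same text, but the instant it refers to
    is unknown without an externally supplied zone."""
    return (EPOCH_NAIVE + timedelta(microseconds=micros)).strftime(FMT)


if __name__ == "__main__":
    stored = 1_562_760_000_000_000  # the same stored integer in both cases
    plus_two = timezone(timedelta(hours=2))     # illustrative fixed offset
    minus_four = timezone(timedelta(hours=-4))  # illustrative fixed offset

    # Same instant, different displayed text per viewer.
    print(render_instant(stored, plus_two))
    print(render_instant(stored, minus_four))
    # Same displayed text for every viewer.
    print(render_local(stored))
```

A reader that treats a local-semantics value as an instant (or the reverse) shifts the displayed time by the viewer's UTC offset, which is why the thread cares about whether the UTC-normalized flag is set correctly.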

[jira] [Commented] (PARQUET-1609) support xxhash in bloom filter

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881985#comment-16881985 ] ASF GitHub Bot commented on PARQUET-1609: - jbapple commented on pull request #143:

[jira] [Updated] (PARQUET-1609) support xxhash in bloom filter

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1609: Labels: pull-request-available (was: ) > support xxhash in bloom filter >

[jira] [Commented] (PARQUET-1617) Add more details to bloom filter spec

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881940#comment-16881940 ] ASF GitHub Bot commented on PARQUET-1617: - zivanfi commented on pull request #140:

[jira] [Commented] (PARQUET-1619) Merge crypto spec and structures to format master

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881937#comment-16881937 ] ASF GitHub Bot commented on PARQUET-1619: - ggershinsky commented on pull request #142:

[jira] [Updated] (PARQUET-1619) Merge crypto spec and structures to format master

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1619: Labels: pull-request-available (was: ) > Merge crypto spec and structures to format

[jira] [Created] (PARQUET-1619) Merge crypto spec and structures to format master

2019-07-10 Thread Gidon Gershinsky (JIRA)
Gidon Gershinsky created PARQUET-1619: - Summary: Merge crypto spec and structures to format master Key: PARQUET-1619 URL: https://issues.apache.org/jira/browse/PARQUET-1619 Project: Parquet

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-07-10 Thread Zoltan Ivanfi
Hi Junjie, Since there are ongoing improvements addressing review comments, I would hold off with the vote for a few more days until the specification settles. Br, Zoltan On Wed, Jul 10, 2019 at 9:32 AM 俊杰陈 wrote: > Hi Parquet committers and developers > > We are waiting for your important

[jira] [Commented] (PARQUET-1618) Update encryption spec for Bloom filter encryption

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881851#comment-16881851 ] ASF GitHub Bot commented on PARQUET-1618: - ggershinsky commented on pull request #141:

[jira] [Updated] (PARQUET-1618) Update encryption spec for Bloom filter encryption

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1618: Labels: pull-request-available (was: ) > Update encryption spec for Bloom filter

[jira] [Resolved] (PARQUET-1552) upgrade protoc-jar-maven-plugin to 3.8.0

2019-07-10 Thread Nandor Kollar (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nandor Kollar resolved PARQUET-1552. Resolution: Fixed > upgrade protoc-jar-maven-plugin to 3.8.0 >

[jira] [Commented] (PARQUET-1552) upgrade protoc-jar-maven-plugin to 3.8.0

2019-07-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881827#comment-16881827 ] ASF GitHub Bot commented on PARQUET-1552: - nandorKollar commented on pull request #659:

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-07-10 Thread 俊杰陈
Hi Parquet committers and developers We are waiting for your important ballot :) On Tue, Jul 9, 2019 at 10:21 AM 俊杰陈 wrote: > > Yes, there are some public benchmark results, such as the official > benchmark from xxhash site (http://www.xxhash.com/) and published > comparison from smhasher

[jira] [Created] (PARQUET-1618) Update encryption spec for Bloom filter encryption

2019-07-10 Thread Gidon Gershinsky (JIRA)
Gidon Gershinsky created PARQUET-1618: - Summary: Update encryption spec for Bloom filter encryption Key: PARQUET-1618 URL: https://issues.apache.org/jira/browse/PARQUET-1618 Project: Parquet