Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-26 Thread Sutou Kouhei
+1 (binding)

In 
  "[VOTE] Clarifications and forward compatibility changes for Dictionary 
Encoding (second iteration)" on Wed, 20 Nov 2019 20:41:57 -0800,
  Micah Kornfield  wrote:

> Hello,
> As discussed on [1], I've proposed changes in a PR [2] that
> clarify:
> 
> 1.  It is not required that all dictionary batches occur at the beginning
> of an IPC stream (if the first record batch has an all-null
> dictionary-encoded column, that column's dictionary might not be sent
> until later in the stream).
> 
> 2.  A second dictionary batch for the same ID that is not a "delta batch"
> in an IPC stream indicates the dictionary should be replaced.
> 
> 3.  The file format can only contain one "NON-delta" dictionary batch
> (per dictionary ID) and multiple "delta" dictionary batches. Dictionary
> replacement is not supported in the file format.
> 
> 4.  Add an enum to the dictionary metadata to allow for possible future
> changes in the format in which dictionary batches can be sent (the most
> likely would be an array Map). An enum is needed as a placeholder to allow
> for forward compatibility past the 1.0.0 release.
> 
> If accepted, there will be work in all implementations to make sure that
> they cover the edge cases highlighted and additional integration testing
> will be needed.
> 
> Please vote whether to accept these additions. The vote will be open for at
> least 72 hours.
> 
> [ ] +1 Accept these changes to the specification
> [ ] +0
> [ ] -1 Do not accept the changes because...
> 
> Thanks,
> Micah
> 
> 
> [1]
> https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
> [2] https://github.com/apache/arrow/pull/5585
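For implementers, here is a minimal Python sketch of how a reader might track dictionaries under items 1 to 3 of the proposal. `DictionaryBatch` and `DictionaryMemo` are hypothetical stand-ins, not the API of any Arrow library. The sketch assumes dictionary values are plain Python lists and treats a delta batch that arrives before any base dictionary as an error; that last rule is a choice of this sketch, not something the proposal states.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for an IPC DictionaryBatch message."""
    id: int
    values: List[str]
    is_delta: bool = False


class DictionaryMemo:
    """Hypothetical reader-side tracker of the current dictionary per ID."""

    def __init__(self, file_format: bool) -> None:
        self.file_format = file_format  # True for the file format, False for a stream
        self.dictionaries: Dict[int, List[str]] = {}

    def on_dictionary_batch(self, batch: DictionaryBatch) -> None:
        current = self.dictionaries.get(batch.id)
        if current is None:
            # Assumption of this sketch: a delta needs an existing base dictionary.
            if batch.is_delta:
                raise ValueError(f"delta for dictionary {batch.id} before any base dictionary")
            self.dictionaries[batch.id] = list(batch.values)
        elif batch.is_delta:
            # Delta batches append new values to the existing dictionary.
            current.extend(batch.values)
        elif self.file_format:
            # Item 3: the file format does not support dictionary replacement.
            raise ValueError(f"second non-delta batch for dictionary {batch.id} in file format")
        else:
            # Item 2: in a stream, a second non-delta batch replaces the dictionary.
            self.dictionaries[batch.id] = list(batch.values)

    def decode(self, dict_id: int, indices: List[Optional[int]]) -> List[Optional[str]]:
        # Item 1: an all-null column can be decoded before its dictionary arrives.
        if all(i is None for i in indices):
            return [None] * len(indices)
        if dict_id not in self.dictionaries:
            raise KeyError(f"dictionary {dict_id} has not been received yet")
        values = self.dictionaries[dict_id]
        return [None if i is None else values[i] for i in indices]


# Usage: a stream that sends its dictionary late, then a delta, then a replacement.
memo = DictionaryMemo(file_format=False)
print(memo.decode(0, [None, None]))  # [None, None]
memo.on_dictionary_batch(DictionaryBatch(0, ["a", "b"]))
memo.on_dictionary_batch(DictionaryBatch(0, ["c"], is_delta=True))
print(memo.decode(0, [2, 0, None]))  # ['c', 'a', None]
memo.on_dictionary_batch(DictionaryBatch(0, ["x"]))
print(memo.decode(0, [0]))  # ['x']
```

Keeping the stream/file distinction as a single flag mirrors the proposal: the only behavioral difference is whether a second non-delta batch for an ID replaces the dictionary (stream) or is rejected (file).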


Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-25 Thread Ji Liu
To clarify, we have already implemented item #1 ("It is not required that all
dictionary batches occur at the beginning") in a previous PR [1].

I hope this proposal will be accepted, and I would like to take on the follow-up
work on the Java side if possible.

Thanks,
Ji Liu


[1] https://github.com/apache/arrow/pull/4960


--
From:Ji Liu 
Send Time: Tuesday, November 26, 2019 14:04
To:dev ; Micah Kornfield 
Cc:Wes McKinney 
Subject:Re: [VOTE] Clarifications and forward compatibility changes for 
Dictionary Encoding (second iteration)

+1 (non-binding)

Thanks
Ji Liu




Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-25 Thread Ji Liu
+1 (non-binding)

Thanks
Ji Liu


--
From:Fan Liya 
Send Time: Tuesday, November 26, 2019 14:01
To:dev ; Micah Kornfield 
Cc:Wes McKinney 
Subject:Re: [VOTE] Clarifications and forward compatibility changes for 
Dictionary Encoding (second iteration)

I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me.
So I am +0.5 for this.

Best,
Liya Fan



Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-25 Thread Fan Liya
I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me.
So I am +0.5 for this.

Best,
Liya Fan

On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield 
wrote:

> Could other members of the community chime in on this?  In particular
> getting views from other language maintainers would be good.
>
> Thanks,
> Micah


Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-25 Thread Micah Kornfield
Could other members of the community chime in on this?  In particular
getting views from other language maintainers would be good.

Thanks,
Micah

On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield 
wrote:

> Forgot to say,  My vote is +1 (binding).


Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-21 Thread Micah Kornfield
Forgot to say,  My vote is +1 (binding).

On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney  wrote:

> +1 (binding). Thanks Micah


Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-21 Thread Wes McKinney
+1 (binding). Thanks Micah