Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
+1 (binding)

In "[VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)" on Wed, 20 Nov 2019 20:41:57 -0800, Micah Kornfield wrote:
> Hello,
> As discussed on [1], I've proposed clarifications in a PR [2] that
> clarifies:
>
> 1. It is not required that all dictionary batches occur at the beginning
> of the IPC stream format (if the first record batch has an all-null
> dictionary-encoded column, the null column's dictionary might not be sent
> until later in the stream).
>
> 2. A second dictionary batch for the same ID that is not a "delta batch"
> in an IPC stream indicates the dictionary should be replaced.
>
> 3. Clarifies that the file format can only contain 1 "NON-delta"
> dictionary batch and multiple "delta" dictionary batches. Dictionary
> replacement is not supported in the file format.
>
> 4. Add an enum to dictionary metadata for possible future changes in what
> format dictionary batches can be sent. (the most likely would be an array
> Map). An enum is needed as a placeholder to allow for forward
> compatibility past the release 1.0.0.
>
> If accepted there will be work in all implementations to make sure that
> they cover the edge cases highlighted and additional integration testing
> will be needed.
>
> Please vote whether to accept these additions. The vote will be open for at
> least 72 hours.
>
> [ ] +1 Accept these changes to the specification
> [ ] +0
> [ ] -1 Do not accept the changes because...
>
> Thanks,
> Micah
>
>
> [1]
> https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
> [2] https://github.com/apache/arrow/pull/5585
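Points 1 to 3 of the proposal describe how a reader has to treat dictionary batches, and a small model may make the rules easier to follow. The sketch below is an illustration only and is not taken from the PR or from any Arrow implementation: `DictionaryBatch`, `StreamDictionaryState`, and `validate_file_batches` are hypothetical names, dictionaries are modelled as plain Python lists, and treating a delta for an unknown ID as an error is an assumption of this sketch rather than something stated in the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for an IPC dictionary batch message."""
    dict_id: int
    values: List[str]
    is_delta: bool = False


@dataclass
class StreamDictionaryState:
    """Tracks the current dictionary for each ID while reading a stream."""
    dictionaries: Dict[int, List[str]] = field(default_factory=dict)

    def on_dictionary_batch(self, batch: DictionaryBatch) -> None:
        if batch.is_delta:
            # Delta batch: append to the existing dictionary.
            # Assumption of this sketch: a delta for an unknown ID is an error.
            if batch.dict_id not in self.dictionaries:
                raise ValueError(f"delta for unknown dictionary {batch.dict_id}")
            self.dictionaries[batch.dict_id] = (
                self.dictionaries[batch.dict_id] + batch.values
            )
        else:
            # Point 2: a non-delta batch for a known ID replaces the dictionary.
            self.dictionaries[batch.dict_id] = list(batch.values)

    def decode(self, dict_id: int, indices: List[Optional[int]]) -> List[Optional[str]]:
        values = self.dictionaries.get(dict_id)
        if values is None:
            # Point 1: an all-null column can be decoded before its
            # dictionary has been sent.
            if any(i is not None for i in indices):
                raise KeyError(f"dictionary {dict_id} has not been received")
            return [None] * len(indices)
        return [None if i is None else values[i] for i in indices]


def validate_file_batches(batches: Iterable[DictionaryBatch]) -> None:
    """Point 3: the file format allows one non-delta batch per ID, plus deltas."""
    seen = set()
    for batch in batches:
        if batch.is_delta:
            continue
        if batch.dict_id in seen:
            raise ValueError(
                f"dictionary {batch.dict_id} replaced; not allowed in the file format"
            )
        seen.add(batch.dict_id)
```

In this model a stream reader calls `on_dictionary_batch` for every dictionary message in arrival order and `decode` for each dictionary-encoded column, while a file writer or reader would run `validate_file_batches` over the dictionary blocks to reject replacement.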
Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
To clarify, we have already implemented option #1 ("It is not required that all dictionary batches occur at the beginning") in the previous PR [1]. So I hope this proposal will be accepted, and I would like to take on the follow-up work on the Java side if possible.

Thanks,
Ji Liu

[1] https://github.com/apache/arrow/pull/4960

--
From: Ji Liu
Send Time: 2019-11-26 (Tuesday) 14:04
To: dev; Micah Kornfield
Cc: Wes McKinney
Subject: Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

+1 (non-binding)

Thanks
Ji Liu

--
From: Fan Liya
Send Time: 2019-11-26 (Tuesday) 14:01
To: dev; Micah Kornfield
Cc: Wes McKinney
Subject: Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me. So I am +0.5 for this.

Best,
Liya Fan

On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield wrote:
> Could other members of the community chime in on this? In particular
> getting views from other language maintainers would be good.
>
> Thanks,
> Micah
>
> On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield wrote:
>
> > Forgot to say, My vote is +1 (binding).
> >
> > On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney wrote:
> >
> >> +1 (binding). Thanks Micah
Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
+1 (non-binding)

Thanks
Ji Liu
Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
I am sorry I did not follow the thread closely (will follow up later). However, the proposal above looks good to me. So I am +0.5 for this.

Best,
Liya Fan
Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
Could other members of the community chime in on this? In particular getting views from other language maintainers would be good.

Thanks,
Micah
Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
Forgot to say, My vote is +1 (binding).
Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
+1 (binding). Thanks Micah