Re: [VOTE] Removing validity bitmap from Arrow union types
+1 (binding)

In "[VOTE] Removing validity bitmap from Arrow union types" on Mon, 29 Jun 2020 16:23:23 -0500,
Wes McKinney wrote:

> Hi,
>
> As discussed on the mailing list [1], it has been proposed to remove
> the validity bitmap buffer from Union types in the columnar format
> specification and instead let value validity be determined exclusively
> by constituent arrays of the union.
>
> One of the primary motivations for this is to simplify the creation of
> unions, since constructing a validity bitmap that merges the
> information contained in the child arrays' bitmaps is quite
> complicated.
>
> Note that change breaks IPC forward compatibility for union types,
> however implementations with hitherto spec-compliant union
> implementations would be able to (at their discretion, of course)
> preserve backward compatibility for deserializing "old" union data in
> the case that the parent null count of the union is zero. The expected
> impact of this breakage is low, particularly given that Unions have
> been absent from integration testing and thus not recommended for
> anything but ephemeral serialization.
>
> Under the assumption that the MetadataVersion V4 -> V5 version bump is
> accepted, in order to protect against forward compatibility problems,
> Arrow implementations would be forbidden from serializing union types
> using the MetadataVersion::V4.
>
> A PR with the changes to Columnar.rst is at [2].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept changes to Columnar.rst (removing union validity bitmaps)
> [ ] +0
> [ ] -1 Do not accept changes because...
>
> [1]:
> https://lists.apache.org/thread.html/r889d7532cf1e1eff74b072b4e642762ad39f4008caccef5ecde5b26e%40%3Cdev.arrow.apache.org%3E
> [2]: https://github.com/apache/arrow/pull/7535
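For readers following the thread, the rule being voted on can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from the PR or from any Arrow library: child arrays are modeled as Python lists with None standing for a null slot, each type id is assumed to equal the index of its child, and the names union_slot_is_valid, can_read_old_union, check_union_writable and the constant V4 are all hypothetical.

```python
from typing import Any, Optional, Sequence

# Hypothetical stand-in for MetadataVersion::V4; not a real library constant.
V4 = 4


def union_slot_is_valid(
    type_ids: Sequence[int],
    children: Sequence[Sequence[Any]],
    i: int,
    offsets: Optional[Sequence[int]] = None,
) -> bool:
    """Validity of union slot i, decided only by the selected child array.

    Assumption: type id k selects children[k]. With offsets the union is
    treated as dense; without offsets it is treated as sparse.
    """
    child = children[type_ids[i]]
    # Dense unions look up the child position through the offsets buffer;
    # sparse unions use the same position i in every child.
    pos = i if offsets is None else offsets[i]
    return child[pos] is not None


def can_read_old_union(parent_null_count: int) -> bool:
    """Old-format union data stays readable only if the parent has no nulls."""
    return parent_null_count == 0


def check_union_writable(metadata_version: int) -> None:
    """Refuse to serialize a union with the V4 metadata version."""
    if metadata_version == V4:
        raise ValueError("union types must not be serialized with MetadataVersion V4")


# Sparse union: both children have one entry per union slot.
sparse_children = [[1, None, 3, None], [None, "b", None, None]]
sparse_validity = [
    union_slot_is_valid([0, 1, 0, 1], sparse_children, i) for i in range(4)
]

# Dense union: offsets point into shorter child arrays.
dense_children = [[1, None], ["a", None]]
dense_validity = [
    union_slot_is_valid([0, 1, 0, 1], dense_children, i, offsets=[0, 0, 1, 1])
    for i in range(4)
]
```

In both layouts the union itself carries no bitmap; a slot is null exactly when the child slot it points at is null, which is the simplification the proposal is after.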
Re: [VOTE] Removing validity bitmap from Arrow union types
FYI: I just submitted a PR implementing this in C++ and in the integration tests. It was not too awful

https://github.com/apache/arrow/pull/7598

On Tue, Jun 30, 2020 at 4:52 AM Antoine Pitrou wrote:
>
> +0
Re: [VOTE] Removing validity bitmap from Arrow union types
+0
Re: [VOTE] Removing validity bitmap from Arrow union types
+1 (non binding)

On Tue, Jun 30, 2020 at 5:29 AM Ben Kietzman wrote:

> +1 (non binding)
>
> On Tue, Jun 30, 2020, 00:24 Wes McKinney wrote:
>
> > +1 (binding)
> >
> > On Mon, Jun 29, 2020 at 11:09 PM Micah Kornfield wrote:
> > >
> > > +1 (binding) (I had a couple of nits on language, that I put in the PR
Re: [VOTE] Removing validity bitmap from Arrow union types
+1 (non binding)

On Tue, Jun 30, 2020, 00:24 Wes McKinney wrote:

> +1 (binding)
>
> On Mon, Jun 29, 2020 at 11:09 PM Micah Kornfield wrote:
> >
> > +1 (binding) (I had a couple of nits on language, that I put in the PR
Re: [VOTE] Removing validity bitmap from Arrow union types
+1 (binding)

On Mon, Jun 29, 2020 at 11:09 PM Micah Kornfield wrote:
>
> +1 (binding) (I had a couple of nits on language, that I put in the PR
Re: [VOTE] Removing validity bitmap from Arrow union types
+1 (binding) (I had a couple of nits on language, which I put in the PR)