Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-30 Thread Sutou Kouhei
+1 (binding)

In "[VOTE] Removing validity bitmap from Arrow union types"
  on Mon, 29 Jun 2020 16:23:23 -0500, Wes McKinney wrote:

> Hi,
> 
> As discussed on the mailing list [1], it has been proposed to remove
> the validity bitmap buffer from Union types in the columnar format
> specification and instead let value validity be determined exclusively
> by constituent arrays of the union.
> 
> One of the primary motivations for this is to simplify the creation of
> unions, since constructing a validity bitmap that merges the
> information contained in the child arrays' bitmaps is quite
> complicated.
> 
> Note that this change breaks IPC forward compatibility for union types;
> however, implementations that were hitherto spec-compliant for unions
> would be able (at their discretion, of course) to preserve backward
> compatibility when deserializing "old" union data, provided that the
> parent null count of the union is zero. The expected impact of this
> breakage is low, particularly given that unions have been absent from
> integration testing and thus have not been recommended for anything
> but ephemeral serialization.
> 
> Assuming that the MetadataVersion V4 -> V5 version bump is accepted,
> and in order to protect against forward compatibility problems, Arrow
> implementations would be forbidden from serializing union types using
> MetadataVersion::V4.
> 
> A PR with the changes to Columnar.rst is at [2].
> 
> The vote will be open for at least 72 hours.
> 
> [ ] +1 Accept changes to Columnar.rst (removing union validity bitmaps)
> [ ] +0
> [ ] -1 Do not accept changes because...
> 
> [1]: 
> https://lists.apache.org/thread.html/r889d7532cf1e1eff74b072b4e642762ad39f4008caccef5ecde5b26e%40%3Cdev.arrow.apache.org%3E
> [2]: https://github.com/apache/arrow/pull/7535
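Under the proposal quoted above, a union carries no validity bitmap of its own, and each slot's validity comes from the child array selected by its type id. The following is a minimal Python sketch of that rule and of the two compatibility rules described, not the Arrow implementation. All names (`UnionArray`, `Child`, `load_v4_union`, `check_union_serializable`) are hypothetical. The sketch also assumes that children are indexed directly by type id (type codes 0..n-1) and that MetadataVersion is modeled as a plain integer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class UnionMode(Enum):
    SPARSE = "sparse"
    DENSE = "dense"


@dataclass
class Child:
    # Hypothetical stand-in for a child array; only its validity matters here.
    validity: List[bool]

    def is_valid(self, i: int) -> bool:
        return self.validity[i]


@dataclass
class UnionArray:
    mode: UnionMode
    type_ids: List[int]                  # one type id per union slot
    children: List[Child]                # assumption: indexed directly by type id
    offsets: Optional[List[int]] = None  # dense unions only

    def is_valid(self, i: int) -> bool:
        # No parent bitmap: validity is delegated to the selected child.
        child = self.children[self.type_ids[i]]
        if self.mode is UnionMode.SPARSE:
            # Sparse: every child has the union's length, so use slot index i.
            return child.is_valid(i)
        # Dense: offsets[i] is the position inside the selected child.
        assert self.offsets is not None, "dense unions need offsets"
        return child.is_valid(self.offsets[i])

    def null_count(self) -> int:
        return sum(not self.is_valid(i) for i in range(len(self.type_ids)))


def load_v4_union(mode: UnionMode, type_ids: List[int], children: List[Child],
                  offsets: Optional[List[int]], parent_validity: Optional[bytes],
                  parent_null_count: int) -> UnionArray:
    # "Old" V4 data had a parent bitmap. It can be safely discarded only when
    # it marks nothing as null; otherwise top-level nulls would be lost.
    if parent_null_count != 0:
        raise ValueError("V4 union with top-level nulls cannot be read without its bitmap")
    return UnionArray(mode, type_ids, children, offsets)


def check_union_serializable(metadata_version: int) -> None:
    # Writers refuse to emit union types under MetadataVersion V4.
    if metadata_version < 5:
        raise ValueError("union types must not be serialized with MetadataVersion::V4")
```

For example, a sparse union with `type_ids=[0, 1, 0]` over children with validities `[True, False, False]` and `[False, True, True]` has exactly one null slot (the last one). This is the case where building a merged parent bitmap would have required consulting both children.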


Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-30 Thread Wes McKinney
FYI: I just submitted a PR implementing this in C++ and in the
integration tests. It was not too awful.

https://github.com/apache/arrow/pull/7598

On Tue, Jun 30, 2020 at 4:52 AM Antoine Pitrou  wrote:
>
> +0


Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-30 Thread Antoine Pitrou
+0




Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-30 Thread Ryan Murray
+1 (non binding)




Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Ben Kietzman
+1 (non binding)



Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Wes McKinney
+1 (binding)



Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Micah Kornfield
+1 (binding) (I had a couple of nits on language, which I put in the PR)
