Re: [VOTE][Format] UUID canonical extension type

2024-05-07 Thread Rok Mihevc
Hi all,

With 8 +1 votes (4 binding, 4 non-binding) and 0 -1 votes the proposal is
approved as shown below and in the PR [1].
Thank you everyone who voted and helped shape this proposal.

[1] https://github.com/apache/arrow/pull/41299

---

UUID
====

* Extension name: `arrow.uuid`.

* The storage type of the extension is ``FixedSizeBinary`` with a length of
  16 bytes.

.. note::
   A specific UUID version is not required or guaranteed. This extension
   represents UUIDs as FixedSizeBinary(16) with big-endian notation and does
   not interpret the bytes in any way.
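
For illustration only (this sketch is not part of the voted text): the storage layout can be pictured with nothing but Python's standard ``uuid`` module. ``pack_uuids`` and ``unpack_uuid`` are hypothetical helper names, not Arrow APIs; the sketch assumes the values buffer is simply the 16-byte big-endian encodings laid end to end.

```python
import uuid

WIDTH = 16  # byte width of one FixedSizeBinary(16) slot


def pack_uuids(values):
    """Concatenate UUIDs into one contiguous buffer of 16-byte big-endian slots."""
    # uuid.UUID.bytes yields the 16 bytes in big-endian field order.
    return b"".join(v.bytes for v in values)


def unpack_uuid(buffer, index):
    """Read the UUID stored in slot `index` of a packed buffer."""
    start = index * WIDTH
    chunk = buffer[start:start + WIDTH]
    if len(chunk) != WIDTH:
        raise IndexError("slot out of range")
    return uuid.UUID(bytes=chunk)


ids = [uuid.uuid4(), uuid.uuid4(), uuid.uuid4()]
buf = pack_uuids(ids)
second = unpack_uuid(buf, 1)
```

Nothing here inspects the version or variant bits, which mirrors the note that the extension does not interpret the bytes.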


Re: [VOTE][Format] UUID canonical extension type

2024-05-07 Thread Rok Mihevc
+1 (non-binding)



Re: [VOTE][Format] UUID canonical extension type

2024-05-06 Thread Wes McKinney
+1



Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Antoine Pitrou

+1 (binding)





Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Joris Van den Bossche
+1 (binding)



Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Jacob Wujciak
+1 (non-binding)



Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Weston Pace
+1 (binding)



Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Rok Mihevc
Thanks for all the reviews and comments! I've included the big-endian
requirement so the proposed language is now as below.
I'll leave the vote open until after the May holiday.

Rok

UUID
====

* Extension name: `arrow.uuid`.

* The storage type of the extension is ``FixedSizeBinary`` with a length of
  16 bytes.

.. note::
   A specific UUID version is not required or guaranteed. This extension
   represents UUIDs as FixedSizeBinary(16) *with big-endian notation* and
   does not interpret the bytes in any way.


Re: [VOTE][Format] UUID canonical extension type

2024-04-29 Thread Matt Topol
+1 (binding) pending agreement on the endianness which I agree needs to be
specified in the docs. While I lean towards big-endian as it appears most
implementations of UUID use a big-endian byte order, I don't much mind what
endianness we use as long as we explicitly specify it in the spec.
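
To make the endianness question concrete, here is a small sketch using Python's standard ``uuid`` module, which exposes both layouts. The sample UUID is an arbitrary illustrative value, not taken from the thread. ``bytes`` is the big-endian field order; ``bytes_le`` stores the first three fields little-endian, so the two encodings of the same UUID differ in their first eight bytes.

```python
import uuid

# Arbitrary sample value chosen so that every byte is easy to track.
sample = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

big_endian = sample.bytes       # fields in big-endian (network) order
mixed_endian = sample.bytes_le  # time_low, time_mid, time_hi_version little-endian

print(big_endian.hex())    # 00112233445566778899aabbccddeeff
print(mixed_endian.hex())  # 33221100554477668899aabbccddeeff

# Reading bytes with the wrong convention silently yields a different UUID.
misread = uuid.UUID(bytes=mixed_endian)
```

Because both layouts are valid 16-byte values, a reader cannot tell them apart from the data alone, which is why the byte order has to be stated in the spec.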

On Mon, Apr 29, 2024 at 3:30 PM Fokko Driesprong wrote:

> +1 (non-binding)
>
> First of all, thanks Rok for working on this. I raised the mentioned
> issue on GitHub back in December 2022 and I still believe it would be a
> good addition to the spec.
>
> In Iceberg UUIDs are encoded using big endian. For example, the UUID:
> f79c3e09-677c-4bbd-a479-3f349cb785e7 is encoded as a byte array: F7 9C 3E
> 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7. Avro supported UUIDs for a long
> time as a logical type on top of a string, but now also using fixed[16]
> which is the way to go
> <https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
> and is also in line with the PR by Rok.
>
> Kind regards,
> Fokko


Re: [VOTE][Format] UUID canonical extension type

2024-04-29 Thread Fokko Driesprong
+1 (non-binding)

First of all, thanks Rok for working on this. I raised the mentioned
issue on GitHub back in December 2022 and I still believe it would be a
good addition to the spec.

In Iceberg UUIDs are encoded using big endian. For example, the UUID:
f79c3e09-677c-4bbd-a479-3f349cb785e7 is encoded as a byte array: F7 9C 3E
09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7. Avro supported UUIDs for a long
time as a logical type on top of a string, but now also using fixed[16]
which is the way to go
<https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
and is also in line with the PR by Rok.
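
The byte dump above can be checked with a few lines of standard-library Python. This is only a sanity check of the example, not Iceberg or Arrow code; the variable names are illustrative.

```python
import uuid

# Values copied from the example in this message.
text = "f79c3e09-677c-4bbd-a479-3f349cb785e7"
dump = "F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7"

encoded = uuid.UUID(text).bytes   # 16 bytes, big-endian field order
expected = bytes.fromhex(dump)    # spaces between byte pairs are skipped

matches = encoded == expected
decoded = str(uuid.UUID(bytes=expected))  # round trip back to the text form
```

With big-endian encoding the bytes read in the same order as the hex digits of the canonical string, so the dump is just the string with the dashes removed.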

Kind regards,
Fokko





Re: [VOTE][Format] UUID canonical extension type

2024-04-29 Thread Micah Kornfield
You are correct, it looks like the UUID version should be encoded properly
in the UUID data. I think another concern around endianness was raised,
which should probably be resolved before the vote is finalized.

Thanks,
Micah



Re: [VOTE][Format] UUID canonical extension type

2024-04-29 Thread Felipe Oliveira Carvalho
Isn't that easily decodable from the UUID data itself?

If you allow the version to be specified as metadata, you now have to
validate and make sure it's consistent with the version encoded in the
contents of the UUID column. And UUID versions are more of a concern
for UUID generation than consumption.
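
As a rough sketch of what "decodable from the data" means, assuming the 16 bytes are in big-endian order and the UUID uses the RFC 4122 variant: the version is the high nibble of byte 6 and the variant sits in the top bits of byte 8. ``uuid_version`` and ``is_rfc4122_variant`` are hypothetical helper names used only for this illustration.

```python
import uuid


def uuid_version(raw):
    """Return the version nibble of a 16-byte big-endian UUID."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    return raw[6] >> 4  # high 4 bits of the time_hi_and_version field


def is_rfc4122_variant(raw):
    """True when the two top bits of byte 8 are 0b10 (RFC 4122 variant)."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    return (raw[8] >> 6) == 0b10


random_id = uuid.uuid4()
named_id = uuid.uuid5(uuid.NAMESPACE_DNS, "arrow.apache.org")

versions = [uuid_version(random_id.bytes), uuid_version(named_id.bytes)]
```

A consumer that cares about the version can compute it per value this way, so separate version metadata would be redundant and could disagree with the data.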

--
Felipe



Re: [VOTE][Format] UUID canonical extension type

2024-04-29 Thread Micah Kornfield
Apologies for the late reply, but I think being able to specify the UUID
version as metadata might make sense in some cases?



[VOTE][Format] UUID canonical extension type

2024-04-19 Thread Rok Mihevc
Hi all,

Following initial requests [1][2] and recent tangential ML discussion [3] I
would like to propose a vote to add language for UUID canonical extension
type to CanonicalExtensions.rst as in PR [4] and written below.
A draft C++ and Python implementation PR can be seen here [5].

[1] https://lists.apache.org/thread/k2zvgoq62dyqmw3mj2t6ozfzhzkjkc4j
[2] https://github.com/apache/arrow/issues/15058
[3] https://lists.apache.org/thread/8d5ldl5cb7mms21rd15lhpfrv4j9no4n
[4] https://github.com/apache/arrow/pull/41299 <- proposed change
[5] https://github.com/apache/arrow/pull/37298


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


UUID
====

* Extension name: `arrow.uuid`.

* The storage type of the extension is ``FixedSizeBinary`` with a length of
  16 bytes.

.. note::
   A specific UUID version is not required or guaranteed. This extension
   represents UUIDs as FixedSizeBinary(16) and does not interpret the bytes
   in any way.



Rok