Re: [VOTE][Format] JSON canonical extension type

2024-05-07 Thread Rok Mihevc
>
> I spoke to the DuckDB maintainers about this. DuckDB has a JSON extension
> which defines a JSON column type. They intend to have DuckDB's Arrow
> integrations recognize this arrow.json extension name on input and set it
> on output.
>

That's great to hear! Thanks for checking with DuckDB, Ian.

Rok


Re: [VOTE][Format] JSON canonical extension type

2024-05-07 Thread Ian Cook
Thanks Rok and Pradeep for your work to advance this proposal.

I spoke to the DuckDB maintainers about this. DuckDB has a JSON extension
which defines a JSON column type. They intend to have DuckDB's Arrow
integrations recognize this arrow.json extension name on input and set it
on output.
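As a rough illustration of "recognize the name on input and set it on output", here is a minimal Python sketch. It assumes field metadata is exposed as a plain `dict` using the standard Arrow extension keys `ARROW:extension:name` and `ARROW:extension:metadata`; the helper names `is_arrow_json` and `tag_as_arrow_json` are hypothetical and are not DuckDB or Arrow APIs.

```python
# Hypothetical helpers; field metadata is modeled as a plain dict of str -> str.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"
JSON_EXT_NAME = "arrow.json"


def is_arrow_json(field_metadata: dict) -> bool:
    """On input: treat a string column as JSON if it carries the arrow.json name."""
    return field_metadata.get(EXT_NAME_KEY) == JSON_EXT_NAME


def tag_as_arrow_json(field_metadata: dict) -> dict:
    """On output: return a copy of the metadata annotated as arrow.json.

    The proposal allows the serialized extension metadata to be an empty string.
    """
    tagged = dict(field_metadata)  # do not mutate the caller's dict
    tagged[EXT_NAME_KEY] = JSON_EXT_NAME
    tagged[EXT_META_KEY] = ""
    return tagged


if __name__ == "__main__":
    out = tag_as_arrow_json({"comment": "from a JSON column"})
    print(is_arrow_json(out))  # True
    print(is_arrow_json({}))   # False
```

Copying the metadata rather than mutating it keeps any other field annotations intact while adding the extension name.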

Ian

On Tue, May 7, 2024 at 8:21 AM Rok Mihevc  wrote:

> Hi all,
>
> With 9 +1 votes (4 binding, 5 non-binding) and 0 -1 votes the proposal is
> approved as shown below and in the PR [1].
> Thank you to everyone who voted and helped shape this proposal. Once the
> language is merged, we'll proceed with work on the C++ implementation PR
> [2].
>
> [1] https://github.com/apache/arrow/pull/41257
> [2] https://github.com/apache/arrow/pull/13901
>
> Rok
> ---
>
> JSON
> 
>
> * Extension name: `arrow.json`.
>
> * The storage type of this extension is ``StringArray``,
>   ``LargeStringArray``, or ``StringViewArray``.
>   Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.
>
> * Extension type parameters:
>
>   This type does not have any parameters.
>
> * Description of the serialization:
>
>   Metadata is either an empty string or a JSON string with an empty object.
>   In the future, additional fields may be added, but they are not required
>   to interpret the array.
>


Re: [VOTE][Format] JSON canonical extension type

2024-05-07 Thread Rok Mihevc
Hi all,

With 9 +1 votes (4 binding, 5 non-binding) and 0 -1 votes the proposal is
approved as shown below and in the PR [1].
Thank you to everyone who voted and helped shape this proposal. Once the
language is merged, we'll proceed with work on the C++ implementation PR [2].

[1] https://github.com/apache/arrow/pull/41257
[2] https://github.com/apache/arrow/pull/13901

Rok
---

JSON


* Extension name: `arrow.json`.

* The storage type of this extension is ``StringArray``,
  ``LargeStringArray``, or ``StringViewArray``.
  Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.

* Extension type parameters:

  This type does not have any parameters.

* Description of the serialization:

  Metadata is either an empty string or a JSON string with an empty object.
  In the future, additional fields may be added, but they are not required
  to interpret the array.
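A minimal sketch of how a reader might interpret the serialized metadata described above, written in plain Python. The function name `parse_arrow_json_metadata` is hypothetical; it accepts an empty string or a JSON object, and it keeps (but does not require) any fields a future revision might add.

```python
import json


def parse_arrow_json_metadata(serialized: str) -> dict:
    """Parse arrow.json extension metadata (hypothetical helper).

    Per the approved language, metadata is either an empty string or a JSON
    string containing an (currently empty) object. Fields added in the future
    are not required to interpret the array, so they are returned as-is and
    otherwise ignored.
    """
    if serialized == "":
        return {}
    parsed = json.loads(serialized)
    if not isinstance(parsed, dict):
        raise ValueError("arrow.json metadata must be empty or a JSON object")
    return parsed


if __name__ == "__main__":
    print(parse_arrow_json_metadata(""))    # {}
    print(parse_arrow_json_metadata("{}"))  # {}
```

Tolerating unknown keys, rather than requiring an exactly empty object, matches the stated intent that later additions must not break existing readers.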


Re: [VOTE][Format] JSON canonical extension type

2024-05-07 Thread Rok Mihevc
+1 (non-binding)

On Mon, May 6, 2024 at 12:14 PM Wes McKinney  wrote:

> +1
>
> On Tue, Apr 30, 2024 at 4:03 PM Antoine Pitrou  wrote:
>
> > +1 (binding) for the current proposal, i.e. with the RFC 8259
> > requirement and the 3 current String types allowed.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 30/04/2024 at 19:26, Rok Mihevc wrote:
> > > Hi all, thanks for the votes and comments so far.
> > > I've amended [1] the proposed language with the RFC-8259 requirement as it
> > > seems to be almost unanimously requested. New language is below.
> > > To Micah's comment regarding rejecting Binary arrays [2] - please discuss
> > > in the PR.
> > >
> > > Let's leave the vote open until after the May holiday.
> > >
> > > Rok
> > >
> > > [1]
> > > https://github.com/apache/arrow/pull/41257/commits/594945010e3b7d393b411aad971743ffcdbdbc8e
> > > [2] https://github.com/apache/arrow/pull/41257#discussion_r1583441040
> > >
> > >
> > > JSON
> > > 
> > >
> > > * Extension name: `arrow.json`.
> > >
> > > * The storage type of this extension is ``StringArray``,
> > >   ``LargeStringArray``, or ``StringViewArray``.
> > >   *Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.*
> > >
> > > * Extension type parameters:
> > >
> > >   This type does not have any parameters.
> > >
> > > * Description of the serialization:
> > >
> > >   Metadata is either an empty string or a JSON string with an empty object.
> > >   In the future, additional fields may be added, but they are not required
> > >   to interpret the array.
> > >
> >
>


Re: [VOTE][Format] JSON canonical extension type

2024-05-06 Thread Wes McKinney
+1

On Tue, Apr 30, 2024 at 4:03 PM Antoine Pitrou  wrote:

> +1 (binding) for the current proposal, i.e. with the RFC 8259
> requirement and the 3 current String types allowed.
>
> Regards
>
> Antoine.
>
>
> On 30/04/2024 at 19:26, Rok Mihevc wrote:
> > Hi all, thanks for the votes and comments so far.
> > I've amended [1] the proposed language with the RFC-8259 requirement as it
> > seems to be almost unanimously requested. New language is below.
> > To Micah's comment regarding rejecting Binary arrays [2] - please discuss
> > in the PR.
> >
> > Let's leave the vote open until after the May holiday.
> >
> > Rok
> >
> > [1]
> > https://github.com/apache/arrow/pull/41257/commits/594945010e3b7d393b411aad971743ffcdbdbc8e
> > [2] https://github.com/apache/arrow/pull/41257#discussion_r1583441040
> >
> >
> > JSON
> > 
> >
> > * Extension name: `arrow.json`.
> >
> > * The storage type of this extension is ``StringArray``,
> >   ``LargeStringArray``, or ``StringViewArray``.
> >   *Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.*
> >
> > * Extension type parameters:
> >
> >   This type does not have any parameters.
> >
> > * Description of the serialization:
> >
> >   Metadata is either an empty string or a JSON string with an empty object.
> >   In the future, additional fields may be added, but they are not required
> >   to interpret the array.
> >
>


Re: [VOTE][Format] JSON canonical extension type

2024-04-30 Thread Antoine Pitrou
+1 (binding) for the current proposal, i.e. with the RFC 8259
requirement and the 3 current String types allowed.


Regards

Antoine.


On 30/04/2024 at 19:26, Rok Mihevc wrote:

Hi all, thanks for the votes and comments so far.
I've amended [1] the proposed language with the RFC-8259 requirement as it
seems to be almost unanimously requested. New language is below.
To Micah's comment regarding rejecting Binary arrays [2] - please discuss
in the PR.

Let's leave the vote open until after the May holiday.

Rok

[1]
https://github.com/apache/arrow/pull/41257/commits/594945010e3b7d393b411aad971743ffcdbdbc8e
[2] https://github.com/apache/arrow/pull/41257#discussion_r1583441040


JSON


* Extension name: `arrow.json`.

* The storage type of this extension is ``StringArray``,
   ``LargeStringArray``, or ``StringViewArray``.
   *Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.*

* Extension type parameters:

   This type does not have any parameters.

* Description of the serialization:

   Metadata is either an empty string or a JSON string with an empty object.
   In the future, additional fields may be added, but they are not required
   to interpret the array.



Re: [VOTE][Format] JSON canonical extension type

2024-04-30 Thread Jacob Wujciak
+1 (non-binding). Thanks for moving these two forward, Rok!

On Tue, Apr 30, 2024 at 19:26, Rok Mihevc wrote:

> Hi all, thanks for the votes and comments so far.
> I've amended [1] the proposed language with the RFC-8259 requirement as it
> seems to be almost unanimously requested. New language is below.
> To Micah's comment regarding rejecting Binary arrays [2] - please discuss
> in the PR.
>
> Let's leave the vote open until after the May holiday.
>
> Rok
>
> [1]
> https://github.com/apache/arrow/pull/41257/commits/594945010e3b7d393b411aad971743ffcdbdbc8e
> [2] https://github.com/apache/arrow/pull/41257#discussion_r1583441040
>
>
> JSON
> 
>
> * Extension name: `arrow.json`.
>
> * The storage type of this extension is ``StringArray``,
>   ``LargeStringArray``, or ``StringViewArray``.
>   *Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.*
>
> * Extension type parameters:
>
>   This type does not have any parameters.
>
> * Description of the serialization:
>
>   Metadata is either an empty string or a JSON string with an empty object.
>   In the future, additional fields may be added, but they are not required
>   to interpret the array.
>


Re: [VOTE][Format] JSON canonical extension type

2024-04-30 Thread Rok Mihevc
Hi all, thanks for the votes and comments so far.
I've amended [1] the proposed language with the RFC-8259 requirement as it
seems to be almost unanimously requested. New language is below.
To Micah's comment regarding rejecting Binary arrays [2] - please discuss
in the PR.

Let's leave the vote open until after the May holiday.

Rok

[1]
https://github.com/apache/arrow/pull/41257/commits/594945010e3b7d393b411aad971743ffcdbdbc8e
[2] https://github.com/apache/arrow/pull/41257#discussion_r1583441040


JSON


* Extension name: `arrow.json`.

* The storage type of this extension is ``StringArray``,
  ``LargeStringArray``, or ``StringViewArray``.
  *Only UTF-8 encoded JSON as specified in `rfc8259`_ is supported.*

* Extension type parameters:

  This type does not have any parameters.

* Description of the serialization:

  Metadata is either an empty string or a JSON string with an empty object.
  In the future, additional fields may be added, but they are not required
  to interpret the array.


Re: [VOTE][Format] JSON canonical extension type

2024-04-30 Thread Weston Pace
+1 (binding)

I agree we should be explicit about RFC-8259.

On Mon, Apr 29, 2024 at 4:46 PM David Li  wrote:

> +1 (binding)
>
> assuming we explicitly state RFC-8259
>
> On Tue, Apr 30, 2024, at 08:02, Matt Topol wrote:
> > +1 (binding)
> >
> > On Mon, Apr 29, 2024 at 5:36 PM Ian Cook  wrote:
> >
> >> +1 (non-binding)
> >>
> >> I added a comment in the PR suggesting that we explicitly refer to RFC-8259
> >> in CanonicalExtensions.rst.
> >>
> >> On Mon, Apr 29, 2024 at 1:21 PM Micah Kornfield 
> >> wrote:
> >>
> >> > +1, I added a comment to the PR because I think we should recommend
> >> > implementations specifically reject parsing Binary arrays with the
> >> > annotation in case we want to support non-UTF8 encodings in the future
> >> > (even though IIRC these aren't really JSON spec compliant).
> >> >
> >> > On Fri, Apr 19, 2024 at 1:24 PM Rok Mihevc wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > Following discussions [1][2] and preliminary implementation work (by
> >> > > Pradeep Gollakota) [3], I would like to propose a vote to add language for
> >> > > the JSON canonical extension type to CanonicalExtensions.rst as in PR [4]
> >> > > and written below.
> >> > > A draft C++ implementation PR can be seen here [3].
> >> > >
> >> > > [1] https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j
> >> > > [2] https://lists.apache.org/thread/7xph3476g9rhl9mtqvn804fqf5z8yoo1
> >> > > [3] https://github.com/apache/arrow/pull/13901
> >> > > [4] https://github.com/apache/arrow/pull/41257 <- proposed change
> >> > >
> >> > >
> >> > > The vote will be open for at least 72 hours.
> >> > >
> >> > > [ ] +1 Accept this proposal
> >> > > [ ] +0
> >> > > [ ] -1 Do not accept this proposal because...
> >> > >
> >> > >
> >> > > JSON
> >> > > 
> >> > >
> >> > > * Extension name: `arrow.json`.
> >> > >
> >> > > * The storage type of this extension is ``StringArray``,
> >> > >   ``LargeStringArray``, or ``StringViewArray``.
> >> > >   Only UTF-8 encoded JSON is supported.
> >> > >
> >> > > * Extension type parameters:
> >> > >
> >> > >   This type does not have any parameters.
> >> > >
> >> > > * Description of the serialization:
> >> > >
> >> > >   Metadata is either an empty string or a JSON string with an empty object.
> >> > >   In the future, additional fields may be added, but they are not required
> >> > >   to interpret the array.
> >> > >
> >> > >
> >> > >
> >> > > Rok
> >> > >
> >> >
> >>
>


Re: [VOTE][Format] JSON canonical extension type

2024-04-29 Thread David Li
+1 (binding)

assuming we explicitly state RFC-8259

On Tue, Apr 30, 2024, at 08:02, Matt Topol wrote:
> +1 (binding)
>
> On Mon, Apr 29, 2024 at 5:36 PM Ian Cook  wrote:
>
>> +1 (non-binding)
>>
>> I added a comment in the PR suggesting that we explicitly refer to RFC-8259
>> in CanonicalExtensions.rst.
>>
>> On Mon, Apr 29, 2024 at 1:21 PM Micah Kornfield 
>> wrote:
>>
>> > +1, I added a comment to the PR because I think we should recommend
>> > implementations specifically reject parsing Binary arrays with the
>> > annotation in case we want to support non-UTF8 encodings in the future
>> > (even though IIRC these aren't really JSON spec compliant).
>> >
>> > On Fri, Apr 19, 2024 at 1:24 PM Rok Mihevc  wrote:
>> >
>> > > Hi all,
>> > >
>> > > Following discussions [1][2] and preliminary implementation work (by
>> > > Pradeep Gollakota) [3], I would like to propose a vote to add language for
>> > > the JSON canonical extension type to CanonicalExtensions.rst as in PR [4]
>> > > and written below.
>> > > A draft C++ implementation PR can be seen here [3].
>> > >
>> > > [1] https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j
>> > > [2] https://lists.apache.org/thread/7xph3476g9rhl9mtqvn804fqf5z8yoo1
>> > > [3] https://github.com/apache/arrow/pull/13901
>> > > [4] https://github.com/apache/arrow/pull/41257 <- proposed change
>> > >
>> > >
>> > > The vote will be open for at least 72 hours.
>> > >
>> > > [ ] +1 Accept this proposal
>> > > [ ] +0
>> > > [ ] -1 Do not accept this proposal because...
>> > >
>> > >
>> > > JSON
>> > > 
>> > >
>> > > * Extension name: `arrow.json`.
>> > >
>> > > * The storage type of this extension is ``StringArray``,
>> > >   ``LargeStringArray``, or ``StringViewArray``.
>> > >   Only UTF-8 encoded JSON is supported.
>> > >
>> > > * Extension type parameters:
>> > >
>> > >   This type does not have any parameters.
>> > >
>> > > * Description of the serialization:
>> > >
>> > >   Metadata is either an empty string or a JSON string with an empty object.
>> > >   In the future, additional fields may be added, but they are not required
>> > >   to interpret the array.
>> > >
>> > >
>> > >
>> > > Rok
>> > >
>> >
>>


Re: [VOTE][Format] JSON canonical extension type

2024-04-29 Thread Matt Topol
+1 (binding)

On Mon, Apr 29, 2024 at 5:36 PM Ian Cook  wrote:

> +1 (non-binding)
>
> I added a comment in the PR suggesting that we explicitly refer to RFC-8259
> in CanonicalExtensions.rst.
>
> On Mon, Apr 29, 2024 at 1:21 PM Micah Kornfield 
> wrote:
>
> > +1, I added a comment to the PR because I think we should recommend
> > implementations specifically reject parsing Binary arrays with the
> > annotation in case we want to support non-UTF8 encodings in the future
> > (even though IIRC these aren't really JSON spec compliant).
> >
> > On Fri, Apr 19, 2024 at 1:24 PM Rok Mihevc  wrote:
> >
> > > Hi all,
> > >
> > > Following discussions [1][2] and preliminary implementation work (by
> > > Pradeep Gollakota) [3], I would like to propose a vote to add language for
> > > the JSON canonical extension type to CanonicalExtensions.rst as in PR [4]
> > > and written below.
> > > A draft C++ implementation PR can be seen here [3].
> > >
> > > [1] https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j
> > > [2] https://lists.apache.org/thread/7xph3476g9rhl9mtqvn804fqf5z8yoo1
> > > [3] https://github.com/apache/arrow/pull/13901
> > > [4] https://github.com/apache/arrow/pull/41257 <- proposed change
> > >
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Accept this proposal
> > > [ ] +0
> > > [ ] -1 Do not accept this proposal because...
> > >
> > >
> > > JSON
> > > 
> > >
> > > * Extension name: `arrow.json`.
> > >
> > > * The storage type of this extension is ``StringArray``,
> > >   ``LargeStringArray``, or ``StringViewArray``.
> > >   Only UTF-8 encoded JSON is supported.
> > >
> > > * Extension type parameters:
> > >
> > >   This type does not have any parameters.
> > >
> > > * Description of the serialization:
> > >
> > >   Metadata is either an empty string or a JSON string with an empty object.
> > >   In the future, additional fields may be added, but they are not required
> > >   to interpret the array.
> > >
> > >
> > >
> > > Rok
> > >
> >
>


Re: [VOTE][Format] JSON canonical extension type

2024-04-29 Thread Ian Cook
+1 (non-binding)

I added a comment in the PR suggesting that we explicitly refer to RFC-8259
in CanonicalExtensions.rst.
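What an explicit RFC 8259 reference implies for individual values can be sketched in Python as below: strict UTF-8 decoding followed by a JSON parse. This is an approximation under stated assumptions: the standard-library `json` module stands in for a conforming parser, and its `parse_constant` hook is used to reject the `NaN`/`Infinity` extensions it accepts by default. `is_valid_rfc8259_utf8` is a hypothetical helper name, not an Arrow API.

```python
import json


def _reject_constant(name: str):
    # Python's json module accepts NaN, Infinity and -Infinity by default;
    # RFC 8259 does not, so treat them as parse errors.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_valid_rfc8259_utf8(value: bytes) -> bool:
    """Return True if one array value is UTF-8 text holding a JSON value."""
    try:
        text = value.decode("utf-8")  # strict decoding: invalid UTF-8 fails
        json.loads(text, parse_constant=_reject_constant)
    except (UnicodeDecodeError, ValueError):  # JSONDecodeError is a ValueError
        return False
    return True


if __name__ == "__main__":
    print(is_valid_rfc8259_utf8(b'{"a": [1, 2.5, null]}'))  # True
    print(is_valid_rfc8259_utf8(b"NaN"))                    # False
```

RFC 8259 allows any JSON value at the top level, so a bare string or number passes as well; a stricter implementation could additionally require objects, but the approved language does not ask for that.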

On Mon, Apr 29, 2024 at 1:21 PM Micah Kornfield 
wrote:

> +1, I added a comment to the PR because I think we should recommend
> implementations specifically reject parsing Binary arrays with the
> annotation in case we want to support non-UTF8 encodings in the future
> (even though IIRC these aren't really JSON spec compliant).
>
> On Fri, Apr 19, 2024 at 1:24 PM Rok Mihevc  wrote:
>
> > Hi all,
> >
> > Following discussions [1][2] and preliminary implementation work (by
> > Pradeep Gollakota) [3], I would like to propose a vote to add language for
> > the JSON canonical extension type to CanonicalExtensions.rst as in PR [4]
> > and written below.
> > A draft C++ implementation PR can be seen here [3].
> >
> > [1] https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j
> > [2] https://lists.apache.org/thread/7xph3476g9rhl9mtqvn804fqf5z8yoo1
> > [3] https://github.com/apache/arrow/pull/13901
> > [4] https://github.com/apache/arrow/pull/41257 <- proposed change
> >
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Accept this proposal
> > [ ] +0
> > [ ] -1 Do not accept this proposal because...
> >
> >
> > JSON
> > 
> >
> > * Extension name: `arrow.json`.
> >
> > * The storage type of this extension is ``StringArray``,
> >   ``LargeStringArray``, or ``StringViewArray``.
> >   Only UTF-8 encoded JSON is supported.
> >
> > * Extension type parameters:
> >
> >   This type does not have any parameters.
> >
> > * Description of the serialization:
> >
> >   Metadata is either an empty string or a JSON string with an empty object.
> >   In the future, additional fields may be added, but they are not required
> >   to interpret the array.
> >
> >
> >
> > Rok
> >
>


Re: [VOTE][Format] JSON canonical extension type

2024-04-29 Thread Micah Kornfield
+1, I added a comment to the PR because I think we should recommend
implementations specifically reject parsing Binary arrays with the
annotation in case we want to support non-UTF8 encodings in the future
(even though IIRC these aren't really JSON spec compliant).
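Micah's suggestion could look roughly like the following Python sketch, which checks a storage type given as an Arrow C Data Interface format string (`u`, `U` and `vu` for the three string types; `z`, `Z` and `vz` for their binary counterparts). The function `check_arrow_json_storage` is hypothetical and not part of any Arrow implementation.

```python
# Arrow C Data Interface format strings for the three allowed storage types.
ALLOWED_STORAGE_FORMATS = {
    "u": "utf8 (StringArray)",
    "U": "large_utf8 (LargeStringArray)",
    "vu": "utf8_view (StringViewArray)",
}

# Binary counterparts, rejected explicitly so that non-UTF-8 encodings
# could still be given a defined meaning in a future revision.
BINARY_FORMATS = {"z", "Z", "vz"}


def check_arrow_json_storage(fmt: str) -> str:
    """Return a description of an allowed storage type or raise TypeError."""
    if fmt in ALLOWED_STORAGE_FORMATS:
        return ALLOWED_STORAGE_FORMATS[fmt]
    if fmt in BINARY_FORMATS:
        raise TypeError(
            f"arrow.json on binary storage ({fmt!r}) is rejected; "
            "only UTF-8 string storage is supported"
        )
    raise TypeError(f"unsupported storage format for arrow.json: {fmt!r}")


if __name__ == "__main__":
    print(check_arrow_json_storage("vu"))  # utf8_view (StringViewArray)
```

Giving binary storage its own error message, instead of folding it into the generic case, makes the deliberate rejection visible to users who might expect `arrow.json` to work on raw bytes.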

On Fri, Apr 19, 2024 at 1:24 PM Rok Mihevc  wrote:

> Hi all,
>
> Following discussions [1][2] and preliminary implementation work (by
> Pradeep Gollakota) [3], I would like to propose a vote to add language for
> the JSON canonical extension type to CanonicalExtensions.rst as in PR [4]
> and written below.
> A draft C++ implementation PR can be seen here [3].
>
> [1] https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j
> [2] https://lists.apache.org/thread/7xph3476g9rhl9mtqvn804fqf5z8yoo1
> [3] https://github.com/apache/arrow/pull/13901
> [4] https://github.com/apache/arrow/pull/41257 <- proposed change
>
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
>
> JSON
> 
>
> * Extension name: `arrow.json`.
>
> * The storage type of this extension is ``StringArray``,
>   ``LargeStringArray``, or ``StringViewArray``.
>   Only UTF-8 encoded JSON is supported.
>
> * Extension type parameters:
>
>   This type does not have any parameters.
>
> * Description of the serialization:
>
>   Metadata is either an empty string or a JSON string with an empty object.
>   In the future, additional fields may be added, but they are not required
>   to interpret the array.
>
>
>
> Rok
>