Re: [VOTE] Release Apache Arrow 10.0.0 - RC0

2022-10-24 Thread Neville Dipale
Is there anything I can do on my side to fix this?

On Mon, 24 Oct 2022 at 07:25, Sutou Kouhei  wrote:

> Hi,
>
> Neville's PGP key uses EDDSA, and gpg on CentOS 7 is too old to
> process EDDSA PGP keys. This RC is signed with my non-EDDSA PGP
> key, so you can ignore the error from "gpg --import" on CentOS 7.
>
> FYI: Some PGP keys in KEYS are removed so that our RPM packages
> work with old gpg:
>
> https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow-release/Rakefile#L44-L81
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [VOTE] Release Apache Arrow 10.0.0 - RC0" on Sun, 23 Oct 2022
> 10:31:27 +0300,
>   Benson Muite  wrote:
>
> > WIP, but source verification fails for me on CentOS 7 due to an
> > unsigned key from Neville Dipale:
> >
> > TEST_DEFAULT=0 TEST_SOURCE=1 dev/release/verify-release-candidate.sh 10.0.0 0
> > 
> > gpg: key 717D3FB2: no valid user IDs
> > gpg: this may be caused by a missing self-signature
> > ...
> > gpg: Total number processed: 14
> > gpg:   w/o user IDs: 1
> > gpg:  unchanged: 13
> > Failed to verify release candidate. See /tmp/arrow-10.0.0.gOoKw for
> > details.
> >
> > On 10/22/22 22:32, David Li wrote:
> >> Still WIP for me. Verified:
> >> - C++, Python, Java, binaries on Ubuntu Linux 18.04/AMD64
> >> - C++, Python, Java on MacOS 12.3/AArch64
> >> * MacOS required Rosetta to be installed to generate Protobuf sources for Java
> >> * I needed https://github.com/apache/arrow/pull/14477 to verify APT
> >>   packages on Linux
> >> * I needed https://github.com/apache/arrow/pull/14479 to verify native
> >>   wheels on MacOS
> >> I cannot verify universal2 wheels on MacOS as the binaries are for
> >> macosx_10_14 but the script hardcodes macosx_11_0. And if I edit the
> >> filename in the script, I get "...macosx_10_14_universal2.whl is not a
> >> supported wheel on this platform". Is this intended?
> >> On Fri, Oct 21, 2022, at 14:01, Jacob Wujciak wrote:
> >>> +1 (non-binding) verified on Manjaro with CUDA:
> >>>
> >>> TEST_DEFAULT=0 \
> >>>   TEST_SOURCE=0 \
> >>>   TEST_INTEGRATION_CPP=1 \
> >>>   TEST_CPP=1 \
> >>>   TEST_PYTHON=1 \
> >>>   dev/release/verify-release-candidate.sh 10.0.0 0
> >>>
> >>> TEST_DEFAULT=0 \
> >>>   TEST_SOURCE=0 \
> >>>   TEST_BINARY=1 \
> >>>   dev/release/verify-release-candidate.sh 10.0.0 0
> >>>
> >>> with:
> >>>   gcc 12.2.2
> >>>   cuda_11.7.r11.7/compiler.31442593_0
> >>>   python 3.10.7
> >>>
> >>> Thanks!
> >>>
> >>> On Fri, Oct 21, 2022 at 8:07 AM Sutou Kouhei 
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to propose the following release candidate (RC0) of
> >>>> Apache Arrow version 10.0.0. This is a release consisting of 470
> >>>> resolved JIRA issues[1].
> >>>>
> >>>> This release candidate is based on commit:
> >>>> 89f9a0948961f6e94f1ef5e4f310b707d22a3c11 [2]
> >>>>
> >>>> The source release rc0 is hosted at [3].
> >>>> The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
> >>>> The changelog is located at [12].
> >>>>
> >>>> Please download, verify checksums and signatures, run the unit tests,
> >>>> and vote on the release. See [13] for how to validate a release
> >>>> candidate.
> >>>>
> >>>> See also a verification result on GitHub pull request [14].
> >>>>
> >>>> The vote will be open for at least 72 hours.
> >>>>
> >>>> [ ] +1 Release this as Apache Arrow 10.0.0
> >>>> [ ] +0
> >>>> [ ] -1 Do not release this as Apache Arrow 10.0.0 because...
> >>>>
> >>>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%2010.0.0
> >>>> [2]: https://github.com/apache/arrow/tree/89f9a0948961f6e94f1ef5e4f310b707d22a3c11
> >>>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-10.0.0-rc0
> >>>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> >>>> [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
> >>>> [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
> >>>> [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> >>>> [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/10.0.0-rc0
> >>>> [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/10.0.0-rc0
> >>>> [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/10.0.0-rc0
> >>>> [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> >>>> [12]: https://github.com/apache/arrow/blob/89f9a0948961f6e94f1ef5e4f310b707d22a3c11/CHANGELOG.md
> >>>> [13]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> >>>> [14]: https://github.com/apache/arrow/pull/14466
> >>>>
> >
>
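Kou's explanation comes down to a version check. GnuPG gained ECC support, including EdDSA (Ed25519) keys, in its 2.1 series, while CentOS 7 ships a 2.0.x gpg2. The sketch below is a hypothetical helper, not part of Arrow's verification script, that makes the check explicit before importing KEYS. The 2.1 threshold is an assumption based on GnuPG's release history.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dev/release/verify-release-candidate.sh):
# report whether a GnuPG version string is new enough for EdDSA keys.
# Assumption: EdDSA/ECC support arrived in GnuPG 2.1.
gpg_supports_eddsa() {
  local major minor
  # Split "2.0.22" into major=2, minor=0 (the patch level is ignored).
  IFS=. read -r major minor _ <<< "$1"
  (( major > 2 || (major == 2 && minor >= 1) ))
}

# Example: the first line of `gpg --version` looks like "gpg (GnuPG) 2.0.22".
if command -v gpg >/dev/null 2>&1; then
  installed=$(gpg --version | head -n 1 | awk '{print $NF}')
  if gpg_supports_eddsa "$installed"; then
    echo "gpg $installed should be able to import EdDSA keys"
  else
    echo "gpg $installed is probably too old for EdDSA keys"
  fi
fi
```

On an old gpg, an error like Benson's "no valid user IDs" for an EdDSA key is expected and, as Kou notes, harmless for this RC.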


Re: [VOTE][RUST] Release Apache Arrow Rust 25.0.0 RC1

2022-10-16 Thread Neville Dipale
+1 (binding)

Verified on Ubuntu (WSL2)

On Sun, 16 Oct 2022 at 13:50, Remzi Yang <1371656737...@gmail.com> wrote:

> +1 (non-binding) Verified on M1 Mac.
>
> Thank you Andrew.
>
> On Sun, 16 Oct 2022 at 11:18, L. C. Hsieh  wrote:
>
> > +1 (binding)
> >
> > Verified on Intel Mac.
> >
> > Thanks Andrew!
> >
> >
> > On Sat, Oct 15, 2022 at 1:34 PM Andy Grove 
> wrote:
> > >
> > > I tried this again, but this time ran the verify script from the repo
> > > rather than the one in the tarball, and that worked fine (although I do
> > > not understand why).
> > >
> > > +1 (binding)
> > >
> > > Thanks, Andrew!
> > >
> > > On Sat, Oct 15, 2022 at 2:13 PM Raphael Taylor-Davies <
> > tustv...@apache.org>
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On 2022/10/14 21:30:20 Andrew Lamb wrote:
> > > > > Hi,
> > > > >
> > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > version 25.0.0.
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 1eb19b5394b84eaa0dbb24f65e74018defb3332b [1]
> > > > >
> > > > > The proposed release tarball and signatures are hosted at [2].
> > > > >
> > > > > The changelog is located at [3].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > and vote on the release. There is a script [4] that automates some of
> > > > > the verification.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >
> > > > > [1]: https://github.com/apache/arrow-rs/tree/1eb19b5394b84eaa0dbb24f65e74018defb3332b
> > > > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-25.0.0-rc1
> > > > > [3]: https://github.com/apache/arrow-rs/blob/1eb19b5394b84eaa0dbb24f65e74018defb3332b/CHANGELOG.md
> > > > > [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > >
> > > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 10.0.0 RC1

2022-07-15 Thread Neville Dipale
I've logged https://github.com/apache/arrow-datafusion/issues/2916

On Fri, 15 Jul 2022 at 08:34, Neville Dipale  wrote:

> +1 (binding)
>
> There are 3 test failures because the hardcoded strings expect
> "ARROW_TEST_DATA" but we get "privateARROW_TEST_DATA"
> when verifying the RC.
>
> An example can be seen on this diff https://www.diffchecker.com/B8i8w4GC.
>
> My conclusion from looking at it is that it's likely a quirk that we can
> fix, and that it shouldn't affect the release.
>
> On Wed, 13 Jul 2022 at 23:48, QP Hou  wrote:
>
>> +1 (binding)
>>
>> On Wed, Jul 13, 2022 at 1:24 PM Andrew Lamb  wrote:
>> >
>> > +1 (binding)
>> >
>> > Thanks Andy! I know releases are a significant amount of work.
>> >
>> > Andrew
>> >
>> > On Tue, Jul 12, 2022 at 11:45 AM Andy Grove 
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I would like to propose a release of Apache Arrow DataFusion
>> > > Implementation,
>> > > version 10.0.0.
>> > >
>> > > This release candidate is based on commit:
>> > > d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f [1]
>> > > The proposed release tarball and signatures are hosted at [2].
>> > > The changelog is located at [3].
>> > >
>> > > Please download, verify checksums and signatures, run the unit tests, and
>> > > vote on the release. The vote will be open for at least 72 hours.
>> > >
>> > > Only votes from PMC members are binding, but all members of the community
>> > > are encouraged to test the release and vote with "(non-binding)".
>> > >
>> > > The standard verification procedure is documented at
>> > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
>> > >
>> > > [ ] +1 Release this as Apache Arrow DataFusion 10.0.0
>> > > [ ] +0
>> > > [ ] -1 Do not release this as Apache Arrow DataFusion 10.0.0 because...
>> > >
>> > > Here is my vote:
>> > >
>> > > +1
>> > >
>> > > [1]: https://github.com/apache/arrow-datafusion/tree/d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f
>> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-10.0.0-rc1
>> > > [3]: https://github.com/apache/arrow-datafusion/blob/d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f/CHANGELOG.md
>> > >
>>
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 10.0.0 RC1

2022-07-15 Thread Neville Dipale
+1 (binding)

There are 3 test failures because the hardcoded strings expect
"ARROW_TEST_DATA" but we get "privateARROW_TEST_DATA"
when verifying the RC.

An example can be seen on this diff https://www.diffchecker.com/B8i8w4GC.

My conclusion from looking at it is that it's likely a quirk that we can
fix, and that it shouldn't affect the release.

On Wed, 13 Jul 2022 at 23:48, QP Hou  wrote:

> +1 (binding)
>
> On Wed, Jul 13, 2022 at 1:24 PM Andrew Lamb  wrote:
> >
> > +1 (binding)
> >
> > Thanks Andy! I know releases are a significant amount of work.
> >
> > Andrew
> >
> > On Tue, Jul 12, 2022 at 11:45 AM Andy Grove 
> wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation,
> > > version 10.0.0.
> > >
> > > This release candidate is based on commit:
> > > d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests, and
> > > vote on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the community
> > > are encouraged to test the release and vote with "(non-binding)".
> > >
> > > The standard verification procedure is documented at
> > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > >
> > > [ ] +1 Release this as Apache Arrow DataFusion 10.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow DataFusion 10.0.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1
> > >
> > > [1]: https://github.com/apache/arrow-datafusion/tree/d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-10.0.0-rc1
> > > [3]: https://github.com/apache/arrow-datafusion/blob/d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f/CHANGELOG.md
> > >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 17.0.0 RC1

2022-06-27 Thread Neville Dipale
+1 (binding)

Verified on aarch64 macos

On Sat, 25 Jun 2022 at 07:00, QP Hou  wrote:

> +1 (binding)
>
> On Fri, Jun 24, 2022 at 7:36 PM Remzi Yang <1371656737...@gmail.com>
> wrote:
> >
> > +1 (non-binding). Verified on Mac M1.
> > Thanks Andrew.
> >
> > Remzi
> >
> > On Sat, 25 Jun 2022 at 09:33, Chao Sun  wrote:
> >
> > > +1 (non-binding). Verified on Intel Mac.
> > >
> > > Thanks Andrew.
> > >
> > > On Fri, Jun 24, 2022 at 5:17 PM L. C. Hsieh  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Verified on Intel Mac.
> > > >
> > > > Thank you, Andrew.
> > > >
> > > > On Fri, Jun 24, 2022 at 5:00 PM Andy Grove 
> > > wrote:
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > Verified on Ubuntu 20.04.4 LTS.
> > > > >
> > > > > Thanks, Andrew.
> > > > >
> > > > > On Fri, Jun 24, 2022 at 2:45 PM Andrew Lamb 
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > > version 17.0.0.
> > > > > >
> > > > > > This release candidate is based on commit:
> > > > > > 9f7b6004d365b0c0bac8e30170b49bdd66cc7df0 [1]
> > > > > >
> > > > > > The proposed release tarball and signatures are hosted at [2].
> > > > > >
> > > > > > The changelog is located at [3].
> > > > > >
> > > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > > and vote on the release. There is a script [4] that automates some of
> > > > > > the verification.
> > > > > >
> > > > > > The vote will be open for at least 72 hours.
> > > > > >
> > > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > > >
> > > > > > [1]: https://github.com/apache/arrow-rs/tree/9f7b6004d365b0c0bac8e30170b49bdd66cc7df0
> > > > > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-17.0.0-rc1
> > > > > > [3]: https://github.com/apache/arrow-rs/blob/9f7b6004d365b0c0bac8e30170b49bdd66cc7df0/CHANGELOG.md
> > > > > > [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > > > -
> > > > > >
> > >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-22 Thread Neville Dipale
All the crates have now been released:

https://crates.io/crates/arrow
https://crates.io/crates/arrow-flight
https://crates.io/crates/parquet
https://crates.io/crates/parquet_derive

On Tue, 22 Feb 2022 at 20:16, Andy Grove  wrote:

> You should have those now. Let me know if I missed any others.
>
> On Tue, Feb 22, 2022 at 11:13 AM Neville Dipale 
> wrote:
>
> > Thanks Andy, I didn't get invites for parquet, parquet-derive and
> > arrow-flight
> >
> > On Tue, 22 Feb 2022 at 19:58, Andy Grove  wrote:
> >
> > > Hi Neville,
> > >
> > > I have invited you to become an owner on the Arrow/DataFusion crates.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > >
> > >
> > > On Tue, Feb 22, 2022 at 10:25 AM Neville Dipale  >
> > > wrote:
> > >
> > > > With 3 +1 votes the release is approved.
> > > >
> > > > The release is available here:
> > > >   https://dist.apache.org/repos/dist/release/arrow/arrow-rs-9.1.0
> > > >
> > > > All that's left is to publish to crates.io. I'm not an owner on
> > > > crates.io, so I can't publish the new crates.
> > > >
> > > > @Andrew Lamb could you please add me as an owner :)
> > > >
> > > > On Mon, 21 Feb 2022 at 19:29, Andy Grove 
> > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Verified on Ubuntu 20.04.3 LTS
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Feb 21, 2022 at 12:04 AM Andrew Lamb  >
> > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > I ran the verification script and looked at the changelog and the
> > > > > > commits. Thanks Neville for driving this process.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > > On Mon, Feb 21, 2022 at 7:55 AM Chao Sun 
> > wrote:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Verified on macos with `dev/release/verify-release-candidate.sh` and it
> > > > > > > looks good.
> > > > > > >
> > > > > > > On Sun, Feb 20, 2022 at 1:06 PM Sutou Kouhei <
> k...@clear-code.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Great!
> > > > > > > > Could you also add your PGP key to
> > > > > > > > https://dist.apache.org/repos/dist/release/arrow/KEYS
> > > > > > > > ?
> > > > > > > >
> > > > > > > > The release/arrow/KEYS content is distributed as
> > > > > > > > https://downloads.apache.org/arrow/KEYS .
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > --
> > > > > > > > kou
> > > > > > > >
> > > > > > > > In <
> > > > > cabgzuegyc3-trrvucmtfnxgpkbp3io2nq6swof2umq2f3ka...@mail.gmail.com
> > > > > > >
> > > > > > > >   "Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1" on
> > Sun,
> > > 20
> > > > > Feb
> > > > > > > > 2022 10:43:16 +0200,
> > > > > > > >   Neville Dipale  wrote:
> > > > > > > >
> > > > > > > > > Hi Kou,
> > > > > > > > >
> > > > > > > > > Thanks, I've just managed to add my key through some fiddling before I
> > > > > > > > > saw your email. My RC verification is still running :)
> > > > > > > > >
> > > > > > > > > Also added it to the Ubuntu keyserver if it helps.
> > > > > > > > >
> > > > > > > > > On Sun, 20 Feb 2022 at 06:34, Sutou Kouhei <
> > k...@clear-code.com
> > > >
> > > > > > wrote:
> > >

Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-22 Thread Neville Dipale
Thanks Andy, I didn't get invites for parquet, parquet-derive and
arrow-flight

On Tue, 22 Feb 2022 at 19:58, Andy Grove  wrote:

> Hi Neville,
>
> I have invited you to become an owner on the Arrow/DataFusion crates.
>
> Thanks,
>
> Andy.
>
>
>
> On Tue, Feb 22, 2022 at 10:25 AM Neville Dipale 
> wrote:
>
> > With 3 +1 votes the release is approved.
> >
> > The release is available here:
> >   https://dist.apache.org/repos/dist/release/arrow/arrow-rs-9.1.0
> >
> > All that's left is to publish to crates.io. I'm not an owner on
> > crates.io, so I can't publish the new crates.
> >
> > @Andrew Lamb could you please add me as an owner :)
> >
> > On Mon, 21 Feb 2022 at 19:29, Andy Grove  wrote:
> >
> > > +1
> > >
> > > Verified on Ubuntu 20.04.3 LTS
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Feb 21, 2022 at 12:04 AM Andrew Lamb 
> > wrote:
> > >
> > > > +1
> > > >
> > > > I ran the verification script and looked at the changelog and the
> > > > commits. Thanks Neville for driving this process.
> > > >
> > > > Andrew
> > > >
> > > > On Mon, Feb 21, 2022 at 7:55 AM Chao Sun  wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Verified on macos with `dev/release/verify-release-candidate.sh` and it
> > > > > looks good.
> > > > >
> > > > > On Sun, Feb 20, 2022 at 1:06 PM Sutou Kouhei 
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Great!
> > > > > > Could you also add your PGP key to
> > > > > > https://dist.apache.org/repos/dist/release/arrow/KEYS
> > > > > > ?
> > > > > >
> > > > > > The release/arrow/KEYS content is distributed as
> > > > > > https://downloads.apache.org/arrow/KEYS .
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > kou
> > > > > >
> > > > > > In <
> > > cabgzuegyc3-trrvucmtfnxgpkbp3io2nq6swof2umq2f3ka...@mail.gmail.com
> > > > >
> > > > > >   "Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1" on Sun,
> 20
> > > Feb
> > > > > > 2022 10:43:16 +0200,
> > > > > >   Neville Dipale  wrote:
> > > > > >
> > > > > > > Hi Kou,
> > > > > > >
> > > > > > > Thanks, I've just managed to add my key through some fiddling before
> > > > > > > I saw your email. My RC verification is still running :)
> > > > > > >
> > > > > > > Also added it to the Ubuntu keyserver if it helps.
> > > > > > >
> > > > > > > On Sun, 20 Feb 2022 at 06:34, Sutou Kouhei  >
> > > > wrote:
> > > > > > >
> > > > > > >> Hi,
> > > > > > >>
> > > > > > >> It seems that Neville forgot to add Neville's PGP key to
> > > > > > >>
> > > > > > >>   * https://dist.apache.org/repos/dist/dev/arrow/KEYS
> > > > > > >>   * https://dist.apache.org/repos/dist/release/arrow/KEYS
> > > > > > >>
> > > > > > >> See their header comments for how to add a PGP key.
> > > > > > >>
> > > > > > >> Committers can update them with a Subversion client using their
> > > > > > >> ASF account, e.g.:
> > > > > > >>
> > > > > > >>   $ svn co https://dist.apache.org/repos/dist/dev/arrow
> > > > > > >>   $ cd arrow
> > > > > > >>   $ editor KEYS
> > > > > > >>   $ svn ci KEYS
> > > > > > >>
> > > > > > >> If Neville doesn't have a PGP key,
> > > > > > >> https://infra.apache.org/release-signing.html#generate may
> > > > > > >> be helpful.
> > > > > > >>
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> --
> > > > > > >

Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-22 Thread Neville Dipale
With 3 +1 votes the release is approved.

The release is available here:
  https://dist.apache.org/repos/dist/release/arrow/arrow-rs-9.1.0

All that's left is to publish to crates.io. I'm not an owner on crates.io,
so I can't publish the new crates.

@Andrew Lamb could you please add me as an owner :)

On Mon, 21 Feb 2022 at 19:29, Andy Grove  wrote:

> +1
>
> Verified on Ubuntu 20.04.3 LTS
>
>
>
>
>
> On Mon, Feb 21, 2022 at 12:04 AM Andrew Lamb  wrote:
>
> > +1
> >
> > I ran the verification script and looked at the changelog and the
> > commits. Thanks Neville for driving this process.
> >
> > Andrew
> >
> > On Mon, Feb 21, 2022 at 7:55 AM Chao Sun  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Verified on macos with `dev/release/verify-release-candidate.sh` and it
> > > looks good.
> > >
> > > On Sun, Feb 20, 2022 at 1:06 PM Sutou Kouhei 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > Great!
> > > > Could you also add your PGP key to
> > > > https://dist.apache.org/repos/dist/release/arrow/KEYS
> > > > ?
> > > >
> > > > The release/arrow/KEYS content is distributed as
> > > > https://downloads.apache.org/arrow/KEYS .
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > kou
> > > >
> > > > In <
> cabgzuegyc3-trrvucmtfnxgpkbp3io2nq6swof2umq2f3ka...@mail.gmail.com
> > >
> > > >   "Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1" on Sun, 20
> Feb
> > > > 2022 10:43:16 +0200,
> > > >   Neville Dipale  wrote:
> > > >
> > > > > Hi Kou,
> > > > >
> > > > > Thanks, I've just managed to add my key through some fiddling before
> > > > > I saw your email. My RC verification is still running :)
> > > > >
> > > > > Also added it to the Ubuntu keyserver if it helps.
> > > > >
> > > > > On Sun, 20 Feb 2022 at 06:34, Sutou Kouhei 
> > wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> It seems that Neville forgot to add Neville's PGP key to
> > > > >>
> > > > >>   * https://dist.apache.org/repos/dist/dev/arrow/KEYS
> > > > >>   * https://dist.apache.org/repos/dist/release/arrow/KEYS
> > > > >>
> > > > >> See their header comments for how to add a PGP key.
> > > > >>
> > > > >> Committers can update them with a Subversion client using their
> > > > >> ASF account, e.g.:
> > > > >>
> > > > >>   $ svn co https://dist.apache.org/repos/dist/dev/arrow
> > > > >>   $ cd arrow
> > > > >>   $ editor KEYS
> > > > >>   $ svn ci KEYS
> > > > >>
> > > > >> If Neville doesn't have a PGP key,
> > > > >> https://infra.apache.org/release-signing.html#generate may
> > > > >> be helpful.
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> --
> > > > >> kou
> > > > >>
> > > > >> In  > > pnp3cu91r5vf_6szh1z4uab_qp...@mail.gmail.com>
> > > > >>   "Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1" on Sat,
> 19
> > > Feb
> > > > >> 2022 11:52:44 -0700,
> > > > >>   Andy Grove  wrote:
> > > > >>
> > > > >> > I'm using the same process as usual (as far as I am aware) and the
> > > > >> > script failed with:
> > > > >> >
> > > > >> > gpg: Signature made Sat 19 Feb 2022 10:01:41 AM MST
> > > > >> > gpg:                using EDDSA key 3905F254F9E504B40FFF6CF6000488D7717D3FB2
> > > > >> > gpg: Can't check signature: No public key
> > > > >> > + cleanup
> > > > >> > + '[' no = yes ']'
> > > > >> > + echo 'Failed to verify release candidate. See /tmp/arrow-9.1.0.62KW4 for details.'
> > > > >> >
> > > > >> >
> > > > >> > On Sat, Feb 19, 2022 at 10:04 AM Neville Dipale <
> > > > nevilled...@gmail.com&
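Neville's "With 3 +1 votes the release is approved" applies the ASF majority-approval rule for releases: at least three binding +1 votes and more binding +1 than binding -1 votes. A small sketch of that rule follows, assuming the caller has already filtered out non-binding votes. The helper name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the ASF majority-approval rule used for release votes.
# Pass one binding vote per argument: +1, +0, -0 or -1.
release_vote_passes() {
  local plus=0 minus=0 vote
  for vote in "$@"; do
    case "$vote" in
      +1) plus=$((plus + 1)) ;;
      -1) minus=$((minus + 1)) ;;
      *) ;;  # +0 and -0 are recorded but do not affect the outcome
    esac
  done
  # Succeeds (exit status 0) only with >= 3 binding +1s and more +1s than -1s.
  (( plus >= 3 && plus > minus ))
}

# Example:
# release_vote_passes +1 +1 +1 && echo approved
```

Whether a vote is binding depends on PMC membership, not on the label in the email, which is why the sketch leaves that filtering to the caller.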

Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-20 Thread Neville Dipale
+1 (binding)

Verified on macos aarch64

+ TEST_SUCCESS=yes
+ echo 'Release candidate looks good!'
Release candidate looks good!

On Sun, 20 Feb 2022 at 10:43, Neville Dipale  wrote:

> Hi Kou,
>
> Thanks, I've just managed to add my key through some fiddling before I saw
> your email. My RC verification is still running :)
>
> Also added it to the Ubuntu keyserver if it helps.
>
> On Sun, 20 Feb 2022 at 06:34, Sutou Kouhei  wrote:
>
>> Hi,
>>
>> It seems that Neville forgot to add Neville's PGP key to
>>
>>   * https://dist.apache.org/repos/dist/dev/arrow/KEYS
>>   * https://dist.apache.org/repos/dist/release/arrow/KEYS
>>
>> See their header comments for how to add a PGP key.
>>
>> Committers can update them with a Subversion client using their
>> ASF account, e.g.:
>>
>>   $ svn co https://dist.apache.org/repos/dist/dev/arrow
>>   $ cd arrow
>>   $ editor KEYS
>>   $ svn ci KEYS
>>
>> If Neville doesn't have a PGP key,
>> https://infra.apache.org/release-signing.html#generate may
>> be helpful.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1" on Sat, 19 Feb
>> 2022 11:52:44 -0700,
>>   Andy Grove  wrote:
>>
>> > I'm using the same process as usual (as far as I am aware) and the
>> > script failed with:
>> >
>> > gpg: Signature made Sat 19 Feb 2022 10:01:41 AM MST
>> > gpg:                using EDDSA key 3905F254F9E504B40FFF6CF6000488D7717D3FB2
>> > gpg: Can't check signature: No public key
>> > + cleanup
>> > + '[' no = yes ']'
>> > + echo 'Failed to verify release candidate. See /tmp/arrow-9.1.0.62KW4 for details.'
>> >
>> >
>> > On Sat, Feb 19, 2022 at 10:04 AM Neville Dipale 
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I would like to propose a release of Apache Arrow Rust Implementation,
>> >> version 9.1.0.
>> >>
>> >> This release candidate is based on commit:
>> >> ecba7dc0830dbde6aa6dd9432519b776e40c1e85 [1]
>> >>
>> >> The proposed release tarball and signatures are hosted at [2].
>> >>
>> >> The changelog is located at [3].
>> >>
>> >> Please download, verify checksums and signatures, run the unit tests,
>> >> and vote on the release. There is a script [4] that automates some of
>> >> the verification.
>> >>
>> >> The vote will be open for at least 72 hours.
>> >>
>> >> [ ] +1 Release this as Apache Arrow Rust
>> >> [ ] +0
>> >> [ ] -1 Do not release this as Apache Arrow Rust  because...
>> >>
>> >> [1]: https://github.com/apache/arrow-rs/tree/ecba7dc0830dbde6aa6dd9432519b776e40c1e85
>> >> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.1.0-rc1
>> >> [3]: https://github.com/apache/arrow-rs/blob/ecba7dc0830dbde6aa6dd9432519b776e40c1e85/CHANGELOG.md
>> >> [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
>> >>
>>
>
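The gpg messages in these threads refer to the same key. For a v4 OpenPGP key, the 64-bit key ID is the last 16 hex digits of the fingerprint, and the short ID that older gpg prints is the last 8. The sketch below derives both. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive long (64-bit) and short (32-bit) key IDs from a v4
# OpenPGP fingerprint, which is 40 hex digits.
key_ids_from_fingerprint() {
  local fpr
  # Remove the spaces gpg may insert and normalise to upper case.
  fpr=$(printf '%s' "$1" | tr -d ' ' | tr '[:lower:]' '[:upper:]')
  if [[ ! "$fpr" =~ ^[0-9A-F]{40}$ ]]; then
    echo "not a v4 fingerprint: $1" >&2
    return 1
  fi
  echo "long=${fpr: -16} short=${fpr: -8}"
}

key_ids_from_fingerprint 3905F254F9E504B40FFF6CF6000488D7717D3FB2
# prints: long=000488D7717D3FB2 short=717D3FB2
```

The short ID `717D3FB2` is the key that Benson's CentOS 7 gpg reported as having "no valid user IDs", and the full fingerprint is the one in Andy's "No public key" error.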


Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-20 Thread Neville Dipale
Hi Kou,

Thanks, I've just managed to add my key through some fiddling before I saw
your email. My RC verification is still running :)

Also added it to the Ubuntu keyserver if it helps.

On Sun, 20 Feb 2022 at 06:34, Sutou Kouhei  wrote:

> Hi,
>
> It seems that Neville forgot to add Neville's PGP key to
>
>   * https://dist.apache.org/repos/dist/dev/arrow/KEYS
>   * https://dist.apache.org/repos/dist/release/arrow/KEYS
>
> See their header comments for how to add a PGP key.
>
> Committers can update them with a Subversion client using their
> ASF account, e.g.:
>
>   $ svn co https://dist.apache.org/repos/dist/dev/arrow
>   $ cd arrow
>   $ editor KEYS
>   $ svn ci KEYS
>
> If Neville doesn't have a PGP key,
> https://infra.apache.org/release-signing.html#generate may
> be helpful.
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1" on Sat, 19 Feb
> 2022 11:52:44 -0700,
>   Andy Grove  wrote:
>
> > I'm using the same process as usual (as far as I am aware) and the script
> > failed with:
> >
> > gpg: Signature made Sat 19 Feb 2022 10:01:41 AM MST
> > gpg:                using EDDSA key 3905F254F9E504B40FFF6CF6000488D7717D3FB2
> > gpg: Can't check signature: No public key
> > + cleanup
> > + '[' no = yes ']'
> > + echo 'Failed to verify release candidate. See /tmp/arrow-9.1.0.62KW4 for details.'
> >
> >
> > On Sat, Feb 19, 2022 at 10:04 AM Neville Dipale 
> > wrote:
> >
> >> Hi,
> >>
> >> I would like to propose a release of Apache Arrow Rust Implementation,
> >> version 9.1.0.
> >>
> >> This release candidate is based on commit:
> >> ecba7dc0830dbde6aa6dd9432519b776e40c1e85 [1]
> >>
> >> The proposed release tarball and signatures are hosted at [2].
> >>
> >> The changelog is located at [3].
> >>
> >> Please download, verify checksums and signatures, run the unit tests,
> >> and vote on the release. There is a script [4] that automates some of
> >> the verification.
> >>
> >> The vote will be open for at least 72 hours.
> >>
> >> [ ] +1 Release this as Apache Arrow Rust
> >> [ ] +0
> >> [ ] -1 Do not release this as Apache Arrow Rust  because...
> >>
> >> [1]: https://github.com/apache/arrow-rs/tree/ecba7dc0830dbde6aa6dd9432519b776e40c1e85
> >> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.1.0-rc1
> >> [3]: https://github.com/apache/arrow-rs/blob/ecba7dc0830dbde6aa6dd9432519b776e40c1e85/CHANGELOG.md
> >> [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >>
>
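The header comment Kou mentions in Apache KEYS files usually gives a one-line recipe: `(gpg --list-sigs <your name> && gpg --armor --export <your name>) >> KEYS`. The sketch below wraps that recipe so nothing is appended when gpg cannot find the key. The helper name and `KEY_ID` are placeholders, and the svn lines simply mirror Kou's example.

```shell
#!/usr/bin/env bash
# Hypothetical helper around the usual KEYS-file recipe:
#   (gpg --list-sigs <id> && gpg --armor --export <id>) >> KEYS
append_key() {
  local key_id="$1" keys_file="$2" entry
  # Capture both outputs first; if `gpg --list-sigs` fails (e.g. unknown ID),
  # the && short-circuits and nothing is written to the KEYS file.
  entry=$( gpg --list-sigs "$key_id" && gpg --armor --export "$key_id" ) || return 1
  printf '%s\n' "$entry" >> "$keys_file"
}

# Example, following the svn steps quoted above (KEY_ID is a placeholder):
# svn co https://dist.apache.org/repos/dist/dev/arrow
# cd arrow
# append_key "$KEY_ID" KEYS
# svn ci -m "Add my PGP key" KEYS
```

The same steps apply to the release/arrow/KEYS file, which is the copy distributed via downloads.apache.org.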


Re: [VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-19 Thread Neville Dipale
My GPG key was only just added, and it might not be in the directory yet, as
I understand the directory is refreshed once daily.

Could you please try again later? I'll also check whether the key appears in
a few hours.

On Sat, 19 Feb 2022, 20:53 Andy Grove,  wrote:

> I'm using the same process as usual (as far as I am aware) and the script
> failed with:
>
> gpg: Signature made Sat 19 Feb 2022 10:01:41 AM MST
> gpg:                using EDDSA key 3905F254F9E504B40FFF6CF6000488D7717D3FB2
> gpg: Can't check signature: No public key
> + cleanup
> + '[' no = yes ']'
> + echo 'Failed to verify release candidate. See /tmp/arrow-9.1.0.62KW4 for
> details.'
>
>
> On Sat, Feb 19, 2022 at 10:04 AM Neville Dipale 
> wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Implementation,
> > version 9.1.0.
> >
> > This release candidate is based on commit:
> > ecba7dc0830dbde6aa6dd9432519b776e40c1e85 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> >
> > [1]: https://github.com/apache/arrow-rs/tree/ecba7dc0830dbde6aa6dd9432519b776e40c1e85
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.1.0-rc1
> > [3]: https://github.com/apache/arrow-rs/blob/ecba7dc0830dbde6aa6dd9432519b776e40c1e85/CHANGELOG.md
> > [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >
>
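Rather than guessing when the published file has been refreshed, a voter or the release manager can check whether the KEYS file already contains the signing key's fingerprint. The sketch below assumes a local copy of the file. The helper name is hypothetical, and spaces are stripped because some gpg versions print fingerprints in groups of four.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a KEYS file mentions the given fingerprint.
# Spaces are removed and case is ignored on both sides.
keys_file_has_fingerprint() {
  local keys_file="$1" fpr
  fpr=$(printf '%s' "$2" | tr -d ' ' | tr '[:lower:]' '[:upper:]')
  # grep reads all of its input (no -q), so an early exit cannot make the
  # upstream tr fail under `set -o pipefail`.
  tr -d ' ' < "$keys_file" | tr '[:lower:]' '[:upper:]' | grep -- "$fpr" >/dev/null
}

# Example against the published copy (needs network access):
# curl -fsSL https://downloads.apache.org/arrow/KEYS -o /tmp/arrow-KEYS
# keys_file_has_fingerprint /tmp/arrow-KEYS 3905F254F9E504B40FFF6CF6000488D7717D3FB2 \
#   && echo "key is published"
```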


[VOTE][RUST] Release Apache Arrow Rust 9.1.0 RC1

2022-02-19 Thread Neville Dipale
Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 9.1.0.

This release candidate is based on commit:
ecba7dc0830dbde6aa6dd9432519b776e40c1e85 [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]: https://github.com/apache/arrow-rs/tree/ecba7dc0830dbde6aa6dd9432519b776e40c1e85
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.1.0-rc1
[3]: https://github.com/apache/arrow-rs/blob/ecba7dc0830dbde6aa6dd9432519b776e40c1e85/CHANGELOG.md
[4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


Re: [ANNOUNCE] New Arrow PMC member: QP Hou

2022-02-19 Thread Neville Dipale
Congratulations QP!

On Fri, 18 Feb 2022 at 09:42, Daniël Heres wrote:

> Congratulations QP!
>
> On Fri, Feb 18, 2022, 06:55 Benson Muite wrote:
>
> > Congratulations QP!
> > On 2/18/22 8:35 AM, Jiayu Liu wrote:
> > > Congratulations QP!
> > >
> > > On Fri, Feb 18, 2022 at 1:32 PM Micah Kornfield wrote:
> > >
> > > > Congrats!
> > > >
> > > > On Thu, Feb 17, 2022 at 7:27 PM Weston Pace wrote:
> > > >
> > > > > Congratulations QP!
> > > > >
> > > > > On Thu, Feb 17, 2022 at 3:22 PM hao Yang <1371656737...@gmail.com> wrote:
> > > > >
> > > > > > Congratulations QP!
> > > > > >
> > > > > > On Fri, 18 Feb 2022 at 09:14, Vibhatha Abeykoon wrote:
> > > > > >
> > > > > > > Congratulations!
> > > > > > >
> > > > > > > On Fri, Feb 18, 2022 at 5:51 AM Yijie Shen <henry.yijies...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Congratulations QP!
> > > > > > > >
> > > > > > > > On Fri, Feb 18, 2022 at 6:17 AM Phillip Cloud wrote:
> > > > > > > >
> > > > > > > > > Congratulations!!
> > > > > > > > >
> > > > > > > > > On Thu, Feb 17, 2022 at 5:12 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Congratulations!
> > > > > > > > > >
> > > > > > > > > > Neal
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 17, 2022 at 4:48 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Congrats QP!
> > > > > > > > > > >
> > > > > > > > > > > Rok
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Feb 17, 2022 at 10:41 PM David Li <lidav...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Congrats QP!
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Feb 17, 2022, at 16:26, Matthew Turner wrote:
> > > > > > > > > > > > > Congratulations, QP! Appreciate all of your contributions and guidance.
> > > > > > > > > > > > >
> > > > > > > > > > > > > From: Sutou Kouhei
> > > > > > > > > > > > > Date: Thursday, February 17, 2022 at 3:17 PM
> > > > > > > > > > > > > To: dev@arrow.apache.org
> > > > > > > > > > > > > Subject: [ANNOUNCE] New Arrow PMC member: QP Hou
> > > > > > > > > > > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > > > > > > > > > > QP Hou to become a PMC member and we are pleased to announce
> > > > > > > > > > > > > that QP Hou has accepted.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Congratulations and welcome!
> > > > > > >
> > > > > > > --
> > > > > > > Vibhatha Abeykoon


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 7.0.0 RC2

2022-02-15 Thread Neville Dipale
+1 binding, verified the RC on macos aarch64

On Tue, 15 Feb 2022 at 09:18, Yijie Shen  wrote:

> Thanks, Andrew!
>
> +1 non-binding
>
>
> On Tue, Feb 15, 2022 at 11:51 AM QP Hou  wrote:
>
> > +1 non-binding, ran the release script on Linux arm64.
> >
> > It failed at the end when executing the `cargo publish --dry-run`
> > command, but I think this is expected because the datafusion core is
> > now depending on the `datafusion-common` crate. We should probably
> > update the verification script to run publish dry run on
> > datafusion-common instead.
> >
> >
> > On Mon, Feb 14, 2022 at 6:53 AM Andrew Lamb 
> wrote:
> > >
> > > Greetings,
> > >
> > > I would like to propose a release of Apache Arrow Datafusion
> > > Implementation, version 7.0.0.
> > >
> > > This release candidate is based on commit [1].
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Note this release does NOT include python bindings or ballista (which
> > > can hopefully be released separately). More detail on release
> > > coordination can be found at [8].
> > >
> > > In particular, I believe it would be beneficial for downstream projects
> > > (such as datafusion-python [4] and datafusion-objectstore-s3 [5]) to
> > > test that this release candidate works for their needs.
> > >
> > > Please download, verify checksums and signatures, run your tests, and
> > > vote on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the
> > > community are encouraged to test the release and vote with "(non-binding)".
> > >
> > > The standard verification procedure is documented at [6]. Note there
> > > were changes to the verification scripts for this release [7] related
> > > to no longer publishing the python bindings and breaking datafusion
> > > into multiple smaller crates.
> > >
> > >
> > > [ ] +1 Release this as Apache Arrow Datafusion 7.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Datafusion 7.0.0 because...
> > >
> > > [1]: https://github.com/apache/arrow-datafusion/tree/ca765d54dda6114da55ece8d876c042eca3ea870
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-7.0.0-rc2
> > > [3]: https://github.com/apache/arrow-datafusion/blob/ca765d54dda6114da55ece8d876c042eca3ea870/CHANGELOG.md
> > > [4]: https://github.com/datafusion-contrib/datafusion-python
> > > [5]: https://github.com/datafusion-contrib/datafusion-objectstore-s3
> > > [6]: https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > > [7]: https://github.com/apache/arrow-datafusion/pull/1830
> > > [8]: https://github.com/apache/arrow-datafusion/issues/1587
> >
>
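QP's dry-run failure reflects how multi-crate workspaces get published: `cargo publish` (including `--dry-run`) verifies the packaged crate against dependency versions resolved from crates.io, so the core crate cannot be verified until the `datafusion-common` version it needs is already published. A minimal Python sketch of deriving a dependencies-first publish order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the dependency map encodes only the one edge mentioned above and is not read from the real `Cargo.toml` files.

```python
from graphlib import TopologicalSorter


def publish_order(deps: dict[str, set[str]]) -> list[str]:
    """Order crates so that each one comes after every crate it depends on."""
    # TopologicalSorter expects node -> predecessors, i.e. crate -> dependencies,
    # and raises graphlib.CycleError if the dependency graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


# Workspace-internal dependencies as described in the thread: the core
# `datafusion` crate depends on `datafusion-common`.
WORKSPACE = {
    "datafusion": {"datafusion-common"},
    "datafusion-common": set(),
}
```

Running `cargo publish --dry-run -p <crate>` for each name in `publish_order(WORKSPACE)` would try `datafusion-common` first, in line with QP's suggestion; the dry run for `datafusion` itself can still only pass once that `datafusion-common` version is actually on crates.io.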


Re: [VOTE][RUST] Release Apache Arrow Rust 9.0.3 RC3

2022-02-11 Thread Neville Dipale
I concluded the same; I also tested 9.0.2.

On Fri, 11 Feb 2022 at 18:46, Andy Grove  wrote:

> Just to be clear, I am voting on 9.0.2 RC3 using the following command.
> However, the email subject refers to 9.0.3 RC3. I assume this is just a
> typo?
>
> ./dev/release/verify-release-candidate.sh 9.0.2 3
>
> On Fri, Feb 11, 2022 at 8:58 AM Andy Grove  wrote:
>
> > +1 (binding)
> >
> > On Fri, Feb 11, 2022 at 3:12 AM Jörn Horstmann <
> > joern.horstm...@signavio.com> wrote:
> >
> >> +1 (non-binding)
> >>
> >> On Thu, Feb 10, 2022 at 2:38 PM Andrew Lamb 
> wrote:
> >>
> >> > Update for anyone following along: we have found and fixed the bug (which
> >> > appears to only affect a test). I believe Neville plans to re-test this RC.
> >> > More details can be found here [1].
> >> >
> >> > Andrew
> >> >
> >> > [1] https://github.com/apache/arrow-rs/pull/1297#issuecomment-1034888108
> >> >
> >> > On Thu, Feb 10, 2022 at 2:43 AM Neville Dipale wrote:
> >> >
> >> > > Sorry, that should be a -1 because of the bug
> >> > >
> >> > > On Thu, 10 Feb 2022 at 09:42, Neville Dipale wrote:
> >> > >
> >> > > > [X] +0 binding
> >> > > >
> >> > > > There's a test failure on aarch64, I've opened
> >> > > > https://github.com/apache/arrow-rs/issues/1294
> >> > > >
> >> > > > thread 'util::bit_chunk_iterator::tests::test_unaligned_bit_chunk_iterator'
> >> > > > panicked at 'assertion failed: ALIGNMENT > 64',
> >> > > > arrow/src/util/bit_chunk_iterator.rs:470:9
> >> > > >
> >> > > > On Wed, 9 Feb 2022 at 18:57, Andrew Lamb wrote:
> >> > > >
> >> > > >> Hi,
> >> > > >>
> >> > > >> I would like to propose a release of Apache Arrow Rust Implementation,
> >> > > >> version 9.0.2.
> >> > > >>
> >> > > >> This is the third RC[5] for the 9.x version. While somewhat inconvenient,
> >> > > >> I am happy to say our process has found and prevented two critical bugs
> >> > > >> from being released! I hope that "third time's the charm" this time.
> >> > > >>
> >> > > >> This release candidate is based on commit:
> >> > > >> ab7c2904ccc03d0c05687ef416fbc5f4ed92f125 [1]
> >> > > >>
> >> > > >> The proposed release tarball and signatures are hosted at [2].
> >> > > >>
> >> > > >> The changelog is located at [3].
> >> > > >>
> >> > > >> Please download, verify checksums and signatures, run the unit tests,
> >> > > >> and vote on the release. There is a script [4] that automates some of
> >> > > >> the verification.
> >> > > >>
> >> > > >> The vote will be open for at least 72 hours.
> >> > > >>
> >> > > >> [ ] +1 Release this as Apache Arrow Rust
> >> > > >> [ ] +0
> >> > > >> [ ] -1 Do not release this as Apache Arrow Rust  because...
> >> > > >>
> >> > > >> [1]: https://github.com/apache/arrow-rs/tree/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125
> >> > > >> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.0.2-rc3
> >> > > >> [3]: https://github.com/apache/arrow-rs/blob/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125/CHANGELOG.md
> >> > > >> [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >> > > >> [5]: https://lists.apache.org/thread/gtvv5okt49jyr0hxn5kho7vwp25rz5dk
> >> > > >> -
> >> > > >> Running rat license checker on
> >> > > >> /Users/alamb/Software/arrow-rs/dev/dist/apache-arrow-rs-9.0.2-rc3/apache-arrow-rs-9.0.2.tar.gz
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >>
> >> --
> >> *Jörn Horstmann* | Senior Backend Engineer
> >
>
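Since the subject line (9.0.3) and the actual candidate (9.0.2 RC3) disagreed here, one way to sidestep that kind of mix-up is to derive the script arguments from the dist.apache.org directory name in the vote email rather than from the subject. A minimal Python sketch; the `apache-arrow-rs-<version>-rc<N>` pattern is inferred from the URLs in these emails, and `verify_args` is a hypothetical helper, not part of the Arrow tooling.

```python
import re

# Directory names such as "apache-arrow-rs-9.0.2-rc3", as linked in the vote emails.
_RC_DIR = re.compile(r"apache-arrow-rs-(\d+\.\d+\.\d+)-rc(\d+)")


def verify_args(dist_url: str) -> list[str]:
    """Build the verify-release-candidate.sh command line from an RC directory URL."""
    name = dist_url.rstrip("/").rsplit("/", 1)[-1]
    match = _RC_DIR.fullmatch(name)
    if match is None:
        raise ValueError(f"not an arrow-rs release candidate directory: {name!r}")
    version, rc = match.groups()
    return ["./dev/release/verify-release-candidate.sh", version, rc]
```

For the 9.0.2 RC3 URL this yields `./dev/release/verify-release-candidate.sh 9.0.2 3`, the same command Andy ran.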


Re: [VOTE][RUST] Release Apache Arrow Rust 9.0.3 RC3

2022-02-10 Thread Neville Dipale
+1 binding

The issue was only a test issue, which has now been resolved.

On Thu, 10 Feb 2022 at 15:37, Andrew Lamb  wrote:

> Update for anyone following along: we have found and fixed the bug (which
> appears to only affect a test). I believe Neville plans to re-test this RC.
> More details can be found here [1]
>
> Andrew
>
> [1] https://github.com/apache/arrow-rs/pull/1297#issuecomment-1034888108
>
> On Thu, Feb 10, 2022 at 2:43 AM Neville Dipale 
> wrote:
>
> > Sorry, that should be a -1 because of the bug
> >
> > On Thu, 10 Feb 2022 at 09:42, Neville Dipale 
> > wrote:
> >
> > > [X] +0 binding
> > >
> > > There's a test failure on aarch64, I've opened
> > > https://github.com/apache/arrow-rs/issues/1294
> > >
> > > thread
> > > 'util::bit_chunk_iterator::tests::test_unaligned_bit_chunk_iterator'
> > > panicked at 'assertion failed: ALIGNMENT > 64',
> > > arrow/src/util/bit_chunk_iterator.rs:470:9
> > >
> > > On Wed, 9 Feb 2022 at 18:57, Andrew Lamb  wrote:
> > >
> > >> Hi,
> > >>
> > >> I would like to propose a release of Apache Arrow Rust Implementation,
> > >> version 9.0.2.
> > >>
> > >> This is the third RC[5] for the 9.x version. While somewhat inconvenient,
> > >> I am happy to say our process has found and prevented two critical bugs
> > >> from being released! I hope that "third time's the charm" this time.
> > >>
> > >> This release candidate is based on commit:
> > >> ab7c2904ccc03d0c05687ef416fbc5f4ed92f125 [1]
> > >>
> > >> The proposed release tarball and signatures are hosted at [2].
> > >>
> > >> The changelog is located at [3].
> > >>
> > >> Please download, verify checksums and signatures, run the unit tests,
> > >> and vote on the release. There is a script [4] that automates some of
> > >> the verification.
> > >>
> > >> The vote will be open for at least 72 hours.
> > >>
> > >> [ ] +1 Release this as Apache Arrow Rust
> > >> [ ] +0
> > >> [ ] -1 Do not release this as Apache Arrow Rust  because...
> > >>
> > >> [1]: https://github.com/apache/arrow-rs/tree/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125
> > >> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.0.2-rc3
> > >> [3]: https://github.com/apache/arrow-rs/blob/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125/CHANGELOG.md
> > >> [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > >> [5]: https://lists.apache.org/thread/gtvv5okt49jyr0hxn5kho7vwp25rz5dk
> > >> -
> > >> Running rat license checker on
> > >> /Users/alamb/Software/arrow-rs/dev/dist/apache-arrow-rs-9.0.2-rc3/apache-arrow-rs-9.0.2.tar.gz
> > >>
> > >
> >
>
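The vote in this thread moved from +0 to -1 and then to +1 once the test fix landed, which raises the question of which vote counts. A minimal Python sketch of tallying such a thread under two stated assumptions: only each voter's latest vote counts, and the ASF release rule applies, i.e. a release needs at least three binding +1 votes (binding meaning cast by PMC members) and more binding +1 than -1 votes. It is an illustration only, not a tool the project uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int  # +1, 0 or -1
    binding: bool  # True for PMC members


def release_passes(votes: list[Vote]) -> bool:
    """Apply the majority-approval rule to each voter's most recent vote."""
    latest: dict[str, Vote] = {}
    for vote in votes:  # chronological order: a later vote replaces an earlier one
        latest[vote.voter] = vote
    binding = [v.value for v in latest.values() if v.binding]
    plus, minus = binding.count(1), binding.count(-1)
    return plus >= 3 and plus > minus
```

Non-binding votes are kept in the input but ignored by the rule, which mirrors the "(non-binding)" label voters add in these threads.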


Re: [VOTE][RUST] Release Apache Arrow Rust 9.0.3 RC3

2022-02-09 Thread Neville Dipale
Sorry, that should be a -1 because of the bug

On Thu, 10 Feb 2022 at 09:42, Neville Dipale  wrote:

> [X] +0 binding
>
> There's a test failure on aarch64, I've opened
> https://github.com/apache/arrow-rs/issues/1294
>
> thread
> 'util::bit_chunk_iterator::tests::test_unaligned_bit_chunk_iterator'
> panicked at 'assertion failed: ALIGNMENT > 64',
> arrow/src/util/bit_chunk_iterator.rs:470:9
>
> On Wed, 9 Feb 2022 at 18:57, Andrew Lamb  wrote:
>
>> Hi,
>>
>> I would like to propose a release of Apache Arrow Rust Implementation,
>> version 9.0.2.
>>
>> This is the third RC[5] for the 9.x version. While somewhat inconvenient,
>> I am happy to say our process has found and prevented two critical bugs
>> from being released! I hope that "third time's the charm" this time.
>>
>> This release candidate is based on commit:
>> ab7c2904ccc03d0c05687ef416fbc5f4ed92f125 [1]
>>
>> The proposed release tarball and signatures are hosted at [2].
>>
>> The changelog is located at [3].
>>
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. There is a script [4] that automates some of
>> the verification.
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow Rust
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow Rust  because...
>>
>> [1]: https://github.com/apache/arrow-rs/tree/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125
>> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.0.2-rc3
>> [3]: https://github.com/apache/arrow-rs/blob/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125/CHANGELOG.md
>> [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
>> [5]: https://lists.apache.org/thread/gtvv5okt49jyr0hxn5kho7vwp25rz5dk
>> -
>> Running rat license checker on
>> /Users/alamb/Software/arrow-rs/dev/dist/apache-arrow-rs-9.0.2-rc3/apache-arrow-rs-9.0.2.tar.gz
>>
>


Re: [VOTE][RUST] Release Apache Arrow Rust 9.0.3 RC3

2022-02-09 Thread Neville Dipale
[X] +0 binding

There's a test failure on aarch64, I've opened
https://github.com/apache/arrow-rs/issues/1294

thread 'util::bit_chunk_iterator::tests::test_unaligned_bit_chunk_iterator'
panicked at 'assertion failed: ALIGNMENT > 64',
arrow/src/util/bit_chunk_iterator.rs:470:9

On Wed, 9 Feb 2022 at 18:57, Andrew Lamb  wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 9.0.2.
>
> This is the third RC[5] for the 9.x version. While somewhat inconvenient, I
> am happy to say our process has found and prevented two critical bugs
> from being released! I hope that "third time's the charm" this time.
>
> This release candidate is based on commit:
> ab7c2904ccc03d0c05687ef416fbc5f4ed92f125 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]: https://github.com/apache/arrow-rs/tree/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-9.0.2-rc3
> [3]: https://github.com/apache/arrow-rs/blob/ab7c2904ccc03d0c05687ef416fbc5f4ed92f125/CHANGELOG.md
> [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> [5]: https://lists.apache.org/thread/gtvv5okt49jyr0hxn5kho7vwp25rz5dk
> -
> Running rat license checker on
> /Users/alamb/Software/arrow-rs/dev/dist/apache-arrow-rs-9.0.2-rc3/apache-arrow-rs-9.0.2.tar.gz
>


Re: [VOTE][RUST] Release Apache Arrow Rust 8.0.0 RC1

2022-01-25 Thread Neville Dipale
Good day

+1 (binding), verified on macos-aarch64

On Tue, 25 Jan 2022 at 05:08, Wang Xudong  wrote:

> +1(non-binding)
> Verified on macos
> "Release candidate looks good!"
>
> --
> xudong963
>
> On Tue, 25 Jan 2022 at 09:07, Andy Grove wrote:
>
> > +1 (binding). Verified on Ubuntu 20.04.3 LTS.
> >
> > On Fri, Jan 21, 2022 at 5:23 AM Andrew Lamb 
> wrote:
> >
> > > As we have discussed [5], the Rust bi-weekly releases are going to come
> > > off master now, and this is the first such release.
> > >
> > > Therefore, it is with great excitement that I would like to propose a
> > > release of Apache Arrow Rust Implementation, version 8.0.0. Among other
> > > things, this resolves the last outstanding specific security issues [6][7]
> > > we know of.
> > >
> > > This release candidate is based on commit:
> > > 0377aaed5ff46214359d1b8d66c27f3afd9323c3 [1]
> > >
> > > The proposed release tarball and signatures are hosted at [2].
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. There is a script [4] that automates some of
> > > the verification.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow Rust
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > >
> > > [1]: https://github.com/apache/arrow-rs/tree/0377aaed5ff46214359d1b8d66c27f3afd9323c3
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-8.0.0-rc1
> > > [3]: https://github.com/apache/arrow-rs/blob/0377aaed5ff46214359d1b8d66c27f3afd9323c3/CHANGELOG.md
> > > [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > [5]: https://github.com/apache/arrow-rs/issues/1120
> > > [6]: https://github.com/apache/arrow-rs/issues/197
> > > [7]: https://github.com/apache/arrow-rs/issues/786
> > >
> >
>


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-13 Thread Neville Dipale
+1 (binding)

I verified the RC on aarch64-macos.

On Fri, 13 Aug 2021 at 23:29, Jorge Cardoso Leitão 
wrote:

> +1
>
> Great work everyone!
>
>
> On Fri, Aug 13, 2021, 22:19 Daniël Heres  wrote:
>
> > +1 (non binding). Looking good.
> >
> >
> > On Fri, Aug 13, 2021, 07:49 QP Hou  wrote:
> >
> > > Good call Ruihang. I remember we used to have this toolchain file when
> > > we were still in the main arrow repo. I will take a look into that.
> > >
> > > On Wed, Aug 11, 2021 at 5:36 PM Wayne Xia 
> wrote:
> > > >
> > > > Hi QP,
> > > >
> > > > When running this script I noticed that this might be because I was
> > > > not using a stable toolchain when testing.
> > > > Those failures occur with nightly (which is my default toolchain), and
> > > > everything works fine after switching to stable 1.54.
> > > > So I think it's OK from my side to vote +1.
> > > >
> > > > BTW, I think we can add a toolchain file [1] to the datafusion repo.
> > > >
> > > > [1]: https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file
> > > >
> > > > On Thu, Aug 12, 2021 at 2:14 AM QP Hou  wrote:
> > > >
> > > > > Hi Ruihang,
> > > > >
> > > > > Thanks for helping with the validation. It would certainly be helpful
> > > > > if you could share the error log with me.
> > > > >
> > > > > I have also prepared an updated version of the verification script at:
> > > > > https://github.com/houqp/arrow-datafusion/blob/qp_release/dev/release/verify-release-candidate.sh
> > > > >
> > > > > This script does a clean checkout of everything before running tests
> > > > > and linting tools. Could you give that a try to see if you are getting
> > > > > the same results?
> > > > >
> > > > > Thanks,
> > > > > QP
> > > > >
> > > > On Wed, Aug 11, 2021 at 6:38 AM Wayne Xia wrote:
> > > > > >
> > > > > > Thanks, QP!
> > > > > >
> > > > > > I verified the signature and checked the shasum, but got 3 failed cases
> > > > > > while testing:
> > > > > >
> > > > > > - execution_plans::shuffle_writer::tests::test
> > > > > > - execution_plans::shuffle_writer::tests::test_partitioned
> > > > > > - physical_plan::repartition::tests::repartition_with_dropping_output_stream
> > > > > >
> > > > > > I set up the env vars `ARROW_TEST_DATA` and `PARQUET_TEST_DATA`, then ran
> > > > > > the tests with "cargo test --all --no-fail-fast" on Linux 5.13.6 with an
> > > > > > x86_64 chip.
> > > > > >
> > > > > > Did I miss something? I can paste the log here or file an issue if
> > > > > > needed.
> > > > > >
> > > > > > Ruihang
> > > > > >
> > > > > > QP Hou :
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I would like to propose a release of Apache Arrow Datafusion
> > > > > > > Implementation, version 5.0.0.
> > > > > > >
> > > > > > > RC3 fixed a cargo publish issue discovered in RC1.
> > > > > > >
> > > > > > > This release candidate is based on commit:
> > > > > > > deb929369c9aaba728ae0c2c49dcd05bfecc8bf8 [1]
> > > > > > > The proposed release tarball and signatures are hosted at [2].
> > > > > > > The changelog is located at [3].
> > > > > > >
> > > > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > > > and vote on the release. The vote will be open for at least 72 hours.
> > > > > > >
> > > > > > > [ ] +1 Release this as Apache Arrow Datafusion 5.0.0
> > > > > > > [ ] +0
> > > > > > > [ ] -1 Do not release this as Apache Arrow Datafusion 5.0.0 because...
> > > > > > >
> > > > > > > [1]: https://github.com/apache/arrow-datafusion/tree/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8
> > > > > > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc3
> > > > > > > [3]: https://github.com/apache/arrow-datafusion/blob/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8/CHANGELOG.md
> > > > > > >
> > > > > > > Thanks,
> > > > > > > QP
> > > > > > >
> > > > >
> > >
> >
>
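Ruihang's failures only appeared with a nightly default toolchain. A minimal Python sketch of running the same test command with the toolchain pinned for one invocation through rustup's `+<toolchain>` override, with the two test-data variables from the thread set; the data directory paths are hypothetical placeholders for local checkouts of the test data. A toolchain file, as suggested above, would instead make the pin persistent for the whole repository.

```python
import os
import subprocess
from pathlib import Path


def cargo_test_command(toolchain: str = "stable") -> list[str]:
    # rustup's cargo proxy treats a leading "+<toolchain>" argument as a
    # one-off override of the default (e.g. nightly) toolchain.
    return ["cargo", f"+{toolchain}", "test", "--all", "--no-fail-fast"]


def with_test_data_env(arrow_data: Path, parquet_data: Path) -> dict[str, str]:
    # Copy the current environment and add the two variables from the thread.
    env = dict(os.environ)
    env["ARROW_TEST_DATA"] = str(arrow_data)
    env["PARQUET_TEST_DATA"] = str(parquet_data)
    return env


def run_tests(repo: Path, arrow_data: Path, parquet_data: Path) -> int:
    """Run the workspace tests and return cargo's exit code (0 means all passed)."""
    completed = subprocess.run(
        cargo_test_command(),
        cwd=repo,
        env=with_test_data_env(arrow_data, parquet_data),
        check=False,
    )
    return completed.returncode
```

For example, `run_tests(Path("arrow-datafusion"), Path("testing/data"), Path("parquet-testing/data"))` (all paths hypothetical) returns 0 when every test passes on stable.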


Re: [VOTE][RUST] Release Apache Arrow Rust 5.2.0 RC1

2021-08-13 Thread Neville Dipale
+1 (binding)

I ran the verification script on aarch64 macos

On Fri, 13 Aug 2021 at 07:45, QP Hou  wrote:

> +1 (non-binding)
>
> ran the verification script on Linux 5.4.0 x86_64
>
> On Thu, Aug 12, 2021 at 12:44 PM Andrew Lamb  wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Implementation,
> > version 5.2.0.
> >
> > This release candidate is based on commit:
> > 7c98c4c60bc776acd09bd3568c6630d360e8d652 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> >
> > [1]: https://github.com/apache/arrow-rs/tree/7c98c4c60bc776acd09bd3568c6630d360e8d652
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-5.2.0-rc1
> > [3]: https://github.com/apache/arrow-rs/blob/7c98c4c60bc776acd09bd3568c6630d360e8d652/CHANGELOG.md
> > [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
>


Re: [VOTE][RUST] Release Apache Arrow Rust 5.1.0 RC1

2021-08-02 Thread Neville Dipale
+1

Ran ./dev/release/verify-release-candidate.sh 5.1.0 1
on aarch64-apple-darwin

Got

+ TEST_SUCCESS=yes
+ echo 'Release candidate looks good!'
Release candidate looks good!
+ exit 0
+ cleanup

On Fri, 30 Jul 2021 at 15:53, Wayne Xia  wrote:

> +1
>
> I ran this on Intel macOS Catalina:
> ./dev/release/verify-release-candidate.sh 5.1.0 1
>
> Got "Release candidate looks good!". Thanks.
>
> On Fri, 30 Jul 2021 at 08:14, Sutou Kouhei wrote:
>
> > +1
> >
> > I ran the following command line on Debian GNU/Linux sid:
> >
> >   dev/release/verify-release-candidate.sh 5.1.0 1
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In 
> >   "[VOTE][RUST] Release Apache Arrow Rust 5.1.0 RC1" on Thu, 29 Jul 2021
> > 17:16:31 -0400,
> >   Andrew Lamb  wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > version 5.1.0.
> > >
> > > This release candidate is based on commit:
> > > 64c78b89fd170a152f9e509dfa36d44685c0dc90 [1]
> > >
> > > The proposed release tarball and signatures are hosted at [2].
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. There is a script [4] that automates some of
> > > the verification.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow Rust
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > >
> > > [1]: https://github.com/apache/arrow-rs/tree/64c78b89fd170a152f9e509dfa36d44685c0dc90
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-5.1.0-rc1
> > > [3]: https://github.com/apache/arrow-rs/blob/64c78b89fd170a152f9e509dfa36d44685c0dc90/CHANGELOG.md
> > > [4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Neville Dipale

2021-07-30 Thread Neville Dipale
Thank you everyone :)

On Fri, 30 Jul 2021 at 12:38, David Li  wrote:

> Congrats, Neville!
>
> -David
>
> On Fri, Jul 30, 2021, at 02:23, Daniël Heres wrote:
> > Congrats Neville!
> >
> > On Fri, Jul 30, 2021, 08:21 Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com>
> > wrote:
> >
> > > Congratulations, Neville :)
> > >
> > > On Fri, Jul 30, 2021 at 8:18 AM QP Hou  wrote:
> > >
> > > > Well deserved, congratulations Neville!
> > > >
> > > > On Thu, Jul 29, 2021 at 3:20 PM Wes McKinney 
> > > wrote:
> > > > >
> > > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > > Neville Dipale to become a PMC member and we are pleased to
> announce
> > > > > that Neville has accepted.
> > > > >
> > > > > Congratulations and welcome!
> > > >
> > >
> >


Re: Delta Lake support for DataFusion

2021-06-09 Thread Neville Dipale
The correct approach might be to improve DataFusion support in
delta-rs. TableProvider is already implemented here:
https://github.com/delta-io/delta-rs/blob/main/rust/src/delta_datafusion.rs

I've pinged QP to ask for their advice.

Neville

On Wed, 9 Jun 2021 at 19:58, Andrew Lamb  wrote:

> I think the idea of DataFusion + DeltaLake is quite compelling and likely
> useful.
>
> However, I think DataFusion is ideally an "embeddable query engine" rather
> than a database system in itself, so in that mental model Delta Lake
> integration belongs somewhere other than the core DataFusion crate.
>
> My ideal structure would be a new crate (maybe not even part of the Apache
> Arrow Project), perhaps called `datafusion-delta-rs`, that contained the
> TableProvider and whatever else was needed to integrate DataFusion with
> DeltaLake.
>
> This structure could also start a pattern of publishing plugins for
> DataFusion separately from the core.
>
> Andrew
> p.s. Now that Arrow is publishing more incrementally (e.g. 4.1.0, 4.2.0,
> etc.), I think delta-rs [1] and datafusion both only specify `4.x`, so they
> should work together nicely.
>
> [1]: https://github.com/delta-io/delta-rs/blame/main/rust/Cargo.toml
>
> On Wed, Jun 9, 2021 at 2:29 AM Daniël Heres  wrote:
>
> > Hi all,
> >
> > I would like to receive some feedback about adding Delta Lake support to
> > DataFusion (https://github.com/apache/arrow-datafusion/issues/525).
> > As you might know, Delta Lake is a format adding
> > features like ACID transactions, statistics, and storage optimization to
> > Parquet, and it is gaining considerable traction for managing data lakes.
> > It seems like a great feature to have in DataFusion as well.
> >
> > The delta-rs project provides a
> > native, Apache-licensed Rust implementation of Delta Lake, already
> > supporting a large part of the format and operations.
> >
> > The first integration I would like to propose is adding read support via a
> > new TableProvider. There might be some work to do around dependencies, as
> > both DataFusion and delta-rs rely on (certain versions of) Arrow and
> > Parquet.
> >
> > Let me know if you have any further ideas or concerns.
> >
> > Best regards,
> >
> > Daniël Heres
> >
>


Re: [DISCUSS] Moving the format directory to arrow-format repository

2021-05-02 Thread Neville Dipale
Hi,

Thanks for the feedback. In light of what's been said, I'm now also fine
with leaving the format as is.

Changes to the format are visible enough that we shouldn't miss them, as
there would normally be a discussion on the ML.

Regards
Neville

On Wed, 28 Apr 2021 at 17:31, Jorge Cardoso Leitão 
wrote:

> Hi,
>
> imo the time-scale of changes in the format is too large to justify the
> complexity. I also think that we should not force users to clone or
> submodule the repo to even compile the crate.
>
> What if we just do not have the format files there at all, and instead just
> keep the generated code? Updates to the format are manually performed (i.e.
> instead of copying the file, we run the build against latest). This is what
> we already do in practice, anyways: whenever the format changed, tonic
> would auto-generate new rust code that we would commit before any change.
>
> This avoids having copies of files between repos, which imo confuses people
> into which one is the official one (even when they are all equal, people
> will not do diffs in their heads).
>
> Best,
> Jorge
>
>
>
> On Wed, Apr 28, 2021 at 12:13 PM Andrew Lamb  wrote:
>
> > I also think manually copying the format .fbs files to arrow-rs is
> > probably ok for the time being.
> >
> > Once Arrow gets to the point where many implementations that need
> > format.fbs live in many different repos, pulling out the format files
> > into their own repo might be worth reconsidering.
> >
> > Andrew
> >
> > On Tue, Apr 27, 2021 at 5:45 PM Wes McKinney 
> wrote:
> >
> > > I wouldn't be too excited about this. Here are my thoughts:
> > >
> > > 1. Having the format/ directory in apache/arrow be a submodule would be
> > > cumbersome and error-prone for developers. The only submodules we have
> > > right now are optional testing dependencies; not having these initialized
> > > and updated does not result in a broken project, whereas this change
> > > would. We have a copy of parquet.thrift from apache/parquet-format for
> > > similar reasons.
> > >
> > > 2. So based on #1, we would want to maintain a copy of the format files in
> > > apache/arrow, even if there were a separate apache/arrow-format
> > > repository. The format files are slow-moving enough that I don't think
> > > it's burdensome to mirror these into satellite repositories like
> > > arrow/arrow-rs.
> > >
> > > On Tue, Apr 27, 2021 at 10:54 AM Neville Dipale wrote:
> > >
> > > > Hi Arrow devs,
> > > >
> > > > Andy noticed that we carry a copy of the format directory in arrow-rs,
> > > > which is bound to get outdated in the future.
> > > >
> > > > We would like to propose creating an arrow-format repository, similar to
> > > > parquet-format, so that arrow-rs and other future separate repositories
> > > > could add this as a submodule.
> > > >
> > > > What are your thoughts?
> > > >
> > > > Regards
> > > > Neville
> > > >
> > >
> >
>


[DISCUSS] Moving the format directory to arrow-format repository

2021-04-27 Thread Neville Dipale
Hi Arrow devs,

Andy noticed that we carry a copy of the format directory in arrow-rs,
which is bound to get outdated in the future.

We would like to propose creating an arrow-format repository, similar to
parquet-format, so that arrow-rs and other future separate repositories
could add this as a submodule.

What are your thoughts?

Regards
Neville


Re: [VOTE] Move Rust components to new repos and process

2021-04-14 Thread Neville Dipale
Hi Arrow devs,

+1

On Thu, 15 Apr 2021 at 05:28, Jorge Cardoso Leitão wrote:

> +1 binding.
>
> On Thu, Apr 15, 2021 at 4:10 AM Micah Kornfield wrote:
>
> > +1 binding. Does this also cover other changes (issue management)? +1 to
> > those as well.
> >
> > On Wednesday, April 14, 2021, Andy Grove  wrote:
> >
> > > This vote is to determine if the Arrow PMC is in favor of the Rust
> > > community moving the Rust implementation of Apache Arrow as well as the
> > > related projects (such as Parquet, DataFusion, Ballista, etc) out of the
> > > monorepo and into two new repositories, as outlined in the proposal
> > > document [1].
> > >
> > > Please vote whether to accept the proposal and allow the Rust community to
> > > proceed with the work.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 : Accept the proposal
> > >
> > > [ ] 0 : No opinion
> > >
> > > [ ] -1 : Reject proposal because...
> > >
> > > Here is my vote: +1
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1]
> > > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
> > >
> >
>


Re: [VOTE] Accept donation of Rust Ballista project

2021-03-21 Thread Neville Dipale
+1 (binding)

On Sun, 21 Mar 2021 at 19:48, Neal Richardson wrote:

> +1 (binding)
>
> Neal
>
> On Sun, Mar 21, 2021 at 10:38 AM Francis Du  wrote:
>
> > I have been following Ballista for a long time. This is an exciting thing.
> > I am also very happy to contribute to this.
> >
> > So +1
> >
> > Regards,
> > Francis
> >
> > On Sun, 21 Mar 2021 at 23:56, Andy Grove  wrote:
> >
> > > Dear all,
> > >
> > > On behalf of the Ballista community, I would like to propose that we donate
> > > Ballista to the Apache Arrow project.
> > >
> > > Ballista is a distributed scheduler based on Arrow standards (memory
> > > format, IPC, Flight) and supports distributed query execution with the
> > > DataFusion query engine.
> > >
> > > The community has had an opportunity to discuss this [1] and there do not
> > > seem to be objections to this.
> > >
> > > The code donation in the form of a pull request:
> > >
> > > https://github.com/apache/arrow/pull/9723
> > >
> > > This vote is to determine if the Arrow PMC is in favor of accepting this
> > > donation. If the vote passes, the PMC and the authors of the code will work
> > > together to complete the ASF IP Clearance process
> > > (http://incubator.apache.org/ip-clearance/) and import this Rust codebase
> > > implementation into Apache Arrow.
> > >
> > > [ ] +1 : Accept contribution of Ballista
> > > [ ] 0 : No opinion
> > > [ ] -1 : Reject contribution because...
> > >
> > > Here is my vote: +1
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1]
> > > https://lists.apache.org/x/thread.html/r09556898c9c94259c00e35c04ea051040931bbe9ce577cba60c148c8@%3Cdev.arrow.apache.org%3E
> > >
> >
>


Re: [RUST] Fields and schema metadata

2021-02-06 Thread Neville Dipale
We had to use a BTreeMap because HashMap doesn't implement Hash, so it
can't be used in the Field.

The easiest way to see this is to replace it with a HashMap and try to
compile the arrow crate.
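A minimal sketch of the constraint, using hypothetical `FieldSketch` and `SchemaSketch` types (simplified stand-ins, not the arrow crate's actual definitions): `#[derive(Hash)]` compiles with a `BTreeMap` field, but changing that field to a `HashMap` makes the derive fail because `HashMap` does not implement `Hash`. A type that does not derive `Hash`, like the schema stand-in here, can hold a `HashMap` without issue.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified field type (not arrow's real `Field`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct FieldSketch {
    name: String,
    nullable: bool,
    // BTreeMap<String, String> implements Hash, so the derive above compiles.
    // Replacing this with HashMap<String, String> is a compile error, because
    // HashMap does not implement Hash.
    metadata: Option<BTreeMap<String, String>>,
}

/// Hypothetical schema type: it does not derive Hash, so a HashMap is fine here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
struct SchemaSketch {
    fields: Vec<FieldSketch>,
    metadata: HashMap<String, String>,
}

/// Hashes any `Hash` value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

`BTreeMap` hashes its entries in sorted key order, so two fields whose metadata holds the same entries hash identically regardless of insertion order; `HashMap` has no stable iteration order to hash in.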

Neville

On Sat, 06 Feb 2021, 13:50 Fernando Herrera wrote:

> Hi all, Is there a reason why the Field metadata is a BTreeMap and Schema's
> metadata is a HashMap?
>
> I'm just curious why different structures were selected for the same thing.
> Sorry if this is explained somewhere in the code, but I couldn't find
> anything about it.
>
> Fernando,
>


Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-22 Thread Neville Dipale
Hi Krisztian,

The full output is at
https://gist.github.com/nevi-me/88a6279dd90aea30aa4caaa15fb0cc53

I also ran dev/release/verify-release-candidate-wheels.bat 3.0.0 2

I'm getting the error below. It seems to be a Python 3.7 bug, but I haven't
found a solution for it online yet.

(C:\tmp\arrow-verify-release-wheels\_verify-wheel-3.6)
C:\tmp\arrow-verify-release-wheels>pip install pyarrow-3.0.0-cp36-cp36m-win_amd64.whl || EXIT /B 1
Failed to import the site module
Traceback (most recent call last):
  File "C:\Users\nevi\anaconda3\lib\site.py", line 579, in <module>
    main()
  File "C:\Users\nevi\anaconda3\lib\site.py", line 566, in main
    known_paths = addsitepackages(known_paths)
  File "C:\Users\nevi\anaconda3\lib\site.py", line 349, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "C:\Users\nevi\anaconda3\lib\site.py", line 207, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "C:\Users\nevi\anaconda3\lib\site.py", line 159, in addpackage
    f = open(fullname, "r")
  File "C:\Users\nevi\anaconda3\lib\_bootlocale.py", line 12, in getpreferredencoding
    if sys.flags.utf8_mode:
AttributeError: 'sys.flags' object has no attribute 'utf8_mode'

(C:\tmp\arrow-verify-release-wheels\_verify-wheel-3.6)
C:\tmp\arrow-verify-release-wheels>if errorlevel 1 GOTO error

(C:\tmp\arrow-verify-release-wheels\_verify-wheel-3.6)
C:\tmp\arrow-verify-release-wheels>call deactivate
DeprecationWarning: 'deactivate' is deprecated. Use 'conda deactivate'.

(C:\tmp\arrow-verify-release-wheels\_verify-wheel-3.6)
C:\tmp\arrow-verify-release-wheels>conda.bat deactivate

C:\tmp\arrow-verify-release-wheels>cd C:\Users\nevi\Work\oss\arrow

C:\Users\nevi\Work\oss\arrow>EXIT /B 1

On Fri, 22 Jan 2021 at 15:14, Krisztián Szűcs wrote:

> Thanks Neville for testing it!
>
> There should be more context about the failures above the summary.
> Could you please post the errors?
>
>
> On Fri, Jan 22, 2021 at 2:05 PM Neville Dipale wrote:
> >
> > (+0 non-binding)
> >
> > Getting test failures (see end of my mail).
> >
> > This is my first time verifying (Windows 10; Insider Preview if relevant),
> > so I'm likely missing something in my config. I'll read the verification
> > script and try again.
> >
> > I ran the below using PowerShell:
> >
> > $env:ARROW_GANDIVA=0; $env:ARROW_PLASMA=0; $env:TEST_DEFAULT=0;
> > $env:TEST_SOURCE=1; $env:TEST_CPP=1; $env:TEST_PYTHON=1;
> > $env:TEST_JAVA=1; $env:TEST_INTEGRATION_CPP=1;
> > $env:TEST_INTEGRATION_JAVA=1;
> > ./dev/release/verify-release-candidate.bat 3.0.0 2
> >
> > I had to change cmake generator to use the below (in line 53):
> >
> > set GENERATOR=Visual Studio 16 2019
> >
> > VS 2017 wasn't working for me, even after installing its build tools.
> >
> > 
> >
> > I have 3 test failures per below:
> >
> > 94% tests passed, 3 tests failed out of 52
> >
> > Label Time Summary:
> > arrow-tests   =  66.55 sec*proc (30 tests)
> > arrow_compute =   3.38 sec*proc (4 tests)
> > arrow_dataset =   0.89 sec*proc (9 tests)
> > arrow_flight  =   0.92 sec*proc (1 test)
> > arrow_python-tests=   0.45 sec*proc (1 test)
> > filesystem=   6.36 sec*proc (2 tests)
> > parquet-tests =   7.22 sec*proc (7 tests)
> > unittest  =  79.41 sec*proc (52 tests)
> >
> > Total Test time (real) =  80.27 sec
> >
> > The following tests FAILED:
> >  45 - arrow-python-test (Failed)
> >  46 - parquet-internals-test (Failed)
> >  49 - parquet-arrow-test (Failed)
> > Errors while running CTest
> >
> > I don't know if this has any significance, I got the errors from 2 runs.
> >
> > Neville
> >
> > On Fri, 22 Jan 2021 at 14:22, Neville Dipale wrote:
> >
> > > This is my first time verifying, do I also need to set the env vars below?
> > >
> > > ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
> > > TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1
> > > TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
> > >
> > > Otherwise, I'm currently running:
> > > ./dev/release/verify-release-candidate.bat 3.0.0 2
> > >
> > > Though it looks like it wants Visual Studio 2017. I should already have
> > > the build tools because I need them for Rust, so the error's odd. I'll keep
> > > debugging my environment, and revert back if I'm able to successfully
> > > verify on Windows.
> > >
> > > Neville
> > >
> > >

Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-22 Thread Neville Dipale
(+0 non-binding)

Getting test failures (see end of my mail).

This is my first time verifying (Windows 10; Insider Preview if relevant),
so I'm likely missing something in my config. I'll read the verification
script and try again.

I ran the following in PowerShell:

$env:ARROW_GANDIVA=0; $env:ARROW_PLASMA=0; $env:TEST_DEFAULT=0;
$env:TEST_SOURCE=1; $env:TEST_CPP=1; $env:TEST_PYTHON=1;
$env:TEST_JAVA=1; $env:TEST_INTEGRATION_CPP=1;
$env:TEST_INTEGRATION_JAVA=1;
./dev/release/verify-release-candidate.bat 3.0.0 2

I had to change the CMake generator to the following (in line 53):

set GENERATOR=Visual Studio 16 2019

VS 2017 wasn't working for me, even after installing its build tools.



I have 3 test failures:

94% tests passed, 3 tests failed out of 52

Label Time Summary:
arrow-tests   =  66.55 sec*proc (30 tests)
arrow_compute =   3.38 sec*proc (4 tests)
arrow_dataset =   0.89 sec*proc (9 tests)
arrow_flight  =   0.92 sec*proc (1 test)
arrow_python-tests=   0.45 sec*proc (1 test)
filesystem=   6.36 sec*proc (2 tests)
parquet-tests =   7.22 sec*proc (7 tests)
unittest  =  79.41 sec*proc (52 tests)

Total Test time (real) =  80.27 sec

The following tests FAILED:
 45 - arrow-python-test (Failed)
 46 - parquet-internals-test (Failed)
 49 - parquet-arrow-test (Failed)
Errors while running CTest

I don't know if this is significant, but I got these errors in 2 runs.

Neville

On Fri, 22 Jan 2021 at 14:22, Neville Dipale  wrote:

> This is my first time verifying, do I also need to set the env vars below?
>
> ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
> TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1
> TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
>
> Otherwise, I'm currently running:
> ./dev/release/verify-release-candidate.bat 3.0.0 2
>
> Though it looks like it wants Visual Studio 2017. I should already have
> the build tools because I need them for Rust, so the error's odd. I'll keep
> debugging my environment, and revert back if I'm able to successfully
> verify on Windows.
>
> Neville
>
> On Fri, 22 Jan 2021 at 11:45, Krisztián Szűcs wrote:
>
>> Could anyone verify the release on a windows machine?
>>
>> On Thu, Jan 21, 2021 at 4:43 AM Bryan Cutler  wrote:
>> >
>> > +1 (non-binding)
>> >
>> > I verified binaries and source with the following:
>> > ARROW_TMPDIR=/tmp/arrow-test ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
>> > TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1
>> > TEST_INTEGRATION_JAVA=1 dev/release/verify-release-candidate.sh source
>> > 3.0.0 2
>> >
>> > I also have Spark integration tests passing in PR
>> > https://github.com/apache/arrow/pull/9210
>> >
>> > On Wed, Jan 20, 2021 at 12:48 PM Sutou Kouhei wrote:
>> >
>> > > +1 (binding)
>> > >
>> > > I ran the following on Debian GNU/Linux sid:
>> > >
>> > >   * TZ=UTC \
>> > >   ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON" \
>> > >   CUDA_TOOLKIT_ROOT=/usr \
>> > >   dev/release/verify-release-candidate.sh source 3.0.0 2
>> > >   * dev/release/verify-release-candidate.sh binaries 3.0.0 2
>> > >   * LANG=C dev/release/verify-release-candidate.sh wheels 3.0.0 2
>> > >
>> > > with:
>> > >
>> > >   * gcc (Debian 10.2.1-6) 10.2.1 20210110
>> > >   * openjdk version "11.0.10-ea" 2021-01-19
>> > >   * nvidia-cuda-dev 11.1.1-3
>> > >
>> > > Thanks,
>> > > --
>> > > kou
>> > >
>> > > In > geq7psj25mtp...@mail.gmail.com>
>> > >   "[VOTE] Release Apache Arrow 3.0.0 - RC2" on Tue, 19 Jan 2021 04:49:13 +0100,
>> > >   Krisztián Szűcs  wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I would like to propose the following release candidate (RC2) of Apache
>> > > > Arrow version 3.0.0. This is a release consisting of 678
>> > > > resolved JIRA issues[1].
>> > > >
>> > > > This release candidate is based on commit:
>> > > > d613aa68789288d3503dfbd8376a41f2d28b6c9d [2]
>> > > >
>> > > > The source release rc2 is hosted at [3].
>> > > > The binary artifacts are hosted at [4][5][6][7].
>> > > > The changelog is located at [8].
>> > > >
>> > > > Please download, verify checksums and signatures, run t

Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-22 Thread Neville Dipale
This is my first time verifying, do I also need to set the env vars below?

ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1
TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1

Otherwise, I'm currently running:
./dev/release/verify-release-candidate.bat 3.0.0 2

Though it looks like it wants Visual Studio 2017. I should already have
the build tools because I need them for Rust, so the error's odd. I'll keep
debugging my environment, and report back if I'm able to successfully verify
on Windows.

Neville

On Fri, 22 Jan 2021 at 11:45, Krisztián Szűcs wrote:

> Could anyone verify the release on a windows machine?
>
> On Thu, Jan 21, 2021 at 4:43 AM Bryan Cutler  wrote:
> >
> > +1 (non-binding)
> >
> > I verified binaries and source with the following:
> > ARROW_TMPDIR=/tmp/arrow-test ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
> > TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1
> > TEST_INTEGRATION_JAVA=1 dev/release/verify-release-candidate.sh source
> > 3.0.0 2
> >
> > I also have Spark integration tests passing in PR
> > https://github.com/apache/arrow/pull/9210
> >
> > On Wed, Jan 20, 2021 at 12:48 PM Sutou Kouhei wrote:
> >
> > > +1 (binding)
> > >
> > > I ran the following on Debian GNU/Linux sid:
> > >
> > >   * TZ=UTC \
> > >   ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON" \
> > >   CUDA_TOOLKIT_ROOT=/usr \
> > >   dev/release/verify-release-candidate.sh source 3.0.0 2
> > >   * dev/release/verify-release-candidate.sh binaries 3.0.0 2
> > >   * LANG=C dev/release/verify-release-candidate.sh wheels 3.0.0 2
> > >
> > > with:
> > >
> > >   * gcc (Debian 10.2.1-6) 10.2.1 20210110
> > >   * openjdk version "11.0.10-ea" 2021-01-19
> > >   * nvidia-cuda-dev 11.1.1-3
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In  >
> > >   "[VOTE] Release Apache Arrow 3.0.0 - RC2" on Tue, 19 Jan 2021 04:49:13 +0100,
> > >   Krisztián Szűcs  wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose the following release candidate (RC2) of Apache
> > > > Arrow version 3.0.0. This is a release consisting of 678
> > > > resolved JIRA issues[1].
> > > >
> > > > This release candidate is based on commit:
> > > > d613aa68789288d3503dfbd8376a41f2d28b6c9d [2]
> > > >
> > > > The source release rc2 is hosted at [3].
> > > > The binary artifacts are hosted at [4][5][6][7].
> > > > The changelog is located at [8].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > > > and vote on the release. See [9] for how to validate a release candidate.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Arrow 3.0.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow 3.0.0 because...
> > > >
> > > > [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%203.0.0
> > > > [2]: https://github.com/apache/arrow/tree/d613aa68789288d3503dfbd8376a41f2d28b6c9d
> > > > [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-3.0.0-rc2
> > > > [4]: https://bintray.com/apache/arrow/centos-rc/3.0.0-rc2
> > > > [5]: https://bintray.com/apache/arrow/debian-rc/3.0.0-rc2
> > > > [6]: https://bintray.com/apache/arrow/python-rc/3.0.0-rc2
> > > > [7]: https://bintray.com/apache/arrow/ubuntu-rc/3.0.0-rc2
> > > > [8]: https://github.com/apache/arrow/blob/d613aa68789288d3503dfbd8376a41f2d28b6c9d/CHANGELOG.md
> > > > [9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> > >
>


Re: Release 3.0 timeline?

2021-01-16 Thread Neville Dipale
Hi Arrow devs,

There are some bugs in the Parquet implementation that affect reading
data:

- https://issues.apache.org/jira/browse/ARROW-11269, which was opened today
and which I've only just seen.
- an issue with list schema nulls from parquet-format's logical types. In
this case, we misinterpret the nullness of lists read from parquet-mr,
potentially leading to incorrect data being read.
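For readers less familiar with how Parquet encodes list nullness, here is a generic sketch (not the arrow-rs reader, and not the fix for either bug) of how definition levels distinguish a null list, an empty list, a null element and a present element. It assumes the standard three-level LIST layout with a nullable list and nullable elements, where each optional or repeated level adds one to the maximum definition level; files written with legacy two-level list layouts would need different level handling. `ListSlot`, `classify` and `assemble` are hypothetical names.

```rust
/// Meaning of one definition level for this assumed schema (max level 3):
///
///   optional group my_list (LIST) {   // +1
///     repeated group list {           // +1
///       optional int32 element;       // +1
///     }
///   }
#[derive(Debug, PartialEq, Eq)]
enum ListSlot {
    NullList,    // def 0: the list itself is null
    EmptyList,   // def 1: list present, but no entries
    NullElement, // def 2: an entry exists, but the element is null
    Element,     // def 3: a non-null element value is present
}

fn classify(def_level: i16) -> ListSlot {
    match def_level {
        0 => ListSlot::NullList,
        1 => ListSlot::EmptyList,
        2 => ListSlot::NullElement,
        3 => ListSlot::Element,
        other => panic!("definition level {} exceeds the maximum of 3", other),
    }
}

/// Rebuilds rows from repetition levels, definition levels and the
/// non-null values (which Parquet stores densely, without null slots).
fn assemble(rep: &[i16], def: &[i16], values: &[i32]) -> Vec<Option<Vec<Option<i32>>>> {
    let mut rows: Vec<Option<Vec<Option<i32>>>> = Vec::new();
    let mut next_value = values.iter().copied();
    for (&r, &d) in rep.iter().zip(def.iter()) {
        let slot = classify(d);
        if r == 0 {
            // Repetition level 0 starts a new top-level row.
            rows.push(if slot == ListSlot::NullList { None } else { Some(Vec::new()) });
        }
        let row = rows.last_mut().expect("the first level must have repetition level 0");
        match slot {
            ListSlot::NullList | ListSlot::EmptyList => {}
            ListSlot::NullElement => row.as_mut().expect("element inside a null list").push(None),
            ListSlot::Element => {
                let v = next_value.next().expect("fewer values than non-null elements");
                row.as_mut().expect("element inside a null list").push(Some(v));
            }
        }
    }
    rows
}
```

For example, the rows `[1, null]`, `null`, `[]`, `[3]` are encoded as repetition levels `[0, 1, 0, 0, 0]`, definition levels `[3, 2, 0, 1, 3]` and values `[1, 3]`; confusing level 0 with level 1 is enough to turn a null list into an empty one.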

I discovered the second bug while bashing my head trying to fix a bug in
the Parquet writer (sadly, I spent a very long time on it).

Anyway, I'd like to work on PRs for the 2 bugs above tonight and tomorrow.

@Krisztián @Andrew Lamb, would we still be able to merge them in time?
I've also seen the offset issues in equality checks, and am going to
review/help out with them tomorrow.

I haven't been feeling very well this week, so I haven't been spending much
time working on Arrow.

Thanks
Neville


On Sat, 16 Jan 2021 at 16:34, Krisztián Szűcs wrote:

> On Sat, Jan 16, 2021 at 12:51 PM Andrew Lamb  wrote:
> >
> > I just saw the RC0 candidate email -- thanks Krisztián.
> >
> > Does the RC0 mean that any subsequent merges to master can now proceed
> > without affecting the 3.0.0 branch?
> Technically we don't have a 3.0 release branch, but we can always create
> one.
> So yes, the merges can proceed on master.
>
> Thanks, Krisztian
> >
> > On Fri, Jan 15, 2021 at 10:22 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> >
> > > The spark integration test fails against spark 3.0.1 with
> > >
> > > 12:21:51.996 WARN org.apache.spark.scheduler.TaskSetManager: Lost task
> > > 1.0 in stage 0.0 (TID 1, 5fc0f8cfe8d2, executor driver):
> > > java.lang.NoClassDefFoundError: Could not initialize class
> > > org.apache.spark.sql.util.ArrowUtils$
> > > ...
> > > Caused by: java.lang.RuntimeException: No DefaultAllocationManager
> > > found on classpath. Can't allocate Arrow buffers. Please consider
> > > adding arrow-memory-netty or arrow-memory-unsafe as a dependency.
> > >
> > > Since this change was introduced in
> > > https://github.com/apache/arrow/commit/2092e18752a9c0494799493b12eb1830052217a2
> > > which is already a part of arrow's 2.0 release, I guess this is not a
> > > blocker (or at least the changes are required on spark's side?).
> > >
> > > Either way, I'm going to proceed with the release.
> > >
> > >
> > > On Fri, Jan 15, 2021 at 2:53 PM Andrew Lamb wrote:
> > > >
> > > > That is great news  Krisztián -- thank you
> > > >
> > > > On Fri, Jan 15, 2021 at 6:50 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > My plan is to cut RC0 today, just want to make sure that the spark
> > > > > integration test works with spark's latest release.
> > > > >
> > > > > Thanks, Krisztian
> > > > >
> > > > > On Fri, Jan 15, 2021 at 12:35 PM Andrew Lamb wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I apologize if I have missed this detail on previous emails; I wonder if
> > > > > > there is any estimate of when the Arrow 3.0 release might be finalized.
> > > > > >
> > > > > > The Rust implementation has a few PRs we have been holding off merging
> > > > > > until the release goes out and I wanted to know if there was any
> > > > > > estimated timeline.
> > > > > >
> > > > > > The wiki shows no blocking JIRA items (nice work everyone!) any longer:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+3.0.0+Release
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > >
> > >
>


Re: Arrow 3.0 release

2021-01-13 Thread Neville Dipale
Good day,

I was hoping to complete the Parquet list writer PR in time, but even
though I've been burning the midnight oil addressing the remaining
issues in it, I won't make it :(

I've removed the Rust blocker from the milestone, so there should be
nothing on my side blocking us.

Thanks everyone for the hard work.

Neville

On Wed, 13 Jan 2021 at 03:29, Neal Richardson wrote:

> Last call for 3.0.
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+3.0.0+Release is
> closing out, and hopefully we'll be releasable by the end of tomorrow. If
> you're trying to get something merged in time for the release, now is the
> time!
>
> Thanks all for your hard work under challenging circumstances to get us to
> this point.
>
> Neal
>
> On Mon, Jan 11, 2021 at 11:48 AM Antoine Pitrou wrote:
>
> >
> > I think it would be nice to get
> > https://github.com/apache/arrow/pull/9164 in because it changes the
> > current behaviour and is also stricter with its inputs.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 11/01/2021 at 20:46, Neal Richardson wrote:
> > > Hi all,
> > > We seem to be getting closer on resolving our CI and packaging challenges,
> > > despite several additional, unexpected setbacks last week. I think we
> > > should be in releasable condition in the next day or two. If you have any
> > > other outstanding issues you're trying to get in the 3.0 release, now is
> > > the time.
> > >
> > > We need a volunteer from the PMC to be release manager for 3.0.0. Krisztián
> > > has done the last several releases, and I was wondering if there was anyone
> > > else out there who wanted to take a turn and give him a break this time.
> > >
> > > Neal
> > >
> > > On Wed, Jan 6, 2021 at 9:03 PM Sutou Kouhei wrote:
> > >
> > >> Hi Neal,
> > >>
> > >> Thanks!
> > >>
> > >>> ARROW-11155: [C++][Packaging] Move gandiva crossbow jobs off of Travis-CI
> > >>
> > >> I closed this because it has been done by
> > >> https://issues.apache.org/jira/browse/ARROW-11015 .
> > >>
> > >>
> > >> Thanks,
> > >> --
> > >> kou
> > >>
> > >> In  gcx6bskna6udz...@mail.gmail.com>
> > >>   "Re: Arrow 3.0 release" on Wed, 6 Jan 2021 11:39:02 -0800,
> > >>   Neal Richardson  wrote:
> > >>
> > >>> I made some JIRAs for these issues:
> > >>>
> > >>> ARROW-11152: [CI][C++] Fix Homebrew numpy installation on macOS builds
> > >>> ARROW-11153: [C++][Packaging] Move debian/ubuntu/centos packaging off of
> > >>> Travis-CI (assigned to Kou)
> > >>> ARROW-11154: [CI][C++] Move homebrew crossbow tests off of Travis-CI
> > >>> ARROW-11155: [C++][Packaging] Move gandiva crossbow jobs off of Travis-CI
> > >>>
> > >>> On Wed, Jan 6, 2021 at 5:50 AM Andrew Wieteska <andrew.r.wiete...@gmail.com> wrote:
> > >>>
> >  Hi Kou,
> > 
> >  For sure! I'll work on this.
> > 
> >  Best
> >  Andrew
> > 
> >  On Tue, Jan 5, 2021 at 8:06 PM Sutou Kouhei wrote:
> > 
> > > Hi Andrew,
> > >
> > > We need to fix nightly builds to release a new version.
> > >
> > > Could you check the latest nightly builds[1], open JIRA
> > > issues for failed builds that haven't been addressed yet, and fix them
> > > one by one?
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/ra17f2f9c2323498edd45665d78ce669ae90c0bbd888921f94cf9ca5c%40%3Cdev.arrow.apache.org%3E
> > >
> > > For example, wheel-win-* are being worked on at
> > > https://github.com/apache/arrow/pull/9096 . We don't need to
> > > open new JIRA issues for them.
> > >
> > >
> > > We also need to move away from Travis CI in nightly builds.
> > > See also:
> > > https://lists.apache.org/thread.html/r43ddf0ed5a3d21831edfa20bbe87514dc433e17ec86199c5a4cb80d4%40%3Cdev.arrow.apache.org%3E
> > >
> > > We need to change "ci: travis" in
> > > https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml
> > > for it.
> > >
> > > wheel-osx-* are being worked on at
> > > https://github.com/apache/arrow/pull/8915 .
> > >
> > > I'll work on debian-*, ubuntu-* and centos-*.
> > >
> > > homebrew-* haven't been addressed yet. Could you also work on them?
> > >
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > >
> > > In  > >> fn3mbvhdd4yfqoi1havjrnkho0nwvv6qdf...@mail.gmail.com>
> > >   "Re: Arrow 3.0 release" on Mon, 4 Jan 2021 19:45:32 -0500,
> > >   Andrew Wieteska  wrote:
> > >
> > >> Dear all,
> > >>
> > >> I will have time and would be happy to help with the release if the
> > >> PMC release manager could use a second person. I don't have experience
> > >> with the release process so I might need some detailed pointers on where
> > >> and how I can help - but if I can help to relieve the load on the release
> > >> manager, I'm available!
> > >>
> > >> All the best
> > 

Re: [ANNOUNCE] New Arrow committer: Andrew Lamb

2020-11-10 Thread Neville Dipale
Congrats, Andrew! I look forward to working more with you.

On Tue, 10 Nov 2020, 18:07 Jorge Cardoso Leitão wrote:

> Congrats, Andrew!
>
> Andrew has been doing an amazing job, both on the implementation but also
> at reviewing and helping others. He taught me a lot, I am having a great
> time working with him and I am thus really happy about this.
>
> Best,
> Jorge
>
>
> On Tue, Nov 10, 2020 at 4:42 PM Andy Grove  wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Andrew Lamb has
> > accepted an invitation to become a committer on Apache Arrow.
> >
> > Welcome, and thank you for your contributions!
> >
>


[Rust] Merging the Parquet Arrow Branch

2020-10-26 Thread Neville Dipale
Hi Arrow devs,
We've been working on a Parquet writer on a separate branch, mainly to
expedite merging PRs in case there weren't enough reviewers. I wasn't
comfortable merging the work by 2.0, so I opted not to get it merged for the
release.

There are people who have expressed an interest in using the WIP writer, and
at the same time, we're starting to diverge to the point where rebases are
becoming a lot of work for me.

What process should I follow to get the changes merged in?
The branch has 9 commits, plus 1 more which should be merged in the next 2
days. I'd like to preserve the commit history if possible, instead of
squashing them.

Thanks
Neville


Re: [VOTE] Accept donation of Julia implementation for Apache Arrow

2020-10-12 Thread Neville Dipale
Good day 

+1

On Mon, 12 Oct 2020, 22:35 Neal Richardson wrote:

> Hi all,
> Last month [1] Jacob Quinn proposed donating Arrow.jl, a Julia
> implementation of Arrow, to the Apache Arrow project. The community has had
> an opportunity to discuss this and there do not seem to be objections.
> There is a pull request now available:
>
> https://github.com/apache/arrow/pull/8448
>
> This vote is to determine if the Arrow PMC is in favor of accepting this
> donation. If the vote passes, the PMC and the authors of the code will work
> together to complete the ASF IP Clearance process
> (https://incubator.apache.org/ip-clearance/) and import this Julia
> implementation into Apache Arrow.
>
> [ ] +1 : Accept contribution of Arrow.jl
> [ ]  0 : No opinion
> [ ] -1 : Reject contribution because...
>
> Here is my vote: +1
>
> The vote will be open for at least 72 hours.
>
> Thanks,
> Neal
>
> [1]: https://lists.apache.org/thread.html/r5f6d0525b3e83de0f7faa2f91a844f1b40c78da4da25a8c0242f5624%40%3Cdev.arrow.apache.org%3E
>


Re: [VOTE][Format] Allow for 256-bit Decimal's in the Arrow specification

2020-09-29 Thread Neville Dipale
+1 (non-binding)

Rust support is behind, but we'll catch up at some point.

On Wed, 30 Sep 2020 at 03:10, Holden Karau  wrote:

> +1 (non-binding)
>
> On Tue, Sep 29, 2020 at 6:08 PM Sutou Kouhei  wrote:
>
> > +1
> >
> > In 
> >   "Re: [VOTE][Format] Allow for 256-bit Decimal's in the Arrow
> > specification" on Tue, 29 Sep 2020 13:38:04 -0700,
> >   Jacques Nadeau  wrote:
> >
> > > +1
> > >
> > > On Tue, Sep 29, 2020 at 11:19 AM Wes McKinney wrote:
> > >
> > >> +1
> > >>
> > >> On Tue, Sep 29, 2020 at 4:07 AM Fan Liya wrote:
> > >> >
> > >> > +1
> > >> >
> > >> > Best,
> > >> > Liya Fan
> > >> >
> > >> > On Tue, Sep 29, 2020 at 4:55 PM Antoine Pitrou wrote:
> > >> >
> > >> > >
> > >> > > +1 (binding)
> > >> > >
> > >> > > I didn't look at the implementation.
> > >> > >
> > >> > > Regards
> > >> > >
> > >> > > Antoine.
> > >> > >
> > >> > >
> > >> > > On 29/09/2020 at 06:54, Micah Kornfield wrote:
> > >> > > > I've opened a PR that updates the specification to allow for 256-bit
> > >> > > > Decimal types [1].  It updates both schema.fbs and the C-ABI to document
> > >> > > > this support.
> > >> > > >
> > >> > > > The decimal256 branch [2] contains implementations in Java and C++ and
> > >> > > > updates to the integration test to demonstrate interoperability. If this
> > >> > > > vote passes I will open up a PR to merge the decimal256 branch to master
> > >> > > > and we can handle any specific concerns for the implementations on that
> > >> > > > PR.  This vote is specifically about approving and merging [1].
> > >> > > >
> > >> > > > The vote will remain open for at least 72 hours.
> > >> > > >
> > >> > > > [ ] +1: Accept 256 bit Decimals as part of the specification
> > >> > > > [ ] +0:
> > >> > > > [ ] -1: I don't think this is a good idea because ...
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Micah
> > >> > > >
> > >> > > > [1] https://github.com/apache/arrow/pull/8293
> > >> > > > [2] https://github.com/apache/arrow/tree/decimal256
> > >> > > >
> > >> > >
> > >>
> >
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: 2.0.0 release timeline: October 9

2020-09-29 Thread Neville Dipale
Hi Neal,

I've pruned the Rust backlog a bit, but only changed PRs that I've either
opened, or that I presume nobody is currently working on.

There are 2 major features I've been working on:

1. Writing Arrow data to Parquet (separate branch)
2. Integration testing

I'll prioritise 2, as that's on the main branch and we're nearly there.
However, for the Arrow Parquet writer, I'd like to ask the Rust developers
whether we want to release the WIP support with 2.0.0, or hold back and keep
chipping away at it on a separate branch.

There's been a new contributor who's offered to help with the writer,
so I think I'll be able to make more progress with her.

Neville

On Wed, 30 Sep 2020 at 00:34, Neal Richardson wrote:

> Hi folks,
> As has been discussed in the biweekly meetings (and in the notes from those
> meetings here on the mailing list), we're looking at an October timeline
> for our next release since we are going about 3 months between releases. So
> that we might get the release voted on and shipped by the middle of the
> month, we should aim to be ready to cut our first (and final!) release
> candidate by next Friday, October 9.
>
> According to
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+2.0.0+Release,
> there are still 178 issues tagged for 2.0 that are not yet started. That
> seems... ambitious. Please do go through the backlog and push to the next
> release (i.e. 3.0.0) unassigned issues that aren't likely to land in the
> next 10 days.
>
> Likewise, I see that there are a few issues tagged as "blocker". Let's
> determine whether those truly should prevent a release candidate from being
> made, and if so, let's make sure they get done ASAP.
>
> Neal
>


Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Neville Dipale
Thanks for the feedback.

My interest is mainly in the narrow use case of reading and writing batch data,
so I wouldn't want to deal with producing and consuming rows per se.
Andy has worked on RDBC (https://github.com/tokio-rs/rdbc) for the
row-based or OLTP case, and I'm considering something more suitable for the OLAP case.

@Wes I'll have a read through the Python DB API. I've also been looking at JDBC,
as well as how Apache Spark manages to get such good performance from JDBC.

I haven't been an ODBC fan, mainly because of historic struggles with getting it
to work on Linux environments where I don't have system control. With that said,
we could still support ODBC.

@Jorge, I have an implementation in rust-dataframe
(https://github.com/nevi-me/rust-dataframe/tree/master/src/io/sql/postgres)
which uses rust-postgres. However, I don't use the row-based API, as that comes at
a serialization cost (going from bytes > Rust types > Arrow).
I instead use the Postgres binary format
(https://github.com/nevi-me/rust-dataframe/blob/master/src/io/sql/postgres/reader.rs#L204).
That postgres module would be the starting point of such a separate crate.

For Postgres <> Arrow type conversions, I leverage 2 methods:

1. When reading a table, I get the schema from the *information_schema* system table
2. When reading a query, I issue the query with a 1-row limit, and convert
the row's schema to an Arrow schema
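A minimal sketch of the first method, assuming the reader has already run `SELECT column_name, data_type FROM information_schema.columns WHERE table_name = ...` and received `(name, type)` string pairs. `ArrowType`, `pg_to_arrow` and `schema_from_information_schema` are hypothetical names (not the arrow crate's API), and only a handful of common Postgres type names are mapped; anything else is reported as unsupported.

```rust
/// Hypothetical, trimmed-down stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Boolean,
    Int16,
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    Binary,
    Date32,
    TimestampMicrosecond,
}

/// Map a value of `information_schema.columns.data_type` to an Arrow type.
/// Returns None for types this sketch does not handle.
pub fn pg_to_arrow(data_type: &str) -> Option<ArrowType> {
    match data_type {
        "boolean" => Some(ArrowType::Boolean),
        "smallint" => Some(ArrowType::Int16),
        "integer" => Some(ArrowType::Int32),
        "bigint" => Some(ArrowType::Int64),
        "real" => Some(ArrowType::Float32),
        "double precision" => Some(ArrowType::Float64),
        "text" | "character varying" | "character" => Some(ArrowType::Utf8),
        "bytea" => Some(ArrowType::Binary),
        "date" => Some(ArrowType::Date32),
        // Postgres timestamps have microsecond resolution.
        "timestamp without time zone" => Some(ArrowType::TimestampMicrosecond),
        _ => None,
    }
}

/// Turn (column_name, data_type) rows into an ordered list of Arrow fields,
/// failing on the first column whose type is not mapped.
pub fn schema_from_information_schema(
    rows: &[(&str, &str)],
) -> Result<Vec<(String, ArrowType)>, String> {
    rows.iter()
        .map(|(name, ty)| {
            pg_to_arrow(ty)
                .map(|t| (name.to_string(), t))
                .ok_or_else(|| format!("unsupported Postgres type `{}` for column `{}`", ty, name))
        })
        .collect()
}
```

Failing loudly on unknown types (rather than silently falling back to strings) is a design choice here; a real connector might prefer a configurable fallback.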

@Adam I think async and pooling would be attainable, yes. If an underlying SQL crate
uses R2D2 for pooling, an API that supports that could be provided.

In summary, I'm thinking along the lines of:

* A reader that takes connection parameters & a query or table
* The reader can handle partitioning if need be (similar to how Spark does it)
* The reader returns a Schema, and can be iterated on to return data in batches

* A writer that takes connection parameters and a table
* The writer writes batches to a table, and is able to write batches in parallel
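The reader/writer shape above could be sketched roughly as follows. Everything here is hypothetical: `Schema` and `RecordBatch` are simplified stand-ins for the arrow crate's types, `SqlReader`/`SqlWriter` are illustrative trait names, and `partition_ranges` only approximates the idea of Spark-style partitioning by splitting a numeric key range into contiguous chunks that could be read in parallel.

```rust
use std::ops::Range;

/// Simplified stand-ins for arrow's Schema and RecordBatch (assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub num_rows: usize,
}

/// A reader exposes its schema up front and is then iterated for batches.
pub trait SqlReader: Iterator<Item = Result<RecordBatch, String>> {
    fn schema(&self) -> &Schema;
}

/// A writer accepts batches destined for a single table.
pub trait SqlWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String>;
}

/// Split the half-open key range [lower, upper) into at most `n` contiguous
/// ranges whose sizes differ by at most one.
pub fn partition_ranges(lower: i64, upper: i64, n: usize) -> Vec<Range<i64>> {
    if n == 0 || lower >= upper {
        return Vec::new();
    }
    let span = upper - lower;
    let n = (n as i64).min(span);
    let (stride, remainder) = (span / n, span % n);
    let mut out = Vec::with_capacity(n as usize);
    let mut start = lower;
    for i in 0..n {
        // Spread the remainder over the first partitions.
        let len = stride + if i < remainder { 1 } else { 0 };
        out.push(start..start + len);
        start += len;
    }
    out
}

/// In-memory reader used only to illustrate the trait.
pub struct MockReader {
    schema: Schema,
    remaining_rows: usize,
    batch_size: usize,
}

impl MockReader {
    pub fn new(schema: Schema, total_rows: usize, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        MockReader { schema, remaining_rows: total_rows, batch_size }
    }
}

impl Iterator for MockReader {
    type Item = Result<RecordBatch, String>;
    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining_rows == 0 {
            return None;
        }
        let n = self.remaining_rows.min(self.batch_size);
        self.remaining_rows -= n;
        Some(Ok(RecordBatch { num_rows: n }))
    }
}

impl SqlReader for MockReader {
    fn schema(&self) -> &Schema {
        &self.schema
    }
}

/// Writer that only counts rows, standing in for an INSERT/COPY implementation.
pub struct CountingWriter {
    pub rows_written: usize,
}

impl SqlWriter for CountingWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.rows_written += batch.num_rows;
        Ok(())
    }
}
```

Making the reader an `Iterator` keeps it composable with ordinary Rust adapters; an async variant would presumably implement a `Stream` instead.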

In the case of a hypothetical interface with columnar databases like ClickHouse,
we would be able to leverage materialising Arrow arrays directly from columns, instead of
the column-wise conversions that would have to be performed from row-based APIs.

Neville


On Sun, 27 Sep 2020 at 22:08, Adam Lippai  wrote:

> One more universal approach is to use ODBC, this is a recent Rust
> conversation (with example) on the topic:
> https://github.com/Koka/odbc-rs/issues/140
>
> Honestly I find the Python DB API too simple, all it provides is a
> row-by-row API. I miss four things:
>
>- Batched or bulk processing both for data loading and dumping.
>- Async support (python has asyncio and async web frameworks, but no
>async DB spec). SQLAlchemy async support is coming soon and there is
>https://github.com/encode/databases
>- Connection pooling (it's common to use TLS, connection reuse would be
>nice as TLS 1.3 is not here yet)
>- Failover / load balancing support (this is connected to the previous)
>
> Best regards,
> Adam Lippai
>
> On Sun, Sep 27, 2020 at 9:57 PM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > That would be awesome! I agree with this, and it would be really useful, as it
> > would leverage all the goodies that RDBMSs have wrt transactions, etc.
> >
> > I would probably go for having the database specifics outside of the Arrow
> > project, so that they can be used by other folks beyond Arrow, and keep the
> > Arrow specifics (i.e. conversion from the format of the specific
> > databases to Arrow) as part of the arrow crate. Ideally, as Wes wrote, with
> > some standard that makes it easier to handle different DBs.
> >
> > I think that there are two layers: one is how to connect to a database, the
> > other is how to serialize/deserialize. AFAIK PEP 249 covers both layers, as
> > it standardizes things like `connect` and `tpc_begin`, as well as how
> > things should be serialized to Python objects (e.g. dates should be
> > datetime.date). This split is done by postgres for Rust
> > <https://github.com/sfackler/rust-postgres>, as it offers 5 crates:
> > * postgres-async
> > * postgres-sync (a blocking wrapper of postgres-async)
> > * postgres-types (to convert to native Rust types; IMO this is the one we
> > want to offer in Arrow)
> > * postgres-TLS
> > * postgres-openssl
> >
> > `postgres-sync` implements Iterator (`client.query`), and postgres-async
> > implements Stream.
> >
> > One idea is to have a generic iterator/stream adapter that yields
> > RecordBatches. Implementations of this trait by different providers
> > would allow them to be used in Arrow and DataFusion.
> >
> > Besides postgres, one idea is to pick the top from this list
> &

[Rust] Arrow SQL Adapters/Connectors

2020-09-26 Thread Neville Dipale
Hi Arrow developers

I would like to gauge the appetite for an Arrow SQL connector that:

* Reads and writes Arrow data to and from SQL databases
* Reads tables and queries into record batches, and writes batches to
tables (either append or overwrite)
* Leverages binary SQL formats where available (e.g. PostgreSQL format is
relatively easy and well-documented)
* Provides a batch interface that abstracts away the different database
semantics, and exposes a RecordBatchReader (
https://docs.rs/arrow/1.0.1/arrow/record_batch/trait.RecordBatchReader.html),
and perhaps a RecordBatchWriter
* Resides in the Rust repo as either an arrow::sql module (like arrow::csv,
arrow::json, arrow::ipc) or alternatively is a separate crate in the
workspace  (*arrow-sql*?)

I would be able to contribute a Postgres reader/writer as a start.
I could make this a separate crate, but to drive adoption I would prefer
it to live in Arrow, where it can also stay up to date (sometimes we reorganise
modules and end up breaking dependencies).

Also, being developed next to DataFusion could allow DF to support SQL
databases, as this would be yet another datasource.

Some questions:
* Should such library support async, sync or both IO methods?
* Other than postgres, what other databases would be interesting? Here I'm
hoping that once we've established a suitable API, it could be easier to
natively support more database types.

Potential concerns:

* Sparse database support
It's a lot of effort to write database connectors, especially if starting
from scratch (unlike with say JDBC). What if we end up supporting 1 or 2
database servers?
Perhaps in that case we could keep the module without publishing it to
crates.io until we're happy with database support, or even its usage.

* Dependency bloat
We could feature-gate database types to reduce the number of dependencies
if one only wants certain DB connectors.

* Why not use Java's JDBC adapter?
I already do this, but sometimes if working on a Rust project, creating a
separate JVM service solely to extract Arrow data is a lot of effort.
I also don't think it's currently possible to use the adapter to save Arrow
data in a database.

* What about Flight SQL extensions?
There have been discussions around creating Flight SQL extensions, and the
Rust SQL adapter could implement that and co-exist well.
From a crate dependency perspective, *arrow-flight* depends on *arrow*, so it could
also depend on this *arrow-sql* crate.

Please let me know what you think

Regards
Neville


Re: Rust conversion

2020-09-16 Thread Neville Dipale
Hi Roland,

For primitive types, there are value_slice methods which let you get the
array's contents as a slice, but you have to handle the null values yourself,
as the slice doesn't return an Option.
We haven't yet seen demand for converting arrays to JSON, but the
integration crate (
https://github.com/apache/arrow/blob/master/rust/integration-testing/src/bin/arrow-json-integration-test.rs#L420)
already contains some logic to convert to JSON.
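Handling the nulls yourself amounts to pairing the values with the validity bitmap. A minimal std-only sketch, assuming you already have the raw values slice and the bitmap bytes (the names `to_options` and `to_json_array` are hypothetical, not arrow crate functions): per the Arrow format, slot `i` is valid when bit `i % 8` of byte `i / 8` is set, least-significant bit first.

```rust
/// Combine a primitive values slice with an optional Arrow validity bitmap.
/// `None` for the bitmap means the array has no nulls.
pub fn to_options<T: Copy>(values: &[T], validity: Option<&[u8]>) -> Vec<Option<T>> {
    values
        .iter()
        .enumerate()
        .map(|(i, v)| {
            let valid = match validity {
                // LSB-first bit order, as specified by the Arrow columnar format.
                Some(bits) => (bits[i / 8] & (1u8 << (i % 8))) != 0,
                None => true,
            };
            if valid {
                Some(*v)
            } else {
                None
            }
        })
        .collect()
}

/// Render nullable values as a JSON array, writing `null` for missing slots.
pub fn to_json_array<T: std::fmt::Display>(values: &[Option<T>]) -> String {
    let items: Vec<String> = values
        .iter()
        .map(|v| match v {
            Some(x) => x.to_string(),
            None => "null".to_string(),
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

For example, values `[1, 2, 3]` with bitmap byte `0b0000_0101` give `[Some(1), None, Some(3)]`, which renders as `[1,null,3]`.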

Neville

On Wed, 16 Sep 2020 at 16:57, Roland Peelen 
wrote:

> Hi Guys,
>
> Quick question. Is there such a thing in the rust library as there is in
> the pyarrow lib to convert from Arrow types to rust internal types? For
> instance, to convert to JSON down the line?
>
> Or will I have to just go through the colums / rows and match on the
> schema type to cast the buffers to a certain type?
>
> Thanks!
> Roland
>


[Rust] Creating a separate branch for Arrow Parquet writer until next release

2020-08-08 Thread Neville Dipale
Good day,

This relates to https://github.com/apache/arrow/pull/7319 (ARROW-8289)

In the past few months we haven't had enough review bandwidth on Rust's
Parquet implementation (mostly relying on Chao for non-trivial reviews).
Given the amount of work needed for an Arrow writer, and the interest so
far (I think a few people are already using the draft PR), I'd like to propose:

* We create a temporary branch in the apache/arrow repo, where the arrow
writer can temporarily live
* We can merge changes into the branch, especially if there aren't enough
reviewers at the time
* When we're close to a release, we merge what's on the temp branch into
the branch that's currently called `master` but will be renamed soon 

In terms of the Arrow Parquet writer PR, I think I've got arbitrary nesting
covered, but there's a lot more work that we can now divide more easily so
others can contribute better.
I'm also unsure of how to test deeply nested arrays directly in the code (I
had to use Spark because Arrow reader doesn't yet support that).

Given that we have a linear git timeline where each commit roughly corresponds
to one JIRA ticket, I don't know if this would mess up the timeline, or whether
we'd still be able to merge into the temporary branch and then rebase onto
the main branch later.

Any thoughts and suggestions?

Neville


Re: Status of Rust Integration Testing

2020-07-11 Thread Neville Dipale
Hi Micah,

Yes, those files are read correctly. We test against them.
I was trying to generate gold files based on 0.17.1, so I could debug
against those; I'll work on that in the coming days.

On Sat, 11 Jul 2020, 05:58 Micah Kornfield,  wrote:

> Hi Neville,
> Thanks for the update.  One question, we have "gold" files for 0.14.0
> checked into the test-data repo and run integration tests on those to
> ensure we can read them in a few implementations.  Does Rust at least read
> those correctly?
>
> Thanks,
> Micah
>
> On Fri, Jul 10, 2020 at 1:03 PM Neville Dipale 
> wrote:
>
> > Good day Arrow devs,
> >
> > I've spent a few evenings looking into the issues that we're experiencing
> > with Rust integration testing.
> > In summary, none of our tests pass (zero batch doesn't count :) ).
> > This is mainly because of changes from the legacy padding in the 0.15.0
> > release, which we never made in Rust (
> > https://issues.apache.org/jira/browse/ARROW-6313).
> > I honestly didn't see this at the time.
> >
> > Anyways, the implication is that Rust is using the 'legacy' format of
> > 4-byte alignment, and fails to parse and read the ipc::Message into
> > RecordBatches.
> >
> > I'm so far struggling with the fixes in Rust, so this work might take
> long
> > and likely won't make it to 1.0.0. I'll open JIRAs for whatever work is
> > necessary.
> >
> > The Rust implementation appears to be behind with quite some
> implementation
> > details, so I'll try catching up by the next release.
> >
> > Regards
> > Neville
> >
>


Status of Rust Integration Testing

2020-07-10 Thread Neville Dipale
Good day Arrow devs,

I've spent a few evenings looking into the issues that we're experiencing
with Rust integration testing.
In summary, none of our tests pass (zero batch doesn't count :) ).
This is mainly because of changes from the legacy padding in the 0.15.0
release, which we never made in Rust (
https://issues.apache.org/jira/browse/ARROW-6313).
I honestly didn't see this at the time.

Anyway, the implication is that Rust is using the 'legacy' format of
4-byte alignment, and fails to parse and read the ipc::Message into
RecordBatches.
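For context, the 0.15.0 change (ARROW-6313) altered how each encapsulated IPC message is prefixed: the current format writes a 0xFFFFFFFF continuation marker followed by a little-endian int32 metadata length, while the legacy format writes only the int32 length. A reader can accept both by peeking at the first 4 bytes. This is a std-only sketch; `Prefix`, `read_prefix` and `padded_to_8` are hypothetical names, and `padded_to_8` only illustrates rounding up to the 8-byte boundaries the current format uses.

```rust
/// What a message prefix decodes to.
#[derive(Debug, PartialEq)]
pub enum Prefix {
    /// Flatbuffer metadata length, and how many prefix bytes were consumed.
    Message { metadata_len: u32, prefix_len: usize },
    /// A zero metadata length marks the end of the stream.
    EndOfStream,
}

const CONTINUATION: u32 = 0xFFFF_FFFF;

/// Decode either the 0.15.0+ prefix (continuation + length) or the legacy one
/// (length only).
pub fn read_prefix(buf: &[u8]) -> Result<Prefix, String> {
    let word = |at: usize| -> Result<u32, String> {
        buf.get(at..at + 4)
            .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .ok_or_else(|| "truncated message prefix".to_string())
    };
    let first = word(0)?;
    let (len, prefix_len) = if first == CONTINUATION {
        (word(4)?, 8) // current format: 8-byte prefix
    } else {
        (first, 4) // legacy format: 4-byte prefix
    };
    if len == 0 {
        Ok(Prefix::EndOfStream)
    } else {
        Ok(Prefix::Message { metadata_len: len, prefix_len })
    }
}

/// Round a byte count up to the next multiple of 8.
pub fn padded_to_8(n: usize) -> usize {
    (n + 7) & !7
}
```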

I'm so far struggling with the fixes in Rust, so this work might take a while
and likely won't make it into 1.0.0. I'll open JIRAs for whatever work is
necessary.

The Rust implementation appears to be behind on quite a few implementation
details, so I'll try to catch up by the next release.

Regards
Neville


Re: [Integration] Errors running archery integration on Windows

2020-07-06 Thread Neville Dipale
Thanks Rok and Antoine,

I couldn't see what the issue could have been, so the SO link was
very helpful and informative.

I'll try it out, and submit a PR if I get it right.

On Mon, 6 Jul 2020 at 14:30, Antoine Pitrou  wrote:

>
> Yes, that's certainly the case.
> Changing:
> values = np.random.randint(lower, upper, size=size)
> to:
> values = np.random.randint(lower, upper, size=size, dtype=np.int64)
>
> would hopefully fix the issue.  Neville, could you try it out?
>
> Thank you
>
> Antoine.
>
> Le 06/07/2020 à 14:16, Rok Mihevc a écrit :
> > Numpy on windows has different default bitwidth than on linux. Perhaps
> this
> > is causing the issue? (see:
> >
> https://stackoverflow.com/questions/36278590/numpy-array-dtype-is-coming-as-int32-by-default-in-a-windows-10-64-bit-machine
> > )
> >
> > Rok
> >
> > On Mon, Jul 6, 2020 at 12:57 PM Neville Dipale 
> > wrote:
> >
> >> Hi Arrow devs,
> >>
> >> I'm trying to run archery integration tests in Windows 10 (Python 3.7.7;
> >> conda 4.8.3), but I'm getting an error *ValueError: low is out of bounds
> >> for int32* (
> >> https://gist.github.com/nevi-me/4946eabb2dc111e10b98c074b45b73b1
> >> ).
> >>
> >> Has someone else encountered this problem before?
> >>
> >> Regards
> >> Neville
> >>
> >
>


[Integration] Errors running archery integration on Windows

2020-07-06 Thread Neville Dipale
Hi Arrow devs,

I'm trying to run archery integration tests in Windows 10 (Python 3.7.7;
conda 4.8.3), but I'm getting an error *ValueError: low is out of bounds
for int32* (https://gist.github.com/nevi-me/4946eabb2dc111e10b98c074b45b73b1
).

Has someone else encountered this problem before?

Regards
Neville


Re: Bot to set "In Progress" status in JIRA

2020-06-30 Thread Neville Dipale
Thanks Wes,

I noticed this today, but was a bit confused as to why you reassigned a
JIRA to yourself, then back to me.
This clarifies what happened :)

Neville

On Tue, 30 Jun 2020 at 15:39, Wes McKinney  wrote:

> hi,
>
> Yesterday I set up a bot to set issues to In Progress if they have an
> assignee once a pull request has been opened.
>
> A consequence of issuing the "Start Progress" transition in JIRA is
> that it assigns the issue to the JIRA user, so the bot (which uses my
> JIRA credentials) will then immediately reassign the issue to the
> original assignee. I will look into setting up an "Arrow JIRA Bot"
> user to make it a little more clear what's going on
>
> - Wes
>


Re: [VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0 release

2020-06-30 Thread Neville Dipale
+1 (non-binding)

On Tue, 30 Jun 2020 at 06:29, Ben Kietzman  wrote:

> +1 (non binding)
>
> On Tue, Jun 30, 2020, 00:25 Wes McKinney  wrote:
>
> > +1 (binding)
> >
> > On Mon, Jun 29, 2020 at 10:49 PM Micah Kornfield 
> > wrote:
> > >
> > > +1 (binding)
> > >
> > > On Mon, Jun 29, 2020 at 2:43 PM Wes McKinney 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > As discussed on the mailing list [1], in order to demarcate the
> > > > pre-1.0.0 and post-1.0.0 worlds, and to allow the
> > > > forward-compatibility-protection changes we are making to actually
> > > > work (i.e. so that libraries can recognize that they have received
> > > > data with a feature that they do not support), I have proposed to
> > > > increment the MetadataVersion from V4 to V5. Additionally, if the
> > > > union validity bitmap changes are accepted, the MetadataVersion could
> > > > be used to control whether unions are permitted to be serialized or
> > > > not (with V4 -- used by v0.8.0 to v0.17.1, unions would not be
> > > > permitted).
> > > >
> > > > Since there have been no backward incompatible changes to the Arrow
> > > > format since 0.8.0, this would be no different, and (aside from the
> > > > union issue) libraries supporting V5 are expected to accept BOTH V4
> > > > and V5 so that backward compatibility is not broken, and any
> > > > serialized data from prior versions of the Arrow libraries (0.8.0
> > > > onward) will continue to be readable.
> > > >
> > > > Implementations are recommended, but not required, to provide an
> > > > optional "V4 compatibility mode" for forward compatibility
> > > > (serializing data from >= 1.0.0 that needs to be readable by older
> > > > libraries, e.g. Spark deployments stuck on an older Java-Arrow
> > > > version). In this compatibility mode, non-forward-compatible features
> > > > added in 1.0.0 and beyond would not be permitted.
> > > >
> > > > A PR with the changes to Schema.fbs (possibly subject to some
> > > > clarifying changes to the comments) is at [2].
> > > >
> > > > Once the PR is merged, it will be necessary for implementations to be
> > > > updated and tested as appropriate at minimum to validate that
> backward
> > > > compatibility is preserved (i.e. V4 IPC payloads are still readable
> --
> > > > we have some in apache/arrow-testing and can add more as needed).
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Accept addition of MetadataVersion::V5 along with its general
> > > > implications above
> > > > [ ] +0
> > > > [ ] -1 Do not accept because...
> > > >
> > > > [1]:
> > > >
> >
> https://lists.apache.org/thread.html/r856822cc366d944b3ecdf32c2ea9b1ad8fc9d12507baa2f2840a64b6%40%3Cdev.arrow.apache.org%3E
> > > > [2]: https://github.com/apache/arrow/pull/7566
> > > >
> >
>
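The compatibility rules in the proposal can be expressed as two small checks: readers accept both V4 and V5, and a writer in "V4 compatibility mode" emits V4 and refuses features added in 1.0.0 and later. This is an illustrative sketch only; `TypeKind`, `WriteOptions` and the choice of unions as the rejected feature (which the proposal ties to the union validity bitmap change being accepted) are assumptions, not any implementation's API.

```rust
/// MetadataVersion values as named in Schema.fbs (V5 being the proposed addition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetadataVersion {
    V1,
    V2,
    V3,
    V4,
    V5,
}

/// Hypothetical, minimal type tag for the write-side check.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeKind {
    Int32,
    Utf8,
    Union,
}

/// Readers supporting V5 keep accepting V4 so that data from 0.8.0 onward stays readable.
pub fn check_readable(version: MetadataVersion) -> Result<(), String> {
    match version {
        MetadataVersion::V4 | MetadataVersion::V5 => Ok(()),
        other => Err(format!("unsupported metadata version {:?}", other)),
    }
}

/// Hypothetical writer options with an optional V4 compatibility mode.
pub struct WriteOptions {
    pub v4_compat: bool,
}

impl WriteOptions {
    pub fn metadata_version(&self) -> MetadataVersion {
        if self.v4_compat {
            MetadataVersion::V4
        } else {
            MetadataVersion::V5
        }
    }

    /// In compatibility mode, reject types that older readers cannot handle.
    pub fn check_type(&self, kind: &TypeKind) -> Result<(), String> {
        if self.v4_compat && *kind == TypeKind::Union {
            Err("unions cannot be written in V4 compatibility mode".to_string())
        } else {
            Ok(())
        }
    }
}
```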


[jira] [Created] (ARROW-9095) [Rust] Fix NullArray to comply with spec

2020-06-10 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9095:
-

 Summary: [Rust] Fix NullArray to comply with spec
 Key: ARROW-9095
 URL: https://issues.apache.org/jira/browse/ARROW-9095
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


When I implemented the NullArray, I didn't comply with the spec under the 
premise that I'd handle reading and writing IPC in a spec-compliant way as that 
looked like the easier approach.

After some integration testing, I realised that I wasn't doing it correctly, so 
it's better to comply with the spec by not allocating any buffers for the array.
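A spec-compliant null array can be sketched as a type that stores only its length: the Arrow columnar format's Null layout allocates no buffers, and every slot is null. The struct below is a hypothetical illustration, not the arrow crate's `NullArray`.

```rust
/// Null layout: no validity bitmap and no data buffers, only a length.
#[derive(Debug, Clone, PartialEq)]
pub struct NullArray {
    len: usize,
}

impl NullArray {
    pub fn new(len: usize) -> Self {
        NullArray { len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Every slot is null, so the null count always equals the length.
    pub fn null_count(&self) -> usize {
        self.len
    }

    pub fn is_null(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        true
    }

    /// No buffers are allocated, so an IPC writer would emit none for this field.
    pub fn buffers(&self) -> Vec<Vec<u8>> {
        Vec::new()
    }
}
```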



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9053) [Rust] Add sort for lists and structs

2020-06-06 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9053:
-

 Summary: [Rust] Add sort for lists and structs
 Key: ARROW-9053
 URL: https://issues.apache.org/jira/browse/ARROW-9053
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9007) [Rust] Support appending arrays by merging array data

2020-06-02 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9007:
-

 Summary: [Rust] Support appending arrays by merging array data
 Key: ARROW-9007
 URL: https://issues.apache.org/jira/browse/ARROW-9007
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


ARROW-9005 introduces a concat kernel which allows for concatenating multiple 
arrays of the same type into a single array. This is useful for sorting on 
multiple arrays, among other things.

The concat kernel is implemented for most array types, but not yet for nested 
arrays (lists, structs, etc).

This Jira is for creating a way of appending/merging all array types, so that 
concat (and functionality that depends on it) can support all array types.
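For primitive arrays, merging array data comes down to appending the value buffers and reconciling validity: if any input carries a validity bitmap, the output needs one, and inputs without a bitmap contribute all-valid slots. This sketch uses a simplified, hypothetical `PrimitiveArray` with a `Vec<bool>` in place of a packed bitmap; nested types would additionally need their offsets and child data merged.

```rust
/// Simplified primitive array: values plus an optional per-slot validity flag.
#[derive(Debug, Clone, PartialEq)]
pub struct PrimitiveArray<T> {
    pub values: Vec<T>,
    /// `None` means every slot is valid (no bitmap allocated).
    pub validity: Option<Vec<bool>>,
}

/// Concatenate arrays of the same type into a single array.
pub fn concat<T: Clone>(arrays: &[PrimitiveArray<T>]) -> PrimitiveArray<T> {
    let mut values = Vec::new();
    for a in arrays {
        values.extend_from_slice(&a.values);
    }
    // Only build a validity vector if at least one input has one.
    let any_bitmap = arrays.iter().any(|a| a.validity.is_some());
    let validity = if any_bitmap {
        let mut v = Vec::with_capacity(values.len());
        for a in arrays {
            match &a.validity {
                Some(bits) => v.extend_from_slice(bits),
                None => v.extend(std::iter::repeat(true).take(a.values.len())),
            }
        }
        Some(v)
    } else {
        None
    };
    PrimitiveArray { values, validity }
}
```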



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8883) [Rust] [Integration Testing] Disable unsupported tests

2020-05-21 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-8883:
-

 Summary: [Rust] [Integration Testing] Disable unsupported tests
 Key: ARROW-8883
 URL: https://issues.apache.org/jira/browse/ARROW-8883
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Integration, Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


Some of the integration test failures can be avoided by disabling unsupported 
tests, like large lists and nested types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8881) [Rust] Add large list and binary support

2020-05-21 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-8881:
-

 Summary: [Rust] Add large list and binary support
 Key: ARROW-8881
 URL: https://issues.apache.org/jira/browse/ARROW-8881
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


Rust does not yet support large lists and large binary arrays. 
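The difference between the regular and "large" variants is the offset width: list and binary arrays store `len + 1` offsets into child or value data, as i32 normally and as i64 for the large types, so the child data can exceed `i32::MAX` elements. A sketch of sharing the logic across both widths via a small trait (`Offset`, `value_range` and `fits_in_i32_offsets` are hypothetical names):

```rust
use std::ops::Range;

/// Offset types usable by list/binary arrays.
pub trait Offset: Copy {
    fn to_usize(self) -> usize;
}

impl Offset for i32 {
    fn to_usize(self) -> usize {
        assert!(self >= 0, "negative offset");
        self as usize
    }
}

impl Offset for i64 {
    fn to_usize(self) -> usize {
        assert!(self >= 0, "negative offset");
        self as usize
    }
}

/// Child/value range covered by slot `i`, for either offset width.
pub fn value_range<O: Offset>(offsets: &[O], i: usize) -> Range<usize> {
    offsets[i].to_usize()..offsets[i + 1].to_usize()
}

/// Whether a child of this length can still be addressed with i32 offsets.
pub fn fits_in_i32_offsets(child_len: usize) -> bool {
    child_len <= i32::MAX as usize
}
```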



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8308) [Rust] [Flight] Implement DoExchange on examples

2020-04-01 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-8308:
-

 Summary: [Rust] [Flight] Implement DoExchange on examples
 Key: ARROW-8308
 URL: https://issues.apache.org/jira/browse/ARROW-8308
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


The gRPC server examples in Rust require all trait members to be exhaustively 
implemented. The recently added `DoExchange` endpoint in the Flight service is 
therefore causing build failures in Rust until the examples implement it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-01 Thread Neville Dipale
I also support compression at the buffer level, and making it an extra
message.

Talking about compression and Flight, has anyone tested using gRPC's
compression to compress at the transport level (if that's a correct way to
describe it)? I believe only gzip and brotli are currently supported, so
that might be insufficient.

On Sun, 01 Mar 2020, 23:14 Antoine Pitrou,  wrote:

>
> Le 01/03/2020 à 22:01, Wes McKinney a écrit :
> > In the context of a "next version of the Feather format" ARROW-5510
> > (which is consumed only by Python and R at the moment), I have been
> > looking at compressing buffers using fast compressors like ZSTD when
> > writing the RecordBatch bodies. This could be handled privately as an
> > implementation detail of the Feather file, but since ZSTD compression
> > could improve throughput in Flight, for example, I thought I would
> > bring it up for discussion.
> >
> > I can see two simple compression strategies:
> >
> > * Compress the entire message body in one-shot, writing the result out
> > with an 8-byte int64 prefix indicating the uncompressed size
> > * Compress each non-zero-length constituent Buffer prior to writing to
> > the body (and using the same uncompressed-length-prefix when writing
> > the compressed buffer)
> >
> > The latter strategy is preferable for scenarios where we may project
> > out only a few fields from a larger record batch (such as reading from
> > a memory-mapped file).
>
> Agreed.  It may also allow using different compression strategies for
> different kinds of buffers (for example a bytestream splitting strategy
> for floats and doubles, or a delta encoding strategy for integers).
>
> > Implementation could be accomplished by one of the following methods:
> >
> > * Setting a field in Message.custom_metadata
> > * Adding a new field to Message
>
> I think it has to be a new field in Message.  Making it an ignorable
> metadata field means non-supporting receivers will decode and interpret
> the data wrongly.
>
> Regards
>
> Antoine.
>
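The per-buffer strategy described above (each non-zero-length buffer written as an int64 uncompressed-length prefix followed by the compressed bytes) could look roughly like this. Since the standard library has no ZSTD, the `Codec` trait and the pass-through `Identity` codec are placeholders where a real compressor would be plugged in; little-endian encoding of the prefix is an assumption, chosen to match Arrow IPC's default byte order.

```rust
/// Placeholder compression interface; a real one would wrap ZSTD or LZ4.
pub trait Codec {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
    fn decompress(&self, input: &[u8], uncompressed_len: usize) -> Result<Vec<u8>, String>;
}

/// Pass-through codec so the framing can be exercised without dependencies.
pub struct Identity;

impl Codec for Identity {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec()
    }
    fn decompress(&self, input: &[u8], uncompressed_len: usize) -> Result<Vec<u8>, String> {
        if input.len() != uncompressed_len {
            return Err(format!("expected {} bytes, got {}", uncompressed_len, input.len()));
        }
        Ok(input.to_vec())
    }
}

/// Frame one buffer: 8-byte uncompressed length, then the compressed payload.
/// Zero-length buffers are left empty.
pub fn compress_buffer(codec: &dyn Codec, buf: &[u8]) -> Vec<u8> {
    if buf.is_empty() {
        return Vec::new();
    }
    let mut out = (buf.len() as i64).to_le_bytes().to_vec();
    out.extend_from_slice(&codec.compress(buf));
    out
}

/// Inverse of `compress_buffer`.
pub fn decompress_buffer(codec: &dyn Codec, framed: &[u8]) -> Result<Vec<u8>, String> {
    if framed.is_empty() {
        return Ok(Vec::new());
    }
    if framed.len() < 8 {
        return Err("missing uncompressed-length prefix".to_string());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&framed[..8]);
    let len = i64::from_le_bytes(len_bytes);
    if len < 0 {
        return Err("negative uncompressed length".to_string());
    }
    codec.decompress(&framed[8..], len as usize)
}
```

Because each buffer is framed independently, a reader projecting a few fields only needs to decompress the buffers it actually touches, which is the advantage noted for the second strategy.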


[jira] [Created] (ARROW-7924) [Rust] Add sort for float types

2020-02-23 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7924:
-

 Summary: [Rust] Add sort for float types
 Key: ARROW-7924
 URL: https://issues.apache.org/jira/browse/ARROW-7924
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Floats need a different sort approach than other primitives, and this ticket 
will implement them separately.
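The reason floats differ is that `f32`/`f64` only implement `PartialOrd` (NaN is unordered), so `slice::sort` is unavailable and a comparator has to decide where NaNs go. A minimal sketch that sorts ascending and places NaNs last; `sort_floats` is an illustrative name. Newer Rust versions also offer `f64::total_cmp`, which follows IEEE 754 totalOrder and would place negative NaNs first instead.

```rust
use std::cmp::Ordering;

/// Sort ascending with all NaNs moved to the end.
pub fn sort_floats(values: &mut [f64]) {
    values.sort_by(|a, b| match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(b).unwrap(),
    });
}
```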



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7705) [Rust] Initial sort implementation

2020-01-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7705:
-

 Summary: [Rust] Initial sort implementation
 Key: ARROW-7705
 URL: https://issues.apache.org/jira/browse/ARROW-7705
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


An initial sort implementation that allows sorting an array by various options 
(e.g. sort order). This is mainly to iterate on the design and inner workings 
of a sort algorithm.
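One common design for such a sort is to return the permutation of indices rather than sorted values, so the same result can reorder other columns of a record batch with a take-style operation. A sketch under that assumption, with a hypothetical `SortOptions` covering sort order and null placement, over a nullable i32 column modelled as `Option<i32>`:

```rust
use std::cmp::Ordering;

/// Hypothetical sort options: order and where nulls go.
#[derive(Debug, Clone, Copy)]
pub struct SortOptions {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Return the indices that would sort `values` (stable, so ties keep input order).
pub fn sort_to_indices(values: &[Option<i32>], opts: SortOptions) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_by(|&i, &j| match (values[i], values[j]) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(a), Some(b)) => {
            if opts.descending { b.cmp(&a) } else { a.cmp(&b) }
        }
    });
    indices
}
```

With `[Some(3), None, Some(1), Some(2)]`, ascending with nulls last yields `[2, 3, 0, 1]`.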



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7704) [Rust] Support sort

2020-01-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7704:
-

 Summary: [Rust] Support sort
 Key: ARROW-7704
 URL: https://issues.apache.org/jira/browse/ARROW-7704
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


This lays out the work needed to support sorting arrays and record batches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Rust] Possible blocking issue for 0.16 release

2020-01-26 Thread Neville Dipale
Hi Andy,

I think `cargo update` is the correct approach to resolve this issue. Yes,
we've got quite a bit of backlog in Rust, but we should initiate the
process to adopt parquet-format with Chao and Ivan's approval post 0.16.

In the interim we could fix the thrift dependency issue upstream (as that
could be quicker). I'm hoping to make some time before 1.0.0 to update
parquet to the latest format.

On Sun, 26 Jan 2020, 18:01 Andy Grove,  wrote:

> Apologies for showing up at the last minute but I'm now re-engaged in the
> project after a bit of an absence and I noticed that we have some
> dependency conflicts due to the parquet-format crate (not controlled by
> Apache) using Thrift 0.12 whereas the parquet crate uses Thrift 0.13 and
> they require different versions of the byteorder crate.
>
> However, I have seen that not everyone is running into this issue, which
> confuses me. Maybe this is related to cached dependencies and the fact that
> we do not check in Cargo.lock (which we should not need to since this
> project is a library rather than a binary).
>
> See ARROW-7563 and ARROW-7507 for more context.
>
> I have created a PR against the parquet-format crate and hopefully, we can
> get a new version published.
>
> We need to get this crate under ASF control IMHO. See ARROW-6256.
>
> Andy.
>


[jira] [Created] (ARROW-7620) [Rust] Windows builds failing due to flatbuffer compile error

2020-01-20 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7620:
-

 Summary: [Rust] Windows builds failing due to flatbuffer compile error
 Key: ARROW-7620
 URL: https://issues.apache.org/jira/browse/ARROW-7620
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Neville Dipale


I've now noticed, on a few PRs whose tests should otherwise pass, that the Rust 
Windows tests are failing due to `*_generated.rs` not being found while trying 
to rename the generated flatbuffer files.

An example is at 
[https://github.com/apache/arrow/pull/6227/checks?check_run_id=397505832]

 

    + flatc --rust -o arrow/src/ipc/gen/ ../format/File.fbs ../format/Message.fbs ../format/Schema.fbs ../format/SparseTensor.fbs ../format/Tensor.fbs
    + find arrow/src/ipc/gen/ -name '*_generated.rs' -exec sed -i s/type__type/type_type/g '{}' ';'
    File not found - *_generated.rs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discuss][Rust] Policy regarding "unsafe"

2020-01-17 Thread Neville Dipale
Hi Paddy, Arrow Developers,

I've given this some thought, and I preliminarily think that perhaps we can
audit our use of unsafe and evaluate where we can remove it, propagate it
upwards (and provide safe alternatives) or provide some safety to callers.

Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:


   1. Add in bounds checking so that we don't need to deal with unsafe at
   all.
   2. Propagate the unsafes up through the code.
   3. Maintain a safe and unsafe version of each function that is currently
   unsafe.


I think bounds checking would hurt performance, an example being the
changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
ARROW-4670, I believe we were able to get the compiler to auto-vectorise
due to Array::value() avoiding bounds checks. In the case of compute, we
are in control of the array length, and so we know that it's safe to skip
bounds checking. I presume this would largely be the case in tabular-data
use-cases (because we assert that arrays in a record batch meet certain
criteria).

From a cursory glance, if we do find that we don't need explicit SIMD
(still immature in Rust, I've found it difficult to implement in some
cases), we could potentially reduce our unsafe count by around 20%. The
flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
need to maybe adopt option 2 from Paul on IPC once we're done with the
basics.

We'd then mainly be left with bit manipulation and `Buffer` (which has as
much unsafe as the fbs generated files). I think the API around buffer
would depend on whether we're expecting (based on what can be done with
buffers) this to be exposed to users beyond those using Arrow as a
development platform.

The above are some of my thoughts, but it's important to note that I don't
have a lot of experience with Rust, especially `unsafe` and the other dark
corners of the language.

Regards
Neville

On Fri, 10 Jan 2020 at 04:13, paddy horan  wrote:

> Hi All,
>
> This time last year there was a brief discussion on the usage of unsafe in
> Rust (a user on github raised the issue and I created the JIRA). [1]
>
> So far we mostly avoid unsafe in the public APIs.  The thinking here is
> that Arrow is a "development platform", i.e. lower level than most
> libraries, and library builders will want to avoid any performance hit of
> bounds checking, etc.
>
> This is not typical in the Rust community where unsafe is a clear signal
> that care is needed.  Although it might clutter the API a little more I
> would be in favor of having safe and unsafe variants of methods as needed.
> For instance, "value" for array access would be changed to "value" and
> "value_unchecked" where the latter is unsafe and does not perform bounds
> checks.
>
> We don't have a huge number of libraries building on top of Arrow in Rust
> at the moment so it seems like a good time, before 1.0, to decide on this
> to avoid breaking changes to the public API in post 1.0.
>
> Thoughts?
>
> Paddy
>
> [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
>
>
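The "value" / "value_unchecked" split proposed in the quoted email could look like this on a toy array type (`Int32Array` here is hypothetical, not the arrow crate's). The safe accessor checks bounds and then delegates; the unsafe one documents its contract in a `# Safety` section; and a kernel that controls its own loop bounds, as described for compute above, can call the unchecked variant soundly.

```rust
/// Toy primitive array illustrating safe and unsafe accessors.
pub struct Int32Array {
    values: Vec<i32>,
}

impl Int32Array {
    pub fn new(values: Vec<i32>) -> Self {
        Int32Array { values }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Safe accessor: panics if `i` is out of bounds.
    pub fn value(&self, i: usize) -> i32 {
        assert!(i < self.values.len(), "index {} out of bounds (len {})", i, self.values.len());
        // SAFETY: the bound was checked just above.
        unsafe { self.value_unchecked(i) }
    }

    /// # Safety
    /// The caller must guarantee that `i < self.len()`.
    pub unsafe fn value_unchecked(&self, i: usize) -> i32 {
        unsafe { *self.values.get_unchecked(i) }
    }
}

/// A kernel that owns its loop bounds can skip the per-element check.
pub fn sum(array: &Int32Array) -> i64 {
    (0..array.len())
        // SAFETY: `i` ranges over 0..len, so it is always in bounds.
        .map(|i| i64::from(unsafe { array.value_unchecked(i) }))
        .sum()
}
```

This keeps the public default safe while leaving the fast path available, at the cost of a slightly larger API surface.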


Re: [Discuss][Rust] Policy regarding "unsafe"

2020-01-17 Thread Neville Dipale
Hi Paddy,

On Fri, 10 Jan 2020 at 04:13, paddy horan  wrote:

> Hi All,
>
> This time last year there was a brief discussion on the usage of unsafe in
> Rust (a user on github raised the issue and I created the JIRA). [1]
>
> So far we mostly avoid unsafe in the public APIs.  The thinking here is
> that Arrow is a "development platform", i.e. lower level than most
> libraries, and library builders will want to avoid any performance hit of
> bounds checking, etc.
>
> This is not typical in the Rust community where unsafe is a clear signal
> that care is needed.  Although it might clutter the API a little more I
> would be in favor of having safe and unsafe variants of methods as needed.
> For instance, "value" for array access would be changed to "value" and
> "value_unchecked" where the latter is unsafe and does not perform bounds
> checks.
>
> We don't have a huge number of libraries building on top of Arrow in Rust
> at the moment so it seems like a good time, before 1.0, to decide on this
> to avoid breaking changes to the public API post-1.0.
>
> Thoughts?
>
> Paddy
>
> [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
>
>


[jira] [Created] (ARROW-7521) [Rust] Remove tuple on FixedSizeList datatype

2020-01-08 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7521:
-

 Summary: [Rust] Remove tuple on FixedSizeList datatype
 Key: ARROW-7521
 URL: https://issues.apache.org/jira/browse/ARROW-7521
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


The FixedSizeList datatype takes a tuple of Box<DataType> and length, but this 
could be simplified to take the two values without a tuple.
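For illustration, the two shapes might look like this. The enums are hypothetical, trimmed-down stand-ins rather than the crate's real `DataType`; the flat variant drops one level of parentheses from every construction and pattern match.

```rust
/// Current shape (hypothetical trimmed enum): a single tuple payload.
#[derive(Debug, Clone, PartialEq)]
pub enum DataTypeWithTuple {
    Int32,
    FixedSizeList((Box<DataTypeWithTuple>, i32)),
}

/// Proposed shape (hypothetical trimmed enum): two separate fields.
#[derive(Debug, Clone, PartialEq)]
pub enum DataTypeFlat {
    Int32,
    FixedSizeList(Box<DataTypeFlat>, i32),
}

/// Extract the list size; note the extra parentheses in the pattern.
pub fn list_size_tuple(dt: &DataTypeWithTuple) -> Option<i32> {
    match dt {
        DataTypeWithTuple::FixedSizeList((_, size)) => Some(*size),
        _ => None,
    }
}

/// Same extraction against the flat variant.
pub fn list_size_flat(dt: &DataTypeFlat) -> Option<i32> {
    match dt {
        DataTypeFlat::FixedSizeList(_, size) => Some(*size),
        _ => None,
    }
}
```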



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7475) [Rust] Create Arrow Stream writer

2019-12-29 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7475:
-

 Summary: [Rust] Create Arrow Stream writer
 Key: ARROW-7475
 URL: https://issues.apache.org/jira/browse/ARROW-7475
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7460) [Rust] Improve arithmetic kernels with autovec

2019-12-22 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7460:
-

 Summary: [Rust] Improve arithmetic kernels with autovec
 Key: ARROW-7460
 URL: https://issues.apache.org/jira/browse/ARROW-7460
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.15.1
Reporter: Neville Dipale


In a comment to an open ticket for optimising a cast kernel by using SIMD, 
[~andy-thomason] mentioned that LLVM does autovec well for Rust.

I'd like to explore whether we could improve the kernel performance by 
simplifying the loops enough to allow the compiler to vectorise.
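A sketch of the loop shape that tends to auto-vectorise, assuming plain non-null `f64` slices (the name `add_autovec` is hypothetical). Iterating with `zip` instead of indexing lets the compiler drop per-element bounds checks and leaves a branch-free body. Whether LLVM actually emits SIMD instructions depends on the target and optimisation level, so this should be confirmed with a benchmark or by inspecting the generated assembly.

```rust
/// Element-wise addition over equal-length slices.
/// The length checks happen once, up front, so the loop body itself
/// contains no bounds checks or branches.
pub fn add_autovec(left: &[f64], right: &[f64], out: &mut [f64]) {
    assert_eq!(left.len(), right.len());
    assert_eq!(left.len(), out.len());
    for ((o, l), r) in out.iter_mut().zip(left).zip(right) {
        *o = l + r;
    }
}
```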



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7364) [Rust] Add cast options to cast kernel

2019-12-10 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7364:
-

 Summary: [Rust] Add cast options to cast kernel
 Key: ARROW-7364
 URL: https://issues.apache.org/jira/browse/ARROW-7364
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Neville Dipale


The cast kernels currently do not take explicit options, but instead convert 
overflows and invalid UTF-8 to nulls. We can create options that customise the 
behaviour, similarly to CastOptions in C++ 
([https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.h#L38])
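A minimal sketch of what such an option could look like. It assumes a single hypothetical `safe` flag, whereas the C++ `CastOptions` has several finer-grained fields. With `safe: true` an overflowing value becomes null, matching the current behaviour; with `safe: false` the cast returns an error.

```rust
use std::convert::TryFrom;

/// Hypothetical cast options; the field name `safe` is an assumption.
#[derive(Debug, Clone, Copy)]
pub struct CastOptions {
    /// true: unrepresentable values become nulls.
    /// false: the cast fails with an error instead.
    pub safe: bool,
}

/// Cast nullable i64 values to i32 under the given options.
pub fn cast_i64_to_i32(
    values: &[Option<i64>],
    options: CastOptions,
) -> Result<Vec<Option<i32>>, String> {
    values
        .iter()
        .map(|v| match v {
            // Input nulls always stay null.
            None => Ok(None),
            Some(x) => match i32::try_from(*x) {
                Ok(y) => Ok(Some(y)),
                // Overflow: null in safe mode, error otherwise.
                Err(_) if options.safe => Ok(None),
                Err(_) => Err(format!("value {} overflows i32", x)),
            },
        })
        .collect()
}
```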



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7324) [Rust] Add Timezone to Timestamp

2019-12-04 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7324:
-

 Summary: [Rust] Add Timezone to Timestamp
 Key: ARROW-7324
 URL: https://issues.apache.org/jira/browse/ARROW-7324
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Proposal to add a timezone to the timestamp type
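One possible shape, sketched with hypothetical trimmed-down types: the timezone is stored as an optional string alongside the unit, and `None` means a timezone-naive timestamp.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical DataType fragment, not the crate's real enum.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    /// Unit plus optional timezone, e.g. Some("UTC"); None = naive.
    Timestamp(TimeUnit, Option<String>),
}

/// Borrow the timezone of a timestamp type, if it has one.
pub fn timezone(data_type: &DataType) -> Option<&str> {
    match data_type {
        DataType::Timestamp(_, tz) => tz.as_deref(),
    }
}
```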



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7207) [Rust] Update Generated Flatbuffer Files

2019-11-19 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7207:
-

 Summary: [Rust] Update Generated Flatbuffer Files
 Key: ARROW-7207
 URL: https://issues.apache.org/jira/browse/ARROW-7207
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


We last built the fbs files early in the year, and since then there have been 
some changes like LargeLists. We should update the generated Rust files to 
incorporate these changes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Help Needed] Arrow IPC Reader in Rust

2019-11-18 Thread Neville Dipale
Thanks Paddy,

I've left the fix as is for now. If we come across problems with the
precision in the future, we can tweak it accordingly.

Interesting that the array printing was a separate issue; I would never
have figured that out.

On Mon, 18 Nov 2019 at 22:07, paddy horan  wrote:

> I should have mentioned that I pushed the fix to your branch.
>
> P
> 
> From: paddy horan 
> Sent: Monday, November 18, 2019 3:04 PM
> To: dev@arrow.apache.org 
> Subject: Re: [Help Needed] Arrow IPC Reader in Rust
>
> Hey Neville,
>
> I had a chance to look at this.  The debugging output is a separate, but
> misleading, issue.  The real cause is the precision of the 32-bit floating
> point values.  The JSON data has 3 decimal places and the array returned
> from the reader has more than 3, this might be due to the fact that we read
> in 64-bit floats and cast?
>
> I implemented a quick fix to test and I can pass all tests locally,
> although I will leave it to you to change as I'm not sure where in your
> process it's best to adjust the precision.
>
> Regards,
> Paddy
> 
> From: paddy horan 
> Sent: Saturday, November 16, 2019 1:03 PM
> To: dev@arrow.apache.org 
> Subject: Re: [Help Needed] Arrow IPC Reader in Rust
>
> Hey Neville,
>
> I'll take a look if no-one beats me to it (I might not have time today or
> tomorrow).
>
> P
>
> 
> From: Neville Dipale 
> Sent: Saturday, November 16, 2019 1:42 AM
> To: dev@arrow.apache.org 
> Subject: [Help Needed] Arrow IPC Reader in Rust
>
> Hi Arrow developers,
>
> I'm "done" with the Arrow IPC Reader in Rust (for supported data types),
> but am having issues with reading some of the test data.
> Specifically, I've noticed that when reading the integration test data
> (primitve_generated), where I expect an array with 17 values, the arrow
> array contains 20 values.
>
> To illustrate what's happening, I've added some debug statements to the
> unit test, and the behaviour can be seen at (
> https://ci.ursalabs.org/#/builders/93/builds/1550/steps/3/logs/stdio).
> In the logs, there are a number of arrays which have a length of 17, but
> have 20 printed values. 3 of those values are duplicated.
>
> It's been hard trying to inspect the binary data to verify if there's an
> issue with them, and I'm able to correctly read 17 values with Python, so I
> suspect it has to be a Rust issue.
> Would anyone have some time to look into this with me?
>
> Thanks
> Neville
>
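The precision effect Paddy describes can be reproduced with the standard library alone. This sketch assumes the values are stored as 32-bit floats and compared as 64-bit ones. `widen` and `round_to` are hypothetical helpers, and rounding to the JSON's three decimal places is only one of several possible fixes.

```rust
/// Widen an f32 to f64. The conversion is exact, but it exposes the
/// f32 rounding error in the decimal representation.
pub fn widen(x: f32) -> f64 {
    x as f64
}

/// Round to a fixed number of decimal places, e.g. to compare against
/// JSON test data written with three decimals.
pub fn round_to(x: f64, places: i32) -> f64 {
    let factor = 10f64.powi(places);
    (x * factor).round() / factor
}
```

Here `0.123f32` still prints as `0.123`, but once widened to `f64` it no longer does, which is the kind of mismatch a value-by-value comparison with the JSON would flag.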


Re: [DISCUSS] [Rust] Adding support for Flight protocol

2019-11-17 Thread Neville Dipale
Hi Andy,

I've fixed the issue and left a description of the problem on the PR.

On Sun, 17 Nov 2019 at 19:23, Andy Grove  wrote:

> I'm now trying to create a Flight server in Rust and am struggling a bit.
> See https://github.com/apache/arrow/pull/5852 for more information. If
> anyone is available to take a look I'd appreciate it.
>
> I'm going to reach out to Lucio directly since he isn't currently
> subscribed to the mailing list IIRC.
>
> Thanks,
>
> Andy.
>
>
>
> On Fri, Oct 18, 2019 at 11:17 AM Lucio Franco wrote:
>
> > Hi all!
> >
> > I am the author of Tonic, I'd love to see the rust flight implementation
> > done with Tonic. David, it looks like what you implemented with tower-grpc
> > should work just fine with tonic as well. I am also interested in helping
> > out. Since, I guess most of my experience up to now is with Tonic itself, I
> > wanted to make myself available to you all for help or special
> > implementations coming from the Tonic side. I am also willing to help with
> > the flight rust implementation. So please let me know around that what I
> > can do!
> > Thanks,
> > Lucio
> >
> > On Oct 17 2019, at 7:26 pm, Neville Dipale wrote:
> > > Thanks Andy,
> > >
> > > Please see https://github.com/apache/arrow/pull/4167#issuecomment-543381089
> > > for the status of the PR. We have a few missing data types (fixed list,
> > > timezone to timestamp, etc.) that are currently stopping me from testing
> > > the reading of files.
> > >
> > > I'm trying out creating a fixed size list, and I'll open a PR for that if
> > > my attempt works.
> > >
> > > On Fri, 18 Oct 2019 at 01:10, Andy Grove wrote:
> > > > Thanks for all the updates. I'd like to get involved and help out with this
> > > > effort as well. I don't have any major work planned for DataFusion for
> > > > 1.0.0 now other than maybe moving to the new parquet ArrowReader, if it is
> > > > ready in time.
> > > >
> > > > I have been chatting with the author of the Rust Flatbuffer project about
> > > > some of the issues and I can take an action to follow up with that.
> > > >
> > > > I have also been talking with one of the authors of Tonic, and I believe
> > > > they might be interested in helping here too.
> > > >
> > > > Let me know how else I can help out with this effort.
> > > > Andy.
> > > >
> > > >
> > > > On Thu, Oct 17, 2019 at 2:42 PM Neville Dipale <nevilled...@gmail.com> wrote:
> > > >
> > > > > Good evening
> > > > > With support for testing against integration files now done, I've resumed
> > > > > work on the IPC reader. If I don't encounter trouble reading the existing
> > > > > files, I expect to be done with this work by the end of the weekend. I had
> > > > > taken the approach of one large PR to include all Rust-supported Arrow
> > > > > types.
> > > > >
> > > > > I'm not sure of how long the writer would take, but I remain committed to
> > > > > having this work completed by 1.0.
> > > > >
> > > > > I have to catch up on null-type roundtrip and the padding alignment work as
> > > > > I haven't been able to keep abreast with development these last few months.
> > > > > Also, the Rust flatbuffer issues that we've had haven't progressed in the
> > > > > relevant repo, so it still makes ergonomics not great, but at least our
> > > > > users don't have to worry about that.
> > > > >
> > > > > @Andy I have good experience with gRPC in Rust, and also want to see Flight
> > > > > support landing soon.
> > > > >
> > > > > On Thu, 17 Oct 2019, 17:06 David Li wrote:
> > > > > > Just for reference, it's possible once the basic IPC support is merged; I
> > > > > > had a proof of concept, though it needs to be updated to use Tonic over
> > > > > > tower-grpc, actually implement the zero-copy optimizations, provide a real

[jira] [Created] (ARROW-7194) [Rust] CSV Writer causing recursion errors

2019-11-16 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7194:
-

 Summary: [Rust] CSV Writer causing recursion errors
 Key: ARROW-7194
 URL: https://issues.apache.org/jira/browse/ARROW-7194
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Neville Dipale


As reported in [https://github.com/apache/arrow/pull/5805], the CSV writer's 
use of std::io::Write is causing recursion issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[Help Needed] Arrow IPC Reader in Rust

2019-11-15 Thread Neville Dipale
Hi Arrow developers,

I'm "done" with the Arrow IPC Reader in Rust (for supported data types),
but am having issues with reading some of the test data.
Specifically, I've noticed that when reading the integration test data
(primitve_generated), where I expect an array with 17 values, the arrow
array contains 20 values.

To illustrate what's happening, I've added some debug statements to the
unit test, and the behaviour can be seen at (
https://ci.ursalabs.org/#/builders/93/builds/1550/steps/3/logs/stdio).
In the logs, there are a number of arrays which have a length of 17, but
have 20 printed values. 3 of those values are duplicated.

It's been hard trying to inspect the binary data to verify if there's an
issue with them, and I'm able to correctly read 17 values with Python, so I
suspect it has to be a Rust issue.
Would anyone have some time to look into this with me?

Thanks
Neville


[jira] [Created] (ARROW-6944) [Rust] Add StringType

2019-10-19 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-6944:
-

 Summary: [Rust] Add StringType
 Key: ARROW-6944
 URL: https://issues.apache.org/jira/browse/ARROW-6944
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Create a separate String type which uses UTF-8, and restrict BinaryArray to 
opaque binary data.
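A minimal sketch of why the split matters, assuming a hypothetical `binary_to_utf8` conversion over plain slices rather than real Arrow arrays. Opaque binary data may hold any bytes, so turning it into a string column requires UTF-8 validation (here via `std::str::from_utf8`), which a dedicated string type can guarantee once, up front.

```rust
/// Reinterpret nullable binary slots as UTF-8 strings, failing on the
/// first invalid slot rather than silently accepting it. Nulls stay null.
pub fn binary_to_utf8(
    values: &[Option<&[u8]>],
) -> Result<Vec<Option<String>>, std::str::Utf8Error> {
    values
        .iter()
        .map(|slot| match slot {
            None => Ok(None),
            Some(bytes) => std::str::from_utf8(bytes).map(|s| Some(s.to_string())),
        })
        .collect()
}
```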



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6928) [Rust] Add FixedSizeList type

2019-10-17 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-6928:
-

 Summary: [Rust] Add FixedSizeList type
 Key: ARROW-6928
 URL: https://issues.apache.org/jira/browse/ARROW-6928
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Support FixedSizeList, which is required for integration testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] [Rust] Adding support for Flight protocol

2019-10-17 Thread Neville Dipale
Thanks Andy,

Please see https://github.com/apache/arrow/pull/4167#issuecomment-543381089
for the status of the PR. We have a few missing data types (fixed list,
timezone to timestamp, etc.) that are currently stopping me from testing
the reading of files.

I'm trying out creating a fixed size list, and I'll open a PR for that if
my attempt works.

On Fri, 18 Oct 2019 at 01:10, Andy Grove  wrote:

> Thanks for all the updates. I'd like to get involved and help out with this
> effort as well. I don't have any major work planned for DataFusion for
> 1.0.0 now other than maybe moving to the new parquet ArrowReader, if it is
> ready in time.
>
> I have been chatting with the author of the Rust Flatbuffer project about
> some of the issues and I can take an action to follow up with that.
>
> I have also been talking with one of the authors of Tonic, and I believe
> they might be interested in helping here too.
>
> Let me know how else I can help out with this effort.
>
> Andy.
>
>
>
> On Thu, Oct 17, 2019 at 2:42 PM Neville Dipale wrote:
>
> > Good evening
> >
> > With support for testing against integration files now done, I've resumed
> > work on the IPC reader. If I don't encounter trouble reading the existing
> > files, I expect to be done with this work by the end of the weekend. I had
> > taken the approach of one large PR to include all Rust-supported Arrow
> > types.
> >
> > I'm not sure of how long the writer would take, but I remain committed to
> > having this work completed by 1.0.
> >
> > I have to catch up on null-type roundtrip and the padding alignment work as
> > I haven't been able to keep abreast with development these last few months.
> > Also, the Rust flatbuffer issues that we've had haven't progressed in the
> > relevant repo, so it still makes ergonomics not great, but at least our
> > users don't have to worry about that.
> >
> > @Andy I have good experience with gRPC in Rust, and also want to see Flight
> > support landing soon.
> >
> > On Thu, 17 Oct 2019, 17:06 David Li,  wrote:
> >
> > > Just for reference, it's possible once the basic IPC support is merged; I
> > > had a proof of concept, though it needs to be updated to use Tonic over
> > > tower-grpc, actually implement the zero-copy optimizations, provide a real
> > > API, etc.
> > >
> > > https://github.com/apache/arrow/pull/4167#issuecomment-529695811
> > >
> > > On Thu, Oct 17, 2019, 10:51 Wes McKinney  wrote:
> > >
> > > > I hope to see Flight in all the reference implementations eventually.
> > > >
> > > > Having hardened IPC support is a pre-requisite, it would be ideal to
> > > > have Rust as a participant in the integration tests
> > > >
> > > > On Thu, Oct 17, 2019 at 9:41 AM Andy Grove wrote:
> > > > >
> > > > > I was approached directly about adding Flight support to the Rust
> > > > > implementation, and said I would start a discussion here on the mailing
> > > > > list.
> > > > >
> > > > > There is ongoing work with IPC and integration and I believe that it would
> > > > > make sense to start looking at adding Flight support.
> > > > >
> > > > > I'd like to hear what others think though.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andy.
> > > >
> > >
> >
>


[jira] [Created] (ARROW-6650) [Rust] [Integration] Add method to generate JSON from RecordBatch

2019-09-20 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-6650:
-

 Summary: [Rust] [Integration] Add method to generate JSON from 
RecordBatch
 Key: ARROW-6650
 URL: https://issues.apache.org/jira/browse/ARROW-6650
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Integration, Rust
Affects Versions: 0.14.1
Reporter: Neville Dipale


[~emkornfi...@gmail.com] recommended that we use the integration IPC files. To 
be able to compare against the JSON files that are used, we need to be able to 
generate a JSON representation of Arrow data in Rust.

We can already do this for schemas, and this ticket is for supporting 
converting RecordBatch to JSON.
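As a rough sketch of the target format, assuming a single non-nested, nullable Int32 column: each column object in the integration JSON files carries `name`, `count`, `VALIDITY` and `DATA` entries. The function below is hypothetical, builds the string by hand instead of using a JSON library, and does not escape the column name.

```rust
/// Render one nullable Int32 column in the integration-JSON column shape.
pub fn int32_column_to_json(name: &str, values: &[Option<i32>]) -> String {
    // VALIDITY: 1 for a valid slot, 0 for a null slot.
    let validity: Vec<String> = values
        .iter()
        .map(|v| u8::from(v.is_some()).to_string())
        .collect();
    // DATA: null slots still need a placeholder; 0 is used here (an assumption).
    let data: Vec<String> = values
        .iter()
        .map(|v| v.unwrap_or(0).to_string())
        .collect();
    format!(
        "{{\"name\":\"{}\",\"count\":{},\"VALIDITY\":[{}],\"DATA\":[{}]}}",
        name,
        values.len(),
        validity.join(","),
        data.join(",")
    )
}
```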



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-5408) [Rust] Create struct array builder that creates null buffers

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5408:
-

 Summary: [Rust] Create struct array builder that creates null 
buffers
 Key: ARROW-5408
 URL: https://issues.apache.org/jira/browse/ARROW-5408
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


We currently have a way of creating a struct array from a list of (field, 
array) tuples. This does not create null buffers for the struct (because no 
index is null). While this works fine within Rust, it often produces data that 
is incompatible with IPC data and kernel function outputs.

Having a function that caters for nulls, or expanding the current one, would 
alleviate this issue.
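A sketch of the missing piece, assuming the caller already knows which struct slots are null. Arrow validity bitmaps set one bit per slot (1 = valid) in least-significant-bit order within each byte, so a struct-level null buffer could be built as below. `validity_bitmap` and `null_count` are hypothetical helpers, not existing builder methods.

```rust
/// Build a validity bitmap (bit set = valid), LSB-first within each byte.
pub fn validity_bitmap(is_valid: &[bool]) -> Vec<u8> {
    let mut bitmap = vec![0u8; (is_valid.len() + 7) / 8];
    for (i, &valid) in is_valid.iter().enumerate() {
        if valid {
            bitmap[i / 8] |= 1u8 << (i % 8);
        }
    }
    bitmap
}

/// Count null slots, i.e. the value the struct array should report.
pub fn null_count(is_valid: &[bool]) -> usize {
    is_valid.iter().filter(|&&v| !v).count()
}
```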



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5400) [Rust] Test/ensure that reader and writer support zero-length record batches

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5400:
-

 Summary: [Rust] Test/ensure that reader and writer support 
zero-length record batches
 Key: ARROW-5400
 URL: https://issues.apache.org/jira/browse/ARROW-5400
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5399) [Rust] [Testing] Add IPC test files to arrow-testing

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5399:
-

 Summary: [Rust] [Testing] Add IPC test files to arrow-testing
 Key: ARROW-5399
 URL: https://issues.apache.org/jira/browse/ARROW-5399
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


We're generating a lot of files for testing, which should ideally live in 
arrow-testing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5367) [Rust] Add temporal kernels

2019-05-18 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5367:
-

 Summary: [Rust] Add temporal kernels
 Key: ARROW-5367
 URL: https://issues.apache.org/jira/browse/ARROW-5367
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


When creating temporal arrays, we added a sample function that extracts the 
hour from a temporal array. This ticket is to add support for other common 
temporal functions such as minute and second, and might include temporal 
arithmetic such as adding dates and times, calculating durations, etc.
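A minimal sketch of the component arithmetic, assuming timestamps in seconds since the Unix epoch interpreted as UTC (other units would be divided down first). The function names are hypothetical. `rem_euclid` keeps timestamps before 1970 in range.

```rust
const SECONDS_PER_DAY: i64 = 86_400;
const SECONDS_PER_HOUR: i64 = 3_600;

/// Hour of day (0-23) for a UTC timestamp in seconds since the epoch.
pub fn hour(ts: i64) -> i64 {
    ts.rem_euclid(SECONDS_PER_DAY) / SECONDS_PER_HOUR
}

/// Minute of hour (0-59).
pub fn minute(ts: i64) -> i64 {
    ts.rem_euclid(SECONDS_PER_HOUR) / 60
}

/// Second of minute (0-59).
pub fn second(ts: i64) -> i64 {
    ts.rem_euclid(60)
}

/// Apply a component extractor to a nullable column; nulls stay null.
pub fn map_nullable(values: &[Option<i64>], f: fn(i64) -> i64) -> Vec<Option<i64>> {
    values.iter().map(|v| v.map(f)).collect()
}
```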



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5366) [Rust] Implement Duration and Interval Types

2019-05-18 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5366:
-

 Summary: [Rust] Implement Duration and Interval Types
 Key: ARROW-5366
 URL: https://issues.apache.org/jira/browse/ARROW-5366
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


This should ideally include covering:
 * data types
 * arrays and builders
 * adding to kernels (e.g. including support in cast)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+

2019-05-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5360:
-

 Summary: [Rust] Builds are broken by rustyline on nightly 
2019-05-16+
 Key: ARROW-5360
 URL: https://issues.apache.org/jira/browse/ARROW-5360
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Neville Dipale


Rust builds are broken on nightly since 2019-05-16. Please see 
[https://github.com/kkawakam/rustyline/issues/217]

The issue might need to be fixed on the rustyline crate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5352) [Rust] BinaryArray filter replaces nulls with empty strings

2019-05-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5352:
-

 Summary: [Rust] BinaryArray filter replaces nulls with empty 
strings
 Key: ARROW-5352
 URL: https://issues.apache.org/jira/browse/ARROW-5352
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


The filter implementation for BinaryArray discards the nullness of data. 
Null slots in a BinaryArray (seem to) always return an empty string slice when 
getting a value, so the way filter works might be a bug, depending on what Arrow 
developers' or users' intentions are.

I think we should either preserve nulls (and their count) or document this as 
intended behaviour.

Below is a test case that reproduces the bug.
{code:java}
#[test]
fn test_filter_binary_array_with_nulls() {
    let mut a: BinaryBuilder = BinaryBuilder::new(100);
    a.append_null().unwrap();
    a.append_string("a string").unwrap();
    a.append_null().unwrap();
    a.append_string("with nulls").unwrap();
    let array = a.finish();
    let b = BooleanArray::from(vec![true, true, true, true]);
    let c = filter(&array, &b).unwrap();
    let d: &BinaryArray = c.as_any().downcast_ref::<BinaryArray>().unwrap();
    // I didn't expect this behaviour
    assert_eq!("", d.get_string(0));
    // fails here
    assert!(d.is_null(0));
    assert_eq!(4, d.len());
    // fails here
    assert_eq!(2, d.null_count());
    assert_eq!("a string", d.get_string(1));
    // fails here
    assert!(d.is_null(2));
    assert_eq!("with nulls", d.get_string(3));
}
{code}
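For comparison, a null-preserving filter can be sketched over plain `Option<&str>` slots, a hypothetical stand-in for `BinaryArray` rather than the crate's `filter` kernel. Each selected slot is copied as-is, so nulls stay null and the null count carries over from the selected input slots.

```rust
/// Keep the slots whose predicate is true, copying nulls through unchanged.
pub fn filter_preserving_nulls<'a>(
    values: &[Option<&'a str>],
    predicate: &[bool],
) -> Vec<Option<&'a str>> {
    assert_eq!(values.len(), predicate.len(), "predicate length must match array length");
    let mut out = Vec::new();
    for (value, &keep) in values.iter().zip(predicate) {
        if keep {
            // A null stays None instead of becoming "".
            out.push(*value);
        }
    }
    out
}
```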



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5351) [Rust] Add support for take kernel functions

2019-05-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5351:
-

 Summary: [Rust] Add support for take kernel functions
 Key: ARROW-5351
 URL: https://issues.apache.org/jira/browse/ARROW-5351
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


Similar to https://issues.apache.org/jira/browse/ARROW-772, a take function 
would give us random access on arrays, which is useful for sorting and 
(potentially) filtering.
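A minimal sketch of `take` semantics over plain `Option<T>` slots. Two design choices here are assumptions for illustration: a null index yields a null output, and an out-of-range index returns an error rather than panicking. This `take` is not the crate's kernel.

```rust
/// Gather `values` at `indices`; a null index produces a null output.
pub fn take<T: Clone>(
    values: &[Option<T>],
    indices: &[Option<usize>],
) -> Result<Vec<Option<T>>, String> {
    indices
        .iter()
        .map(|idx| match idx {
            None => Ok(None),
            Some(i) => values
                .get(*i)
                .cloned()
                .ok_or_else(|| format!("index {} out of bounds for length {}", i, values.len())),
        })
        .collect()
}
```

Sorting then reduces to computing a permutation of indices and calling `take` once per column.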

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5350) [Rust] Support filtering on nested array types

2019-05-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5350:
-

 Summary: [Rust] Support filtering on nested array types
 Key: ARROW-5350
 URL: https://issues.apache.org/jira/browse/ARROW-5350
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


We currently only filter on primitive types, but not on lists and structs. Add 
the ability to filter on nested array types



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[Rust] [Discuss] Generalising Eq, Neq Kernel Functions Beyond Numeric Types

2019-05-13 Thread Neville Dipale
Hi Arrow[Rust] developers,

I came across an instance where I wanted to compare 2 arrays that aren't
numeric (bool, string, list?), and couldn't conveniently leverage the
comparison array_ops for this. This is due to the trait bounds that require
that PrimitiveArray<T> satisfy T: ArrowNumericType.

Users might need/want to compare non-numeric arrays, at least with {equal |
not equal} functions. It's not hard to write a custom function to do so,
but that would leave a lot of detail to the user.

I would like to propose that we expand the *compute::eq* and *compute::neq*
functions to cater for non-numeric arrays.
For reference, these can be found in
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs

So far, I see 3 options:

1. A fast path for booleans that uses existing SIMD-enabled *eq* and *neq*,
if we can cast True=1, False=0 fast enough (the cast kernel already exists)
2. A slow path for non-numeric arrays where we perform element-wise
comparisons
3. A hashing approach where we hash values (to i64?) and leverage the
SIMD-enabled *eq* and *neq*.
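Option 2 could look roughly like this for string data. The sketch works over plain `Option<&str>` slices rather than real Arrow arrays, the function names are hypothetical, and treating a null on either side as a null result is an assumption borrowed from SQL semantics.

```rust
/// Element-wise equality; a null on either side yields a null result.
pub fn eq_utf8(left: &[Option<&str>], right: &[Option<&str>]) -> Vec<Option<bool>> {
    assert_eq!(left.len(), right.len(), "arrays must have equal length");
    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(a), Some(b)) => Some(a == b),
            _ => None,
        })
        .collect()
}

/// Element-wise inequality, derived by negating the non-null results.
pub fn neq_utf8(left: &[Option<&str>], right: &[Option<&str>]) -> Vec<Option<bool>> {
    eq_utf8(left, right)
        .into_iter()
        .map(|v| v.map(|b| !b))
        .collect()
}
```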

Do you have any opinions on the above?

Thanks
Neville


Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-13 Thread Neville Dipale
Thanks everyone for the invite and privilege.

Neville

On Mon, 13 May 2019 at 15:19, Wes McKinney  wrote:

> Congrats!
>
> On Mon, May 13, 2019 at 4:25 AM Krisztián Szűcs wrote:
> >
> > Congrats Neville!
> >
> > On Mon, May 13, 2019 at 11:02 AM Fan Liya  wrote:
> >
> > > Congrats!!!
> > >
> > > On Sun, May 12, 2019 at 10:10 AM Philipp Moritz wrote:
> > >
> > > > Congrats Neville!
> > > >
> > > > On Sat, May 11, 2019 at 6:09 PM Renjie Liu wrote:
> > > >
> > > > > Congrats!
> > > > >
> > > > > On Sun, 12 May 2019 at 00:38, Chao Sun wrote:
> > > > >
> > > > > > Congrats Neville!
> > > > > >
> > > > > > On Sat, May 11, 2019 at 9:36 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > > >
> > > > > > > Congrats!!
> > > > > > >
> > > > > > > On Saturday, May 11, 2019, paddy horan wrote:
> > > > > > >
> > > > > > > > Congrats Neville!  Thank you for your contributions!
> > > > > > > >
> > > > > > > > 
> > > > > > > > From: Andy Grove 
> > > > > > > > Sent: Saturday, May 11, 2019 11:23 AM
> > > > > > > > To: dev@arrow.apache.org
> > > > > > > > Subject: [ANNOUNCE] New Arrow committer: Neville Dipale
> > > > > > > >
> > > > > > > > On behalf of the Arrow PMC, I'm happy to announce that Neville has
> > > > > > > > accepted an invitation to become a committer on Apache Arrow.
> > > > > > > >
> > > > > > > > Welcome, and thank you for your contributions!
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


[jira] [Created] (ARROW-5303) [Rust] Add SIMD vectorization of numeric casts

2019-05-12 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5303:
-

 Summary: [Rust] Add SIMD vectorization of numeric casts
 Key: ARROW-5303
 URL: https://issues.apache.org/jira/browse/ARROW-5303
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


To improve the performance of cast kernels, we need SIMD support in numeric 
casts.

An initial exploration shows that we can't trivially add SIMD casts between our 
Arrow T::Simd types, because `packed_simd` only supports a cast between T::Simd 
types that have the same number of lanes.

This means that adding casts from f64 to i64 (same number of lanes) satisfies the 
trait bound `where TO::Simd : packed_simd::FromCast`, but f64 to 
i32 (different number of lanes) doesn't.

We would benefit from investigating work-arounds to this limitation. Please see 
[github::nevi_me::arrow/\{branch:simd-cast}/../kernels/cast.rs|https://github.com/nevi-me/arrow/blob/simd-cast/rust/arrow/src/compute/kernels/cast.rs#L601]
for an example implementation that's limited by the difference in the number of lanes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?

2019-04-26 Thread Neville Dipale
Hi Arrow developers,

I'm currently working on IPC in Rust, specifically reading Arrow files.
I've noticed that null buffers/bitmaps are always padded to 64 bits (from
pyarrow, not sure about others), while in Rust we pad to 8 bits.

1. Is Rust's current padding fine per the spec?

I'm having issues with reading, but only because I'm comparing array data
and not only the values and nullness of slots. I see this being more of a
problem when writing to files and streams as we'd need to pad null buffers
almost every time (since for large arrays IPC could need 2048 while we have
2046, so it's not a small data issue)

2. If implementations are allowed to choose either 8 or 64, are the Rust
committers happy with us changing to 64-bit padding?

The benefits of changing to 64 would be removing the need to then pad the
buffer when writing to streams and files, and it'll make us more compatible
with other implementations. I suspect this would still come up as an issue
when we get to add Rust to the interop tests.
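For reference, the arithmetic involved is small. As we read the columnar format specification, alignment and padding are expressed in bytes (multiples of 8 or 64 bytes, with 64 recommended), so a bitmap for 17 slots needs 3 bytes of bits but occupies 8 or 64 bytes once padded. The helper names below are hypothetical.

```rust
/// Bytes needed to hold a validity bitmap with `len` slots (1 bit each).
pub fn bitmap_bytes(len: usize) -> usize {
    (len + 7) / 8
}

/// Round `n` up to the next multiple of `alignment` (a power of two).
pub fn round_up(n: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (n + alignment - 1) & !(alignment - 1)
}
```

Padding every buffer to 64 bytes at allocation time means a writer can copy buffers straight into a stream or file without re-padding them.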

I tried changing to 64-bit before writing this mail, but bit-fu is still
beyond my knowledge, so I'd need help from someone else with implementing
this, or at least letting me know which lines to change. I don't mind then
making sure all tests still pass.

My goal is to complete IPC work by 0.14 release, so this would be a bit
urgent as I'm stuck right now.

Thanks
Neville


Re: Proper way to retrigger Travis CI builds

2019-04-25 Thread Neville Dipale
To add here, sometimes builds for unrelated components are triggered because
your branch is behind master. I've noticed that whenever I rebase my changes
onto the latest master, I reliably trigger only the Rust jobs.
Maybe that could also help non-Arrow committers :)

On Thu, 25 Apr 2019 at 14:11, Wes McKinney  wrote:

> If you are an Arrow committer you can restart builds in the Travis CI
> UI, but otherwise the method that Antoine indicated is the best option
> for non-committers
>
> On Thu, Apr 25, 2019 at 4:51 AM Antoine Pitrou  wrote:
> >
> >
> > Hi,
> >
> > I often do a force-push of identical contents, with a different
> > changeset id:
> >
> > $ git commit -a --amend && git push --force
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 25/04/2019 à 11:39, Yurui Zhou a écrit :
> > > Hey guys:
> > >
> > > When submitting PR to master, I often run into Travis CI build
> failures that are unrelated to my changes. I usually close and reopen the
> PR to re-trigger the build. Just wondering is there any other way (like a
> button) that allows me to re-trigger the failing builds without closing and
> reopening my PR?
> > >
> > > Thanks
> > > Yurui
> > >
>


[jira] [Created] (ARROW-5191) [Rust] Expose schema in readers (CSV, JSON) without reading batches

2019-04-21 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5191:
-

 Summary: [Rust] Expose schema in readers (CSV, JSON) without 
reading batches
 Key: ARROW-5191
 URL: https://issues.apache.org/jira/browse/ARROW-5191
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


It's sometimes convenient to be able to view a datasource's schema without 
reading the first record batch. This is a proposal to create a `pub fn 
schema() -> Arc<Schema>` on the various readers that we support.

I think this would also enable schema inference in DataFusion.
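A minimal sketch of the idea, using hypothetical simplified `Field`, `Schema` and `Reader` types rather than the crate's real ones. The schema is resolved when the reader is constructed, so `schema()` only clones an `Arc` and reads no record batches.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// Hypothetical CSV-like reader over pre-split rows.
pub struct Reader {
    schema: Arc<Schema>,
    rows: Vec<Vec<String>>,
    position: usize,
}

impl Reader {
    /// Resolve the schema from the header at construction time.
    pub fn new(header: &str, rows: Vec<Vec<String>>) -> Self {
        let fields = header
            .split(',')
            .map(|name| Field { name: name.trim().to_string() })
            .collect();
        Reader { schema: Arc::new(Schema { fields }), rows, position: 0 }
    }

    /// Cheap: clones the `Arc`, reads no data.
    pub fn schema(&self) -> Arc<Schema> {
        Arc::clone(&self.schema)
    }

    /// Return up to `batch_size` remaining rows, or `None` when exhausted.
    pub fn next_batch(&mut self, batch_size: usize) -> Option<Vec<Vec<String>>> {
        if self.position >= self.rows.len() {
            return None;
        }
        let end = (self.position + batch_size).min(self.rows.len());
        let batch = self.rows[self.position..end].to_vec();
        self.position = end;
        Some(batch)
    }
}
```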



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5188) [Rust] Add temporal builders for StructArray

2019-04-19 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5188:
-

 Summary: [Rust] Add temporal builders for StructArray
 Key: ARROW-5188
 URL: https://issues.apache.org/jira/browse/ARROW-5188
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


StructBuilder currently doesn't have builders for temporal arrays.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5187) [Rust] Ability to flatten StructArray into a RecordBatch

2019-04-19 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5187:
-

 Summary: [Rust] Ability to flatten StructArray into a RecordBatch
 Key: ARROW-5187
 URL: https://issues.apache.org/jira/browse/ARROW-5187
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


Add the ability to flatten a StructArray into a record batch.

StructBuilder and StructArray have convenient methods for building several
arrays at once. Being able to use them and then convert the result into a
record batch reduces boilerplate when creating Arrow data from sources such
as databases.
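A minimal sketch of the conversion, assuming a struct array can be modelled as a list of (field name, child column) pairs. `Column`, `StructArraySketch`, `RecordBatchSketch` and `into_record_batch` are hypothetical names, not the arrow crate's types. Each child becomes a top-level column and only the `Arc` pointers move, so no column data is copied.

```rust
use std::sync::Arc;

// Hypothetical, heavily simplified column type; not the arrow crate's API.
#[derive(Debug, PartialEq)]
enum Column {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int64(values) => values.len(),
            Column::Utf8(values) => values.len(),
        }
    }
}

/// Hypothetical struct array: named child columns of equal length.
struct StructArraySketch {
    children: Vec<(String, Arc<Column>)>,
}

/// Hypothetical record batch: a schema (here just field names) plus columns.
#[derive(Debug)]
struct RecordBatchSketch {
    field_names: Vec<String>,
    columns: Vec<Arc<Column>>,
}

impl StructArraySketch {
    /// Promotes each child to a top-level column. Only the Arc pointers move,
    /// so no column data is copied. The struct's own validity bitmap is ignored.
    fn into_record_batch(self) -> Result<RecordBatchSketch, String> {
        let expected = self.children.first().map_or(0, |(_, c)| c.len());
        if self.children.iter().any(|(_, c)| c.len() != expected) {
            return Err("child columns have different lengths".to_string());
        }
        let (field_names, columns): (Vec<String>, Vec<Arc<Column>>) =
            self.children.into_iter().unzip();
        Ok(RecordBatchSketch { field_names, columns })
    }
}

fn main() {
    let s = StructArraySketch {
        children: vec![
            ("id".to_string(), Arc::new(Column::Int64(vec![Some(1), None]))),
            ("name".to_string(), Arc::new(Column::Utf8(vec![Some("a".into()), Some("b".into())]))),
        ],
    };
    let batch = s.into_record_batch().unwrap();
    assert_eq!(batch.field_names, ["id", "name"]);
    assert_eq!(batch.columns[0].len(), 2);
}
```

One design question this sketch sidesteps: a StructArray has its own validity bitmap, and the sketch simply drops it, so a real implementation would have to decide how struct-level nulls map onto the batch's columns.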





[jira] [Created] (ARROW-5182) [Rust] Create Arrow File writer

2019-04-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5182:
-

 Summary: [Rust] Create Arrow File writer
 Key: ARROW-5182
 URL: https://issues.apache.org/jira/browse/ARROW-5182
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale








[jira] [Created] (ARROW-5181) [Rust] Create Arrow File reader

2019-04-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5181:
-

 Summary: [Rust] Create Arrow File reader
 Key: ARROW-5181
 URL: https://issues.apache.org/jira/browse/ARROW-5181
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Initial support for reading the Arrow File format
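As a starting point for a reader, the framing of the Arrow IPC file format can be checked before any message is parsed: a file begins with the magic bytes `ARROW1` padded to 8 bytes and ends with the footer, a little-endian 32-bit footer length, and `ARROW1` again. The sketch below validates only that framing and returns the footer length. `footer_length` is a hypothetical helper, and decoding the Flatbuffers footer and the record batches is out of scope.

```rust
/// Magic bytes that open and close an Arrow IPC file.
const MAGIC: &[u8; 6] = b"ARROW1";

/// Hypothetical helper: checks the file framing and returns the footer length.
/// Layout assumed: "ARROW1" + 2 padding bytes, message stream, footer,
/// footer length (little-endian i32), "ARROW1".
fn footer_length(file: &[u8]) -> Result<usize, String> {
    // Smallest framing: 8 (magic + padding) + 4 (length) + 6 (trailing magic).
    const MIN_LEN: usize = 8 + 4 + 6;
    if file.len() < MIN_LEN {
        return Err("file too short to be an Arrow file".to_string());
    }
    if &file[..6] != MAGIC {
        return Err("missing leading ARROW1 magic".to_string());
    }
    let end = file.len();
    if &file[end - 6..] != MAGIC {
        return Err("missing trailing ARROW1 magic".to_string());
    }
    // The 4 bytes just before the trailing magic hold the footer length.
    let len = i32::from_le_bytes([file[end - 10], file[end - 9], file[end - 8], file[end - 7]]);
    // The footer must fit between the leading 8 bytes and the length field.
    if len < 0 || (len as usize) > end - MIN_LEN {
        return Err(format!("invalid footer length {}", len));
    }
    Ok(len as usize)
}

fn main() {
    // Hypothetical file: magic + padding, 4 stand-in footer bytes, length, magic.
    let mut file = Vec::new();
    file.extend_from_slice(b"ARROW1\0\0");
    file.extend_from_slice(&[0xAA; 4]);
    file.extend_from_slice(&4i32.to_le_bytes());
    file.extend_from_slice(b"ARROW1");
    assert_eq!(footer_length(&file), Ok(4));
    assert!(footer_length(b"not an arrow file at all").is_err());
}
```

With the footer length known, a reader would then slice out the footer bytes that precede the length field and decode them to locate the schema and record batch blocks.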





[jira] [Created] (ARROW-5180) [Rust] IPC Support

2019-04-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5180:
-

 Summary: [Rust] IPC Support
 Key: ARROW-5180
 URL: https://issues.apache.org/jira/browse/ARROW-5180
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


Umbrella ticket to track initial IPC support.





[jira] [Created] (ARROW-4968) [Rust] StructArray builder and From<> methods should check that field types match schema

2019-03-19 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4968:
-

 Summary: [Rust] StructArray builder and From<> methods should check that field types match schema
 Key: ARROW-4968
 URL: https://issues.apache.org/jira/browse/ARROW-4968
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


Just as we assert that array data types are equal to their field types, we
should do the same for StructArray and StructBuilder where necessary.
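A minimal sketch of the proposed check, assuming the builder or `From<>` implementation has access to both the declared fields and the child arrays. `DataType`, `Field`, `ChildArray` and `check_struct_children` are hypothetical simplified types, not the arrow crate's. The sketch returns an error instead of panicking, which is one possible design; an `assert!` would match the existing assertions the ticket mentions.

```rust
// Hypothetical, simplified types for illustration; not the arrow crate's API.
#[derive(Debug, PartialEq)]
enum DataType {
    Int32,
    Float64,
    Utf8,
}

struct Field {
    name: String,
    data_type: DataType,
}

/// Hypothetical child array; only its data type matters for this check.
struct ChildArray {
    data_type: DataType,
}

/// Checks that every child array's type matches its field's declared type.
fn check_struct_children(fields: &[Field], children: &[ChildArray]) -> Result<(), String> {
    if fields.len() != children.len() {
        return Err(format!(
            "expected {} child arrays, got {}",
            fields.len(),
            children.len()
        ));
    }
    for (field, child) in fields.iter().zip(children) {
        if field.data_type != child.data_type {
            return Err(format!(
                "field '{}' is declared as {:?} but its array is {:?}",
                field.name, field.data_type, child.data_type
            ));
        }
    }
    Ok(())
}

fn main() {
    let fields = vec![
        Field { name: "a".to_string(), data_type: DataType::Int32 },
        Field { name: "b".to_string(), data_type: DataType::Utf8 },
    ];
    let good = vec![
        ChildArray { data_type: DataType::Int32 },
        ChildArray { data_type: DataType::Utf8 },
    ];
    let bad = vec![
        ChildArray { data_type: DataType::Int32 },
        ChildArray { data_type: DataType::Float64 },
    ];
    assert!(check_struct_children(&fields, &good).is_ok());
    assert_eq!(
        check_struct_children(&fields, &bad),
        Err("field 'b' is declared as Utf8 but its array is Float64".to_string())
    );
}
```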





Re: Timeline for 0.13 Arrow release

2019-03-19 Thread Neville Dipale
When is the cut-off for PRs? We have a public holiday on Thursday, and I
want to use that to finish off my work on array casting.

If that's too late, I can defer to the next release.

On Tue, 19 Mar 2019, 12:34 Antoine Pitrou,  wrote:

>
> The only potential blocker from my POV is
> https://issues.apache.org/jira/browse/ARROW-3578, but we've already
> lived with it for previous releases, so perhaps it's ok anyway?
>
> Regards
>
> Antoine.
>
>
> On 19/03/2019 at 02:51, Wes McKinney wrote:
> > hi folks,
> >
> > I think we're basically at the 0.13 end game here. There are some more
> > patches that can get in, but do we all think we can cut an RC by the end
> > of the week? What are the blocking issues?
> >
> > Thanks
> > Wes
> >
> > On Sat, Mar 16, 2019 at 9:57 PM Kouhei Sutou  wrote:
> >>
> >> Hi,
> >>
> >>> Submitted the packaging builds:
> >>>
> >>> https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93=build-452
> >>
> >> I've fixed .deb/.rpm packages:
> >> https://github.com/apache/arrow/pull/3934
> >> It has been merged.
> >> So .deb/.rpm packages are ready for release.
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Re: Timeline for 0.13 Arrow release" on Thu, 14 Mar 2019 16:24:43 +0100,
> >>   Krisztián Szűcs  wrote:
> >>
> >>> Submitted the packaging builds:
> >>>
> >>> https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93=build-452
> >>>
> >>> On Thu, Mar 14, 2019 at 4:19 PM Wes McKinney wrote:
> >>>
>  The CMake refactor is merged! Kudos to Uwe for 3+ weeks of hard labor
>  on this.
> 
>  We should run all the packaging tasks and get a full accounting of
>  what is broken so we aren't surprised during the release process
> 
>  On Wed, Mar 13, 2019 at 9:39 AM Krisztián Szűcs wrote:
> >
> > The proof of the pudding is in the eating. You convinced me.
> >
> > On Wed, Mar 13, 2019 at 3:31 PM Wes McKinney wrote:
> >
> >> Krisztian -- are you all right with proceeding with merging the CMake
> >> refactor? I'm pretty committed to helping fix the problems that come
> >> up. Since most consumers of the project don't test until _after_ a
> >> release, we won't find out about some problems until we merge it and
> >> release it. Thus, IMHO it doesn't make sense to wait another 8-10
> >> weeks since we'd be delaying feedback for that long. There are also a
> >> number of follow-on issues blocking on the refactor
> >>
> >> On Tue, Mar 12, 2019 at 11:39 AM Andy Grove wrote:
> >>>
> >>> I've cleaned up my issues for Rust, moving most of them to 0.14.0.
> >>>
> >>> I have two PRs in progress that I would appreciate reviews on:
> >>>
> >>> https://github.com/apache/arrow/pull/3671 - [Rust] Table API
> >>> (a.k.a. DataFrame)
> >>>
> >>> https://github.com/apache/arrow/pull/3851 - [Rust] Parquet data
> >>> source in DataFusion
> >>>
> >>> Once these are merged, I have some small follow-up PRs for 0.13.0
> >>> that I can get done this week.
> >>>
> >>> Thanks,
> >>>
> >>> Andy.
> >>>
> >>>
> >>> On Tue, Mar 12, 2019 at 8:21 AM Wes McKinney wrote:
> >>>
>  hi folks,
> 
>  I think we are on track to be able to release toward the end of
>  this month. My proposed timeline:
> 
>  * This week (March 11-15): mostly a feature/improvement push
>  * Next week (March 18-22): shift to bug fixes, stabilization, and
>    emptying the backlog of feature/improvement JIRAs
>  * Week of March 25: propose release candidate
> 
>  Does this seem reasonable? This puts us at about 9-10 weeks from
>  0.12.
> 
>  We need an RM (release manager) for 0.13. Do any PMC members want to
>  volunteer?
> 
>  Take a look at our release page:
> 
>  https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219
> 
>  Out of the open or in-progress issues, we have:
> 
>  * C#: 3 issues
>  * C++ (all components): 51 issues
>  * Java: 3 issues
>  * Python: 38 issues
>  * Rust (all components): 33 issues
> 
>  Please help curate the backlogs for each component. There's a
>  smattering of issues in other categories. There are also 10 open
>  issues with No Component (and 20 resolved issues); those need their
>  metadata fixed.
> 
>  Thanks,
>  Wes
> 
>  On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney wrote:
> >
> > The timeline for the 0.13 release is drawing closer. I would say we
> > should consider a release candidate either the week of March 18 or
> > March 25, which gives us ~3 weeks to close out backlog items.
> >
> > There are around 220 issues open 

Re: Timeline for 0.13 Arrow release

2019-03-19 Thread Neville Dipale
Thanks Chao, I've provided details to reproduce in the array.rs unit tests
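To make the ARROW-4914 symptom described in the quoted message below concrete, here is a minimal sketch of reading validity bits for a slice that shares its parent's bitmap. It assumes Arrow's LSB bit order (slot i is bit i % 8 of byte i / 8) and reads the bitmask 10111 in slot order, slot 0 first. `get_bit`, `sliced_validity` and `sliced_validity_without_offset` are hypothetical helpers, and the offset-ignoring variant is only one way to reproduce the reported output, not necessarily how the bug arises in the Rust implementation.

```rust
/// Reads slot `i` from an LSB-ordered validity bitmap (Arrow's bit order):
/// slot i lives in bit i % 8 of byte i / 8.
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] >> (i % 8)) & 1 == 1
}

/// Validity of a slice that shares its parent's bitmap: slot j of the slice
/// is slot `offset + j` of the parent.
fn sliced_validity(bitmap: &[u8], offset: usize, len: usize) -> Vec<bool> {
    (0..len).map(|j| get_bit(bitmap, offset + j)).collect()
}

/// Hypothetical buggy variant that ignores the offset; it reproduces the
/// 101-instead-of-111 output reported for ARROW-4914.
fn sliced_validity_without_offset(bitmap: &[u8], _offset: usize, len: usize) -> Vec<bool> {
    (0..len).map(|j| get_bit(bitmap, j)).collect()
}

fn main() {
    // Validity 1,0,1,1,1 for slots 0..5 ("10111"), i.e. bits 0, 2, 3 and 4 set.
    let bitmap = [0b0001_1101u8];
    assert_eq!(sliced_validity(&bitmap, 2, 3), vec![true, true, true]); // 111
    assert_eq!(sliced_validity_without_offset(&bitmap, 2, 3), vec![true, false, true]); // 101
}
```

The key point is that a zero-copy slice must carry its offset into every bitmap lookup; when the offset is not a multiple of 8, the slice's bits do not start on a byte boundary, which is why the lookup works per bit rather than per byte.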

On Tue, 19 Mar 2019 at 06:24, Chao Sun  wrote:

> Neville, I think we should be able to fix the two bugs you mentioned within
> this week. I'll take a look. It would be great if you can provide more
> details in the JIRAs (e.g., test case to reproduce). Array currently
> doesn't expose a bitmask API, and I don't think we need specialized
> implementations for struct & list.
>
> Chao
>
> On Mon, Mar 18, 2019 at 8:24 PM Neville Dipale wrote:
>
> > Hi Wes,
> >
> > In Rust, we have 2 bugs
> > (https://issues.apache.org/jira/browse/ARROW-4914,
> > https://issues.apache.org/jira/browse/ARROW-4886), both related to array
> > slicing.
> >
> > In summary:
> >
> > * ARROW-4914: the bitmask of the original array is used to determine the
> >   validity of the sliced array, but offsets aren't read correctly. An
> >   array with validity bitmask 10111 sliced with (offset=2, len=3) returns
> >   a bitmask of 101 instead of 111.
> > * ARROW-4886: we implemented slice on the Array interface, but don't have
> >   specialised implementations for struct and list, so we leak the
> >   implementation.
> >
> > I think if we can't get to both by the time we release an RC, the best
> > solution would be to revert
> > https://issues.apache.org/jira/browse/ARROW-3954.
> >
> > Any thoughts from Rust committers?
> >
> > Neville
> >
