Re: [VOTE][RUST] Release Apache Arrow Rust 14.0.0 RC1

2022-05-13 Thread Yang hao
Thank you, Andrew!
+1 (non-binding)
Verified on macOS 12.2, M1 Pro.

Best regards,
Remzi

From: L. C. Hsieh 
Date: Saturday, May 14, 2022 at 07:06
To: dev@arrow.apache.org 
Subject: Re: [VOTE][RUST] Release Apache Arrow Rust 14.0.0 RC1
+1 (non-binding)

Verified on Intel Mac.

On Fri, May 13, 2022 at 11:26 AM Andy Grove  wrote:
>
> +1 (binding)
>
> Verified on Ubuntu 20.04.4 LTS
>
> On Fri, May 13, 2022 at 12:05 PM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Implementation,
> > version 14.0.0.
> >
> > This release candidate is based on commit:
> > 33e298444f251258dd289c8377c68a80925ab0b4 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> >
> > [1]:
> >
> > https://github.com/apache/arrow-rs/tree/33e298444f251258dd289c8377c68a80925ab0b4
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-14.0.0-rc1
> > [3]:
> >
> > https://github.com/apache/arrow-rs/blob/33e298444f251258dd289c8377c68a80925ab0b4/CHANGELOG.md
> > [4]:
> >
> > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >


Re: [VOTE][RUST] Release Apache Arrow Rust 14.0.0 RC1

2022-05-13 Thread L. C. Hsieh
+1 (non-binding)

Verified on Intel Mac.

On Fri, May 13, 2022 at 11:26 AM Andy Grove  wrote:
>
> +1 (binding)
>
> Verified on Ubuntu 20.04.4 LTS
>
> On Fri, May 13, 2022 at 12:05 PM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Implementation,
> > version 14.0.0.
> >
> > This release candidate is based on commit:
> > 33e298444f251258dd289c8377c68a80925ab0b4 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> >
> > [1]:
> >
> > https://github.com/apache/arrow-rs/tree/33e298444f251258dd289c8377c68a80925ab0b4
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-14.0.0-rc1
> > [3]:
> >
> > https://github.com/apache/arrow-rs/blob/33e298444f251258dd289c8377c68a80925ab0b4/CHANGELOG.md
> > [4]:
> >
> > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >


Re: [Rust] Proposal to move Ballista to a top-level arrow-ballista repository

2022-05-13 Thread Andy Grove
I have updated the proposal, especially around how we use CI to avoid
breaking compatibility between DataFusion and Ballista, based on the
earlier discussions in the document.

I think that it would be ideal to put this plan into action as soon as
possible after the DataFusion 8.0.0 release, so I would like to see if there
is support for the latest proposal.

Thanks,

Andy.

On Wed, May 11, 2022 at 8:17 AM Andy Grove  wrote:

> I would like to propose that we move the Ballista project to a new
> top-level *arrow-ballista* repository.
>
> The rationale for this (copied from the GitHub issue [1]) is:
>
>    - Decouple release process for DataFusion and Ballista
>    - Allow each project to have top-level documentation and user guides
>      that are targeting the appropriate audience
>    - Reduce issue tracking and PR review burden for DataFusion
>      maintainers who are not as interested in Ballista
>    - Help avoid accidental circular dependencies being introduced between
>      the projects (such as "datafusion-cli crate has circular
>      dependency", #2433)
>    - Helps formalize the public API for DataFusion that other query
>      engines should be using
>
> There is also a design document [2] where we will be discussing the finer
> details of this and coordinating on the plan to implement.
>
> I do not recall if a change of this nature requires a formal vote or not
> but I will plan on holding one before we create the new repo unless anyone
> tells me this is not required.
>
> Thanks,
>
> Andy.
>
> [1] https://github.com/apache/arrow-datafusion/issues/2502
> [2]
> https://docs.google.com/document/d/1jNRbadyStSrV5kifwn0khufAwq6OnzGczG4z8oTQJP4/edit?usp=sharing
>


Re: [VOTE][RUST] Release Apache Arrow Rust 14.0.0 RC1

2022-05-13 Thread Andy Grove
+1 (binding)

Verified on Ubuntu 20.04.4 LTS

On Fri, May 13, 2022 at 12:05 PM Andrew Lamb  wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 14.0.0.
>
> This release candidate is based on commit:
> 33e298444f251258dd289c8377c68a80925ab0b4 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
>
> https://github.com/apache/arrow-rs/tree/33e298444f251258dd289c8377c68a80925ab0b4
> [2]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-14.0.0-rc1
> [3]:
>
> https://github.com/apache/arrow-rs/blob/33e298444f251258dd289c8377c68a80925ab0b4/CHANGELOG.md
> [4]:
>
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
>


[VOTE][RUST] Release Apache Arrow Rust 14.0.0 RC1

2022-05-13 Thread Andrew Lamb
Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 14.0.0.

This release candidate is based on commit:
33e298444f251258dd289c8377c68a80925ab0b4 [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]:
https://github.com/apache/arrow-rs/tree/33e298444f251258dd289c8377c68a80925ab0b4
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-14.0.0-rc1
[3]:
https://github.com/apache/arrow-rs/blob/33e298444f251258dd289c8377c68a80925ab0b4/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
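
As a rough illustration of the "verify checksums" step requested above, the
following Python sketch recomputes a SHA-256 digest and compares it with the
value recorded in a checksum file. The function names and the
"<hex digest>  <file name>" layout of the checksum file are assumptions made
for illustration. The verification script [4] remains the authoritative
procedure, and signature checking with gpg is not covered here.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's digest with the first token of the checksum file."""
    # Assumed layout: "<hex digest>  <file name>", as written by sha256sum/shasum.
    expected = checksum_file.read_text().split()[0].lower()
    return hmac.compare_digest(sha256_of(artifact), expected)
```

A caller would pass the downloaded tarball and its accompanying checksum file
(whatever they are named under [2]) and treat a False result as a reason to
vote -1.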


Re: June 23 virtual conference to highlight work in the Arrow ecosystem

2022-05-13 Thread Andrew Lamb
> If folks would find it interesting, I could do a short talk on a
> use-case for FlightSQL (and Substrait)

I would personally find it very interesting.


On Fri, May 13, 2022 at 11:46 AM Gavin Ray  wrote:

> Super neat, saw the announcement post on Twitter and signed up the other
> day!
>
> If folks would find it interesting, I could do a short talk on a
> use-case for FlightSQL (and Substrait)
> The gist of it is having a central API that allows users/vendors to write
> "plugins" to register new data sources:
>
> [image: image.png]
>
> You lose a lot of the benefits of Arrow in the serialization to JSON, but
> FlightSQL as a specification is a great language-agnostic way to share
> schema metadata and handle queries.
> With Substrait you get a spec for expressing data compute operations as
> well, so you can have things solved on both the "tell me what you have" and
> "give me what you have" fronts.
>
> (Have to wait for write operations in Substrait though, for full
> functionality)
>
> On Fri, May 13, 2022 at 9:51 AM Wes McKinney  wrote:
>
>> hi all,
>>
>> My employer (Voltron Data) is organizing a free virtual conference on
>> June 23 to highlight development work and usage of Apache Arrow — you
>> can register for this or apply to give a talk here:
>>
>> https://thedatathread.com/
>>
>> We are especially interested in hearing from users (as opposed to only
>> project developers/contributors!) about how they are using Arrow in
>> their downstream applications. If you would be interested in speaking
>> (talks will be pre-recorded, so you don't need to be available on June
>> 23), please apply to give a short talk (~15 min) on the website!
>>
>> Thanks,
>> Wes
>>
>


Re: Arrow C-Data and DuckDB

2022-05-13 Thread Antoine Pitrou



I don't think this needs a vote, there is no functional change in the 
spec, it's just an additional technical recommendation that can go 
through the regular PR process.


Regards

Antoine.


On 12/05/2022 at 22:24, David Li wrote:

Thanks all for the comments. I see Tom also put up a PR to add this to DuckDB 
[1].

Do we need a vote for this? If so, unless there are further comments, I think we 
can start one.

[1]: https://github.com/duckdb/duckdb/pull/3628

On Tue, May 10, 2022, at 13:31, David Li wrote:

For discussion I've put up https://github.com/apache/arrow/pull/13115
to add this for the C data/stream interfaces.

On Mon, May 9, 2022, at 15:42, Antoine Pitrou wrote:

On 09/05/2022 at 20:28, Tomek Drabas wrote:

I am new to this board so please, let me know if any of this doesn't make
sense.

I am building a FlightSQL example with a DuckDB backend. DuckDB already has
an Arrow interface defined in duckdb.h that returns ArrowArray. However,
the import is not guarded in any way, and ArrowArray is redefined in
duckdb.h, so including arrow/c/bridge.h throws an error that ArrowArray is
defined in multiple places.

I'd like to propose adding canonical guardrails in arrow/c/bridge.h to
avoid this. Is this the best way to do this?


It should probably be included in the spec:
https://arrow.apache.org/docs/format/CDataInterface.html#structure-definitions

Regards

Antoine.




Thanks,
-Tom



Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-13 Thread Antoine Pitrou



On 13/05/2022 at 16:30, Alessandro Molina wrote:

I think Arrow should definitely consider adding a DataFrame-like API.

There are multiple reasons why exposing Arrow to end users instead of
restricting it to developers of frameworks would be beneficial for the Arrow
project itself.

A rough approximation of a DataFrame-like API has been growing over the
years anyway in many bindings and it's probably better to consolidate that
effort in a structured process.


I'm not sure about this. Different languages have different de facto 
standards for dataframe APIs (e.g. Pandas for Python), so it may not be 
wise to try to unify them all.


There's also an argument that Arrow C++ should focus on the fundamental 
building blocks and let other people build nifty APIs on top of this if they 
want to.


Regards

Antoine.



The main thing I'm concerned about is adding one more interface for users.
If we want to grow DataFrame-like APIs we should grow them on top of
Dataset (Table probably wouldn't give us enough memory management
flexibility), as for most users it's already confusing enough to understand
why they should use Table or Dataset. Imagine if we add one more tabular
data structure.

On Thu, May 12, 2022 at 7:14 PM Wes McKinney  wrote:


Discussion about whether the community around Arrow would like to have
DataFrame-like APIs for Arrow in more languages, for example C++

We've discussed this a bit on the mailing list in the past, see

https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h

for example. It's a complicated subject because the problems that need
solving in a "data frame library" are much more than defining an API —
they involve establishing execution and mutation/copy-on-write
semantics (the latter of which has been a huge topic of discussion in the
pandas community, for example). The API would be driving an internal
data management logic engine (similar to pandas's internal logic
engine — but hopefully we could make something without as many
problems) which would manipulate chunks of in-memory and out-of-core
Arrow data internally.

I still would be interested in an Arrow-native "data frame library"
similar to the SFrame library that's part of Apple's (now defunct?)
Turi Create library [1]

It's a can of worms and a problem not to be approached lightly (thinking of
that "one does not simply..." meme right now) and best done in heavy
consultation with communities that have experience supporting
production use of data frames for data science use cases for many
years.

[1]: https://github.com/apple/turicreate

On Wed, May 11, 2022 at 11:38 PM Ian Cook  wrote:


Attendees:

Joris Van den Bossche
Ian Cook
Nic Crane
Raul Cumplido
Ian Joiner
David Li
Rok Mihevc
Dragoș Moldovan-Grünfeld
Aldrin Montana
Weston Pace
Eduardo Ponce
Matthew Topol
Jacob Wujciak


Discussion:

Eduardo: Draft PR with a guide showing how to create a new Arrow C++
compute kernel [1]
  - Review requested

Weston: Proposed changes to ExecPlan in Arrow C++ compute engine [2]
  - Feedback requested on details described in the Jira

Rok: Temporal rounding kernels option in Arrow C++ compute engine [3]
  - Feedback requested about what we should name it
  - Possibilities include ceil_on_boundary, ceil_is_strictly_greater,
strict_ceil, is_strict_ceil, ceil_is_strict
  - Joris favors ceil_is_strictly_greater
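
To make the naming question concrete, here is a minimal Python sketch of the
behaviour that a name like ceil_is_strictly_greater suggests: when the input
already sits on a rounding boundary, the result moves to the next boundary
instead of staying put. This reading of the option is an assumption based on
the candidate names, not on the Jira [3]. The function name and the plain
integer representation of timestamps are hypothetical and unrelated to the
Arrow C++ kernel API.

```python
def ceil_to_multiple(value: int, unit: int, strictly_greater: bool = False) -> int:
    """Round value up to a multiple of unit (both in the same time unit, e.g. seconds)."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    # Ordinary ceiling: values already on a boundary are returned unchanged.
    rounded = -(-value // unit) * unit
    # Assumed "strict" variant: a value on a boundary moves to the next one.
    if strictly_greater and rounded == value:
        rounded += unit
    return rounded


three_hours = 3 * 60 * 60
print(ceil_to_multiple(0, three_hours))                         # on a boundary, unchanged
print(ceil_to_multiple(0, three_hours, strictly_greater=True))  # pushed to the next boundary
```

The two modes differ only for inputs that fall exactly on a boundary, which
is the property each candidate name is trying to convey.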

Ian C: Discussion about naming the Arrow C++ engine [4]
  - Comments welcome on the mailing list

David: ADBC (Arrow Database Connectivity) proposal [5][6]
  - Feedback requested

Ian C: Discussion about whether the community around Arrow would like
to have DataFrame-like APIs for Arrow in more languages, for example
C++
  - For C++, maybe this would look similar to xframe [7]
  - Probably better to approach projects like these outside of Arrow
and have them produce plans in Substrait format [8] which the Arrow
C++ engine (and other engines) could consume and execute

Arrow 8.0.0 release
  - Most post-release tasks complete
  - Please contribute to the release blog post [9]

Release process
  - Please comment on the proposed RC process change [10]
  - There is a discussion about changing to a bimonthly major release
cadence (instead of quarterly, which is what we do now)
  - To make this work we would need nightly builds to be more stable;
Raul and Jacob are working on this

Should we publicly share a link that Arrow developers can use to join
the Zulip chat?
  - Zulip has instructions for how to do this [11]
  - We would need a Zulip admin to change the permissions to enable
this (Wes, Antoine, Weston, et al. are admins)
  - What about the ASF Slack [12]? Should we share the details about
that?
    - The Slack has a rarely used Arrow channel and a Rust Arrow
channel which is more popular
    - There were some doubts about whether committer permissions or the
associated apache.org email address are required to join, but in fact
anyone can join this Slack
  - Ian will 

Re: June 23 virtual conference to highlight work in the Arrow ecosystem

2022-05-13 Thread Gavin Ray
Super neat, saw the announcement post on Twitter and signed up the other
day!

If folks would find it interesting, I could do a short talk on a
use-case for FlightSQL (and Substrait)
The gist of it is having a central API that allows users/vendors to write
"plugins" to register new data sources:

[image: image.png]

You lose a lot of the benefits of Arrow in the serialization to JSON, but
FlightSQL as a specification is a great language-agnostic way to share
schema metadata and handle queries.
With Substrait you get a spec for expressing data compute operations as
well, so you can have things solved on both the "tell me what you have" and
"give me what you have" fronts.

(Have to wait for write operations in Substrait though, for full
functionality)

On Fri, May 13, 2022 at 9:51 AM Wes McKinney  wrote:

> hi all,
>
> My employer (Voltron Data) is organizing a free virtual conference on
> June 23 to highlight development work and usage of Apache Arrow — you
> can register for this or apply to give a talk here:
>
> https://thedatathread.com/
>
> We are especially interested in hearing from users (as opposed to only
> project developers/contributors!) about how they are using Arrow in
> their downstream applications. If you would be interested in speaking
> (talks will be pre-recorded, so you don't need to be available on June
> 23), please apply to give a short talk (~15 min) on the website!
>
> Thanks,
> Wes
>


Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-13 Thread Gavin Ray
I agree with this as well, and it's also along the lines of what I was
trying to propose here:

"[RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry,
increase adoption/audience and productivity."
https://github.com/apache/arrow/issues/12618

It would be really nice if there was a canonical, language-independent
specification (or something close to it) for what a DataFrame-like API on
top of Arrow should look like.
Then you get continuity between languages and (in theory) it should be
easier to make contributions since they wouldn't be locked to a particular
language implementation.

On Fri, May 13, 2022 at 10:30 AM Alessandro Molina <
alessan...@ursacomputing.com> wrote:

> I think Arrow should definitely consider adding a DataFrame-like API.
>
> There are multiple reasons why exposing Arrow to end users instead of
> restricting it to developers of frameworks would be beneficial for the Arrow
> project itself.
>
> A rough approximation of a DataFrame-like API has been growing over the
> years anyway in many bindings and it's probably better to consolidate that
> effort in a structured process.
> The main thing I'm concerned about is adding one more interface for users.
> If we want to grow DataFrame-like APIs we should grow them on top of
> Dataset (Table probably wouldn't give us enough memory management
> flexibility), as for most users it's already confusing enough to understand
> why they should use Table or Dataset. Imagine if we add one more tabular
> data structure.
>
> On Thu, May 12, 2022 at 7:14 PM Wes McKinney  wrote:
>
> > > Discussion about whether the community around Arrow would like to have
> > DataFrame-like APIs for Arrow in more languages, for example C++
> >
> > We've discussed this a bit on the mailing list in the past, see
> >
> >
> >
> https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h
> >
> > for example. It's a complicated subject because the problems that need
> > solving in a "data frame library" are much more than defining an API —
> > they involve establishing execution and mutation/copy-on-write
> > semantics (the latter of which has been a huge topic of discussion in the
> > pandas community, for example). The API would be driving an internal
> > data management logic engine (similar to pandas's internal logic
> > engine — but hopefully we could make something without as many
> > problems) which would manipulate chunks of in-memory and out-of-core
> > Arrow data internally.
> >
> > I still would be interested in an Arrow-native "data frame library"
> > similar to the SFrame library that's part of Apple's (now defunct?)
> > Turi Create library [1]
> >
> > It's a can of worms and a problem not to be approached lightly (thinking of
> > that "one does not simply..." meme right now) and best done in heavy
> > consultation with communities that have experience supporting
> > production use of data frames for data science use cases for many
> > years.
> >
> > [1]: https://github.com/apple/turicreate
> >
> > On Wed, May 11, 2022 at 11:38 PM Ian Cook  wrote:
> > >
> > > Attendees:
> > >
> > > Joris Van den Bossche
> > > Ian Cook
> > > Nic Crane
> > > Raul Cumplido
> > > Ian Joiner
> > > David Li
> > > Rok Mihevc
> > > Dragoș Moldovan-Grünfeld
> > > Aldrin Montana
> > > Weston Pace
> > > Eduardo Ponce
> > > Matthew Topol
> > > Jacob Wujciak
> > >
> > >
> > > Discussion:
> > >
> > > Eduardo: Draft PR with a guide showing how to create a new Arrow C++
> > > compute kernel [1]
> > >  - Review requested
> > >
> > > Weston: Proposed changes to ExecPlan in Arrow C++ compute engine [2]
> > >  - Feedback requested on details described in the Jira
> > >
> > > Rok: Temporal rounding kernels option in Arrow C++ compute engine [3]
> > >  - Feedback requested about what we should name it
> > >  - Possibilities include ceil_on_boundary, ceil_is_strictly_greater,
> > > strict_ceil, is_strict_ceil, ceil_is_strict
> > >  - Joris favors ceil_is_strictly_greater
> > >
> > > Ian C: Discussion about naming the Arrow C++ engine [4]
> > >  - Comments welcome on the mailing list
> > >
> > > David: ADBC (Arrow Database Connectivity) proposal [5][6]
> > >  - Feedback requested
> > >
> > > Ian C: Discussion about whether the community around Arrow would like
> > > to have DataFrame-like APIs for Arrow in more languages, for example
> > > C++
> > >  - For C++, maybe this would look similar to xframe [7]
> > >  - Probably better to approach projects like these outside of Arrow
> > > and have them produce plans in Substrait format [8] which the Arrow
> > > C++ engine (and other engines) could consume and execute
> > >
> > > Arrow 8.0.0 release
> > >  - Most post-release tasks complete
> > >  - Please contribute to the release blog post [9]
> > >
> > > Release process
> > >  - Please comment on the proposed RC process change [10]
> > >  - There is a discussion about changing to a bimonthly major releases
> > > (instead of 

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-13 Thread Alessandro Molina
I think Arrow should definitely consider adding a DataFrame-like API.

There are multiple reasons why exposing Arrow to end users instead of
restricting it to developers of frameworks would be beneficial for the Arrow
project itself.

A rough approximation of a DataFrame-like API has been growing over the
years anyway in many bindings and it's probably better to consolidate that
effort in a structured process.
The main thing I'm concerned about is adding one more interface for users.
If we want to grow DataFrame-like APIs we should grow them on top of
Dataset (Table probably wouldn't give us enough memory management
flexibility), as for most users it's already confusing enough to understand
why they should use Table or Dataset. Imagine if we add one more tabular
data structure.

On Thu, May 12, 2022 at 7:14 PM Wes McKinney  wrote:

> > Discussion about whether the community around Arrow would like to have
> DataFrame-like APIs for Arrow in more languages, for example C++
>
> We've discussed this a bit on the mailing list in the past, see
>
>
> https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h
>
> for example. It's a complicated subject because the problems that need
> solving in a "data frame library" are much more than defining an API —
> they involve establishing execution and mutation/copy-on-write
> semantics (the latter of which has been a huge topic of discussion in the
> pandas community, for example). The API would be driving an internal
> data management logic engine (similar to pandas's internal logic
> engine — but hopefully we could make something without as many
> problems) which would manipulate chunks of in-memory and out-of-core
> Arrow data internally.
>
> I still would be interested in an Arrow-native "data frame library"
> similar to the SFrame library that's part of Apple's (now defunct?)
> Turi Create library [1]
>
> It's a can of worms and a problem not to be approached lightly (thinking of
> that "one does not simply..." meme right now) and best done in heavy
> consultation with communities that have experience supporting
> production use of data frames for data science use cases for many
> years.
>
> [1]: https://github.com/apple/turicreate
>
> On Wed, May 11, 2022 at 11:38 PM Ian Cook  wrote:
> >
> > Attendees:
> >
> > Joris Van den Bossche
> > Ian Cook
> > Nic Crane
> > Raul Cumplido
> > Ian Joiner
> > David Li
> > Rok Mihevc
> > Dragoș Moldovan-Grünfeld
> > Aldrin Montana
> > Weston Pace
> > Eduardo Ponce
> > Matthew Topol
> > Jacob Wujciak
> >
> >
> > Discussion:
> >
> > Eduardo: Draft PR with a guide showing how to create a new Arrow C++
> > compute kernel [1]
> >  - Review requested
> >
> > Weston: Proposed changes to ExecPlan in Arrow C++ compute engine [2]
> >  - Feedback requested on details described in the Jira
> >
> > Rok: Temporal rounding kernels option in Arrow C++ compute engine [3]
> >  - Feedback requested about what we should name it
> >  - Possibilities include ceil_on_boundary, ceil_is_strictly_greater,
> > strict_ceil, is_strict_ceil, ceil_is_strict
> >  - Joris favors ceil_is_strictly_greater
> >
> > Ian C: Discussion about naming the Arrow C++ engine [4]
> >  - Comments welcome on the mailing list
> >
> > David: ADBC (Arrow Database Connectivity) proposal [5][6]
> >  - Feedback requested
> >
> > Ian C: Discussion about whether the community around Arrow would like
> > to have DataFrame-like APIs for Arrow in more languages, for example
> > C++
> >  - For C++, maybe this would look similar to xframe [7]
> >  - Probably better to approach projects like these outside of Arrow
> > and have them produce plans in Substrait format [8] which the Arrow
> > C++ engine (and other engines) could consume and execute
> >
> > Arrow 8.0.0 release
> >  - Most post-release tasks complete
> >  - Please contribute to the release blog post [9]
> >
> > Release process
> >  - Please comment on the proposed RC process change [10]
> >  - There is a discussion about changing to a bimonthly major release
> > cadence (instead of quarterly, which is what we do now)
> >  - To make this work we would need nightly builds to be more stable;
> > Raul and Jacob are working on this
> >
> > Should we publicly share a link that Arrow developers can use to join
> > the Zulip chat?
> >  - Zulip has instructions for how to do this [11]
> >  - We would need a Zulip admin to change the permissions to enable
> > this (Wes, Antoine, Weston, et al. are admins)
> >  - What about the ASF Slack [12]? Should we share the details about
> > that?
> >    - The Slack has a rarely used Arrow channel and a Rust Arrow
> > channel which is more popular
> >    - There were some doubts about whether committer permissions or the
> > associated apache.org email address are required to join, but in fact
> > anyone can join this Slack
> >  - Ian will follow up about this
> >
> > The Data Thread [13]
> >  - Voltron Data is hosting an Arrow-focused virtual 

Re: [R] Install arrow package: arrow.so undefined symbol

2022-05-13 Thread Neal Richardson
I created a PR to apply this change:
https://github.com/apache/arrow/pull/13151
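
The idea behind the change suggested in the quoted message below is to call
pkg-config once per flag category and append both results, since a single
call with both options drops the --libs-only-other part. The Python sketch
below only illustrates that combination step on the outputs quoted from
CentOS 7. The function name is hypothetical and this is not the code in
r/configure or in the PR.

```python
import shlex


def combine_pkg_config_outputs(*outputs: str) -> str:
    """Concatenate the flags from separate pkg-config calls, keeping their order."""
    flags = []
    for output in outputs:
        # shlex.split tokenizes the way a POSIX shell would split the output.
        flags.extend(shlex.split(output))
    return " ".join(flags)


# Outputs copied from the quoted CentOS 7 example.
libs_only_l = (
    "-larrow -larrow_bundled_dependencies -lbrotlidec -lbrotlienc "
    "-lz -llz4 -lzstd -lbrotlicommon"
)
libs_only_other = "/usr/lib64/libsnappy.so /usr/lib64/libbz2.so -pthread"

print(combine_pkg_config_outputs(libs_only_l, libs_only_other))
```

With both outputs appended, the snappy library path that was missing from
the combined call is part of the link line again.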

On Tue, May 3, 2022 at 5:08 PM Sutou Kouhei  wrote:

> Hi,
>
> > * On the latter, I see that we're using snappy and lz4 from the system
> > (cmake finds them in the Arrow C++ build) but when the arrow.so is built
> > for the R package, there is -llz4 in the libs but no -lsnappy. The line
> > where the link libs are assembled from the C++ build via pkg-config is
> > here: https://github.com/apache/arrow/blob/master/r/configure#L185, maybe
> > the --static is relevant? Kou, do you know?
>
> We can't use --libs-only-l and --libs-only-other at the same
> time. If we use both of them at the same time,
> --libs-only-other is ignored. The following outputs were
> executed on CentOS 7 with the official arrow-devel RPM
> package:
>
>   $ pkg-config --libs-only-l --static arrow
>   -larrow -larrow_bundled_dependencies -lbrotlidec -lbrotlienc -lz -llz4 -lzstd -lbrotlicommon
>   $ pkg-config --libs-only-other --static arrow
>   /usr/lib64/libsnappy.so /usr/lib64/libbz2.so -pthread
>   $ pkg-config --libs-only-l --libs-only-other --static arrow
>   -larrow -larrow_bundled_dependencies -lbrotlidec -lbrotlienc -lz -llz4 -lzstd -lbrotlicommon
>
> How about changing the line in r/configure to the following?
>
>   PKG_LIBS="$PKG_LIBS `PKG_CONFIG_PATH=${LIB_DIR}/pkgconfig pkg-config --libs-only-l --static --silence-errors ${PKG_CONFIG_NAME}`"
>   PKG_LIBS="$PKG_LIBS `PKG_CONFIG_PATH=${LIB_DIR}/pkgconfig pkg-config --libs-only-other --static --silence-errors ${PKG_CONFIG_NAME}`"
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [R] Install arrow package: arrow.so undefined symbol" on Tue, 3 May
> 2022 14:05:09 -0400,
>   Neal Richardson  wrote:
>
> > Hmm, I see a couple of things:
> >
> > * There are two errors at the end, one about pthread_cancel and one about
> > undefined symbols for snappy. I can't tell if the pthread issue is fatal.
> > * On the latter, I see that we're using snappy and lz4 from the system
> > (cmake finds them in the Arrow C++ build) but when the arrow.so is built
> > for the R package, there is -llz4 in the libs but no -lsnappy. The line
> > where the link libs are assembled from the C++ build via pkg-config is
> > > here: https://github.com/apache/arrow/blob/master/r/configure#L185, maybe
> > > the --static is relevant? Kou, do you know?
> > * In terms of working around this, you have a few options, which are
> > described in more detail here:
> > https://arrow.apache.org/docs/r/articles/install.html
> > * Set NOT_CRAN=true and get a fully-functioning C++ binary downloaded
> > * Install a binary from RStudio Package Manager, if that's an option
> > * conda may also be an option
> > * If none of those work and you're building from source, you can set
> > EXTRA_CMAKE_FLAGS="-DSNAPPY_SOURCE=BUNDLED" to skip the system version of
> > snappy and build it in the Arrow build
> > * You may also want to use a newer devtoolset since you're building
> > with gcc 4.8, and some features aren't supported with that compiler.
> >
> > I've trimmed the installation logs to the relevant bits in the quoted reply
> > that follows, in case that's useful.
> >
> > Neal
> >
> >
> > On Tue, May 3, 2022 at 1:08 PM Rares Vernica  wrote:
> >
> >> Hi Dragos,
> >>
> >> It still fails after setting the environment variable. Here is the log.
> >>
> >> Cheers,
> >> Rares
> >>
> >> ...
> >> > install.packages("arrow")
> >> Installing package into '/usr/lib64/R/library'
> >> ...
> >> *** Building libarrow from source
> >> For a faster, more complete installation, set the environment
> variable
> >> NOT_CRAN=true before installing
> >> See install vignette for details:
> >>
> https://cran.r-project.org/web/packages/arrow/vignettes/install.html
> >> *** Building with MAKEFLAGS= -j2
> >>  arrow with SOURCE_DIR='tools/cpp'
> >> BUILD_DIR='/tmp/RtmpObd2i3/file5310bef585'
> DEST_DIR='libarrow/arrow-7.0.0'
> >> CMAKE='/usr/bin/cmake3' EXTRA_CMAKE_FLAGS='' CC='gcc -m64 -std=gnu99'
> >> CXX='g++ -m64 -std=gnu++11' LDFLAGS='-Wl,-z,relro' ARROW_S3='OFF'
> >> ARROW_MIMALLOC='OFF'
> >> ...
> >> + /usr/bin/cmake3 -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF
> >> -DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON
> >> -DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_DEPENDENCY_SOURCE=AUTO
> >> -DAWSSDK_SOURCE= -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=OFF
> >> -DARROW_MIMALLOC=OFF -DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_S3=OFF
> >> -DARROW_WITH_BROTLI=OFF -DARROW_WITH_BZ2=OFF -DARROW_WITH_LZ4=ON
> >> -DARROW_WITH_RE2=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_UTF8PROC=ON
> >> -DARROW_WITH_ZLIB=OFF -DARROW_WITH_ZSTD=OFF
> >> -DARROW_VERBOSE_THIRDPARTY_BUILD=OFF -DCMAKE_BUILD_TYPE=Release
> >> -DCMAKE_INSTALL_LIBDIR=lib
> >>
> >>
> -DCMAKE_INSTALL_PREFIX=/tmp/Rtmp8rFBf9/R.INSTALL2f7e48ba40/arrow/libarrow/arrow-7.0.0
> >> -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON
> >> -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON 

June 23 virtual conference to highlight work in the Arrow ecosystem

2022-05-13 Thread Wes McKinney
hi all,

My employer (Voltron Data) is organizing a free virtual conference on
June 23 to highlight development work and usage of Apache Arrow — you
can register for this or apply to give a talk here:

https://thedatathread.com/

We are especially interested in hearing from users (as opposed to only
project developers/contributors!) about how they are using Arrow in
their downstream applications. If you would be interested in speaking
(talks will be pre-recorded, so you don't need to be available on June
23), please apply to give a short talk (~15 min) on the website!

Thanks,
Wes


[Rust] Issues with signing release

2022-05-13 Thread Andy Grove
As Andrew notes in the current VOTE thread for DataFusion 8.0.0-rc2, there
is an issue with the key I used to sign the release:

gpg: Good signature from "Andy Grove " [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.

I found the current documentation a little lacking so could use some
guidance on what I need to do, and I can then better document this in the
repo.

The KEYS file has this header:

Users: pgp < KEYS
       gpg --import KEYS
Developers:
        pgp -kxa <your name> and append it to this file.
        (pgpk -ll <your name> && pgpk -xa <your name>) >> this file.
        (gpg --list-sigs <your name>
             && gpg --armor --export <your name>) >> this file.

Was I supposed to run both the pgp and gpg commands in the developer
section? I perhaps naively assumed these were alternate options and I just
ran the following:

(gpg --list-sigs "Andy Grove" && gpg --armor --export "Andy Grove") >> KEYS
svn commit KEYS -m "Add key for Andy Grove"

Also, it wasn't immediately obvious to me how to install "pgpk" on Ubuntu.

Were there other steps that I have missed?

Thanks,

Andy.


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-13 Thread Andy Grove
Thanks, Andrew. I will start a separate email thread about signing the
release.

On Fri, May 13, 2022 at 7:26 AM Andrew Lamb  wrote:

> If the issue is just in the test, I don't think a new RC is necessary
>
> On Fri, May 13, 2022 at 9:16 AM Andy Grove  wrote:
>
> > Thanks, Remzi. It looks like this test might be non-deterministic since
> it
> > does not have an ORDER BY clause. I will put up a patch and then cut rc3
> > once that is merged.
> >
> > On Fri, May 13, 2022 at 7:12 AM Yang hao <1371656737...@gmail.com>
> wrote:
> >
> > > Hi:
> > > Verified (8.0.0 rc2) on MacOS 12.2 M1 pro (non-binding)
> > >
> > > One test failed:
> > >
> > >
> > > expected:
> > >
> > > [
> > > "+---------+---------+---------+",
> > > "| column2 | column1 | column3 |",
> > > "+---------+---------+---------+",
> > > "| 2       | 1       | 3       |",
> > > "| 5       | 4       | 6       |",
> > > "+---------+---------+---------+",
> > > ]
> > >
> > > actual:
> > >
> > > [
> > > "+---------+---------+---------+",
> > > "| column2 | column1 | column3 |",
> > > "+---------+---------+---------+",
> > > "| 5       | 4       | 6       |",
> > > "| 2       | 1       | 3       |",
> > > "+---------+---------+---------+",
> > > ]
> > >
> > > ', datafusion/core/src/datasource/view.rs:239:9
> > >
> > >
> > > failures:
> > > datasource::view::tests::query_join_views
> > >
> > > Best regards,
> > > Remzi
> > >
> > > From: Andy Grove 
> > > Date: Friday, May 13, 2022 at 19:57
> > > To: dev 
> > > Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0
> > RC2
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation,
> > > version 8.0.0.
> > >
> > > This release candidate is based on commit:
> > > b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> and
> > > vote
> > > on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the
> community
> > > are
> > > encouraged to test the release and vote with "(non-binding)".
> > >
> > > The standard verification procedure is documented at
> > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > > .
> > >
> > > [ ] +1 Release this as Apache Arrow DataFusion 8.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1 (binding)
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
> > > [2]:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
> > > [3]:
> > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md
> > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-13 Thread Andrew Lamb
If the issue is just in the test, I don't think a new RC is necessary

On Fri, May 13, 2022 at 9:16 AM Andy Grove  wrote:

> Thanks, Remzi. It looks like this test might be non-deterministic since it
> does not have an ORDER BY clause. I will put up a patch and then cut rc3
> once that is merged.
>
> On Fri, May 13, 2022 at 7:12 AM Yang hao <1371656737...@gmail.com> wrote:
>
> > Hi:
> > Verified (8.0.0 rc2) on MacOS 12.2 M1 pro (non-binding)
> >
> > One test failed:
> >
> >
> > expected:
> >
> > [
> > "+---------+---------+---------+",
> > "| column2 | column1 | column3 |",
> > "+---------+---------+---------+",
> > "| 2       | 1       | 3       |",
> > "| 5       | 4       | 6       |",
> > "+---------+---------+---------+",
> > ]
> >
> > actual:
> >
> > [
> > "+---------+---------+---------+",
> > "| column2 | column1 | column3 |",
> > "+---------+---------+---------+",
> > "| 5       | 4       | 6       |",
> > "| 2       | 1       | 3       |",
> > "+---------+---------+---------+",
> > ]
> >
> > ', datafusion/core/src/datasource/view.rs:239:9
> >
> >
> > failures:
> > datasource::view::tests::query_join_views
> >
> > Best regards,
> > Remzi
> >
> > From: Andy Grove 
> > Date: Friday, May 13, 2022 at 19:57
> > To: dev 
> > Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0
> RC2
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion
> > Implementation,
> > version 8.0.0.
> >
> > This release candidate is based on commit:
> > b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> >
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion 8.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...
> >
> > Here is my vote:
> >
> > +1 (binding)
> >
> > [1]:
> >
> >
> https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
> > [2]:
> >
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
> > [3]:
> >
> >
> https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-13 Thread Andrew Lamb
+1 (binding)

I tested it both manually as well as with the release script and it looks
great to me.

Thank you Andy for taking the lead to make it happen.

Andrew

p.s. I did notice the signature wasn't signed with other (trusted)
signatures

alamb@MacBook-Pro-6 Downloads % gpg --verify apache-arrow-datafusion-8.0.0.tar.gz.asc
gpg: assuming signed data in 'apache-arrow-datafusion-8.0.0.tar.gz'
gpg: Signature made Fri May 13 07:37:17 2022 EDT
gpg:                using RSA key B6550C65A4B9EE9F26111DB40B8A854E87467E2C
gpg: Good signature from "Andy Grove " [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: B655 0C65 A4B9 EE9F 2611  1DB4 0B8A 854E 8746 7E2C

On Fri, May 13, 2022 at 7:57 AM Andy Grove  wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> version 8.0.0.
>
> This release candidate is based on commit:
> b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests, and
> vote
> on the release. The vote will be open for at least 72 hours.
>
> Only votes from PMC members are binding, but all members of the community
> are
> encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at
>
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> .
>
> [ ] +1 Release this as Apache Arrow DataFusion 8.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...
>
> Here is my vote:
>
> +1 (binding)
>
> [1]:
>
> https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
> [2]:
>
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
> [3]:
>
> https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-13 Thread Andy Grove
Thanks, Remzi. It looks like this test might be non-deterministic since it
does not have an ORDER BY clause. I will put up a patch and then cut rc3
once that is merged.
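
For illustration only, here is a minimal sketch of why the two tables in the
report differ and how a comparison can be made independent of row order. It
is not the patch mentioned above. The helper name sort_data_rows is
hypothetical, the table borders are reconstructed from the header width, and
the layout (border, header, border, data rows, border) is an assumption
taken from the failure output. The other usual fix is to add an ORDER BY to
the query so the expected order is well defined.

```rust
/// Hypothetical helper: sort only the data rows of a pretty-printed table.
/// Assumed layout: border, header, border, data rows..., closing border.
fn sort_data_rows(lines: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = lines.iter().map(|s| s.to_string()).collect();
    let n = out.len();
    // Indices 0..3 are the header block and n-1 is the closing border,
    // so only the slice in between is reordered.
    if n > 4 {
        out[3..n - 1].sort_unstable();
    }
    out
}

fn main() {
    // Expected and actual output from the failure report.
    let expected = [
        "+---------+---------+---------+",
        "| column2 | column1 | column3 |",
        "+---------+---------+---------+",
        "| 2       | 1       | 3       |",
        "| 5       | 4       | 6       |",
        "+---------+---------+---------+",
    ];
    let actual = [
        "+---------+---------+---------+",
        "| column2 | column1 | column3 |",
        "+---------+---------+---------+",
        "| 5       | 4       | 6       |",
        "| 2       | 1       | 3       |",
        "+---------+---------+---------+",
    ];

    // A line-by-line comparison fails because the row order differs.
    assert_ne!(expected, actual);

    // After sorting the data rows on both sides, the tables are equal.
    assert_eq!(sort_data_rows(&expected), sort_data_rows(&actual));
    println!("tables match after sorting data rows");
}
```

Sorting both sides keeps the check meaningful (same rows, same header) while
ignoring an ordering that the query never promised.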

On Fri, May 13, 2022 at 7:12 AM Yang hao <1371656737...@gmail.com> wrote:

> Hi:
> Verified (8.0.0 rc2) on MacOS 12.2 M1 pro (non-binding)
>
> One test failed:
>
>
> expected:
>
> [
> "+---------+---------+---------+",
> "| column2 | column1 | column3 |",
> "+---------+---------+---------+",
> "| 2       | 1       | 3       |",
> "| 5       | 4       | 6       |",
> "+---------+---------+---------+",
> ]
>
> actual:
>
> [
> "+---------+---------+---------+",
> "| column2 | column1 | column3 |",
> "+---------+---------+---------+",
> "| 5       | 4       | 6       |",
> "| 2       | 1       | 3       |",
> "+---------+---------+---------+",
> ]
>
> ', datafusion/core/src/datasource/view.rs:239:9
>
>
> failures:
> datasource::view::tests::query_join_views
>
> Best regards,
> Remzi
>
> From: Andy Grove 
> Date: Friday, May 13, 2022 at 19:57
> To: dev 
> Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2
> Hi,
>
> I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> version 8.0.0.
>
> This release candidate is based on commit:
> b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests, and
> vote
> on the release. The vote will be open for at least 72 hours.
>
> Only votes from PMC members are binding, but all members of the community
> are
> encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at
>
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> .
>
> [ ] +1 Release this as Apache Arrow DataFusion 8.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...
>
> Here is my vote:
>
> +1 (binding)
>
> [1]:
>
> https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
> [2]:
>
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
> [3]:
>
> https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-13 Thread Yang hao
Hi:
Verified (8.0.0 rc2) on MacOS 12.2 M1 pro (non-binding)

One test failed:


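expected:

[
"+---------+---------+---------+",
"| column2 | column1 | column3 |",
"+---------+---------+---------+",
"| 2       | 1       | 3       |",
"| 5       | 4       | 6       |",
"+---------+---------+---------+",
]

actual:

[
"+---------+---------+---------+",
"| column2 | column1 | column3 |",
"+---------+---------+---------+",
"| 5       | 4       | 6       |",
"| 2       | 1       | 3       |",
"+---------+---------+---------+",
]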

', datafusion/core/src/datasource/view.rs:239:9


failures:
datasource::view::tests::query_join_views

Best regards,
Remzi

From: Andy Grove 
Date: Friday, May 13, 2022 at 19:57
To: dev 
Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2
Hi,

I would like to propose a release of Apache Arrow DataFusion Implementation,
version 8.0.0.

This release candidate is based on commit:
b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and
vote
on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community
are
encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at
https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
.

[ ] +1 Release this as Apache Arrow DataFusion 8.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...

Here is my vote:

+1 (binding)

[1]:
https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
[3]:
https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md


[VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-13 Thread Andy Grove
Hi,

I would like to propose a release of Apache Arrow DataFusion Implementation,
version 8.0.0.

This release candidate is based on commit:
b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and
vote
on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community
are
encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at
https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
.

[ ] +1 Release this as Apache Arrow DataFusion 8.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...

Here is my vote:

+1 (binding)

[1]:
https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
[3]:
https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md