Re: Question about `minibatch`

2023-06-20 Thread Ruoxi Sun
Thanks Weston, that makes a lot of sense. Please let me rephrase to make
sure I get this right.

So the main purpose of minibatch is actually to keep the working set
within L1 (with the side benefit of more chances to shortcut).
This requires splitting the input batch into minibatches. And because each
minibatch is fixed-size and we want to avoid allocation, it is
desirable to reuse a preallocated buffer across multiple minibatches. The
assumption in my original question, limiting the memory size of the
working set, is not the main consideration but another possible side
benefit, i.e., compared with having to calculate hashes for the whole
input batch, for example, for a hash join?

*Rossi*


On Wed, Jun 21, 2023 at 12:26, Weston Pace wrote:

> Those goals are somewhat compatible.  Sasha can probably correct me if I
> get this wrong but my understanding is that the minibatch is just large
> enough to ensure reliable vectorized execution.  It is used in some
> innermost critical sections both to keep the working set small (fit in L1)
> and to avoid allocation.
>
> In addition to ensuring things fit in L1 there is also, I believe, a side
> benefit of using small loops to increase the chances of encountering
> special cases (e.g. all values null or no values null) which can sometimes
> save you from more complex logic.
>
> On Tue, Jun 20, 2023 at 7:32 PM Ruoxi Sun  wrote:
>
> > Hi,
> >
> > While looking at the Acero code, I became curious about the concept of
> > `minibatch` used in swiss join and grouper.
> > I wonder if its purpose is to proactively limit the memory size of the
> > working set? Or is it a consequence of the temp vector needing to be
> > fixed-size (to avoid costly memory allocation)? Additionally, what's the
> > impact of choosing the size of the minibatch?
> >
> > I would really appreciate it if someone could help me clear this up.
> >
> > Thanks.
> >
> > *Rossi*
> >
>


Re: Question about `minibatch`

2023-06-20 Thread Weston Pace
Those goals are somewhat compatible.  Sasha can probably correct me if I
get this wrong but my understanding is that the minibatch is just large
enough to ensure reliable vectorized execution.  It is used in some
innermost critical sections both to keep the working set small (fit in L1)
and to avoid allocation.

In addition to ensuring things fit in L1 there is also, I believe, a side
benefit of using small loops to increase the chances of encountering
special cases (e.g. all values null or no values null) which can sometimes
save you from more complex logic.
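
A minimal sketch of the idea described above, written in Python rather than Acero's C++ and not taken from the Acero source. It walks a batch in fixed-size minibatches, reuses one scratch buffer that is allocated up front, and takes a shortcut when a minibatch is all null or has no nulls. The names `MINIBATCH_SIZE` and `hash_batch`, the size 1024, and the toy multiplicative hash are assumptions made only for illustration.

```python
from typing import List, Optional

# Hypothetical size: small enough that the scratch data stays cache-resident.
MINIBATCH_SIZE = 1024


def hash_batch(values: List[Optional[int]]) -> List[int]:
    """Hash a whole batch by walking it in fixed-size minibatches."""
    out: List[int] = []
    # The scratch buffer is allocated once and reused for every minibatch.
    scratch = [0] * MINIBATCH_SIZE
    for start in range(0, len(values), MINIBATCH_SIZE):
        end = min(start + MINIBATCH_SIZE, len(values))
        n = end - start
        null_count = sum(values[i] is None for i in range(start, end))
        if null_count == n:
            # Special case: all values null, so no hashing is needed.
            for i in range(n):
                scratch[i] = 0
        elif null_count == 0:
            # Special case: no nulls, so the per-value null check is skipped.
            for i in range(n):
                scratch[i] = (values[start + i] * 2654435761) & 0xFFFFFFFF
        else:
            # General case: nulls and values are mixed.
            for i in range(n):
                v = values[start + i]
                scratch[i] = 0 if v is None else (v * 2654435761) & 0xFFFFFFFF
        out.extend(scratch[:n])
    return out
```

A smaller minibatch raises the chance of hitting one of the two shortcuts but adds per-minibatch overhead; the right value in practice would have to be measured.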

On Tue, Jun 20, 2023 at 7:32 PM Ruoxi Sun  wrote:

> Hi,
>
> While looking at the Acero code, I became curious about the concept of
> `minibatch` used in swiss join and grouper.
> I wonder if its purpose is to proactively limit the memory size of the
> working set? Or is it a consequence of the temp vector needing to be
> fixed-size (to avoid costly memory allocation)? Additionally, what's the
> impact of choosing the size of the minibatch?
>
> I would really appreciate it if someone could help me clear this up.
>
> Thanks.
>
> *Rossi*
>


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Sutou Kouhei
Hi,

I think that you need to specify
"-DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib" when you build
Apache Arrow C++. (Or "LD_LIBRARY_PATH=${CONDA_PREFIX}/lib
dev/release/verify-release-candidate.sh ..." may work.)


Thanks,
-- 
kou

In <8bfb0384-46f0-07f7-a510-2f2eb3134...@python.org>
  "Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Tue, 20 Jun 2023 
15:55:38 +0200,
  Antoine Pitrou  wrote:

> 
> I don't have much time to investigate and I don't think it's a blocker
> either way. Perhaps there's room for improvement on the Arrow C++ side
> as well...
> 
> 
> On 20/06/2023 at 15:40, Dewey Dunnington wrote:
>> Thanks for verifying!
>> I don't *think* there is anything non-standard about the
>> `find_package(Arrow)` / `target_link_libraries(..., arrow_shared)`
>> sequence used to link the tests (although clearly they aren't working
>> as intended!). You can pass extra arguments to CMake to help it find
>> the right Arrow using export NANOARROW_CMAKE_OPTIONS="-DArrow_DIR=..."
>> but here it sounds like it's finding the .so but failing to link the
>> dependencies. There are also instructions on creating a conda
>> environment with all required dependencies at [1].
>> [1]
>> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md#conda-linux-and-macos
>> On Tue, Jun 20, 2023 at 9:32 AM Antoine Pitrou 
>> wrote:
>>>
>>>
>>> Ok, now running from the right repo :-), I get linker errors against
>>> Arrow C++ dependencies:
>>>
>>> [ 44%] Linking CXX executable utils_test
>>> /home/antoine/mambaforge/envs/pyarrow/bin/../lib/gcc/x86_64-conda-linux-gnu/12.2.0/../../../../x86_64-conda-linux-gnu/bin/ld:
>>> warning: libcrypto.so.3, needed by
>>> /home/antoine/mambaforge/envs/pyarrow/lib/libarrow.so.1300.0.0, not
>>> found (try using -rpath or -rpath-link)
>>>
>>> (etc.)
>>>
>>> https://gist.github.com/pitrou/3e6e9621e3b6cc2aff932eafdafef82b
>>>
>>> Note that Arrow C++ is compiled by myself inside a conda environment
>>> (which is activated when running the verification script).
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>>
>>> On 20/06/2023 at 12:38, Raúl Cumplido wrote:
>>>> +1 (non-binding)
>>>>
>>>> I've run:
>>>> ./verify-release-candidate.sh 0.2.0 1
>>>>
>>>> on Ubuntu 22.04 with conda:
>>>> * arrow-cpp 12.0.0
>>>> * gcc (conda-forge gcc 11.4.0-0) 11.4.0
>>>> * r-base  4.2.3
>>>>
>>>> Thanks,
>>>> Raúl
>>>>
>>>> On Tue, Jun 20, 2023 at 1:55, Sutou Kouhei wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> I ran the following command line on Debian GNU/Linux sid:
>>>>>
>>>>> CMAKE_PREFIX_PATH=/tmp/local \
>>>>>   dev/release/verify-release-candidate.sh 0.2.0 1
>>>>>
>>>>> with:
>>>>>
>>>>> * Apache Arrow C++ main
>>>>> * gcc (Debian 12.2.0-14) 12.2.0
>>>>> * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
>>>>>
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> kou
>>>>>
>>>>> In "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Mon, 19 Jun
>>>>> 2023 15:58:45 -0300,
>>>>> Dewey Dunnington wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I would like to propose the following release candidate (RC1) of
>>>>>> Apache Arrow nanoarrow version 0.2.0. This release consists of 17
>>>>>> resolved GitHub issues [1].
>>>>>>
>>>>>> This release candidate is based on commit:
>>>>>> f71063605e288d9a8dd73cfdd9578773519b6743 [2]
>>>>>>
>>>>>> The source release rc1 is hosted at [3].
>>>>>> The changelog is located at [4].
>>>>>> The draft release post is located at [5].
>>>>>>
>>>>>> Please download, verify checksums and signatures, run the unit tests,
>>>>>> and vote on the release. See [6] for how to validate a release
>>>>>> candidate.
>>>>>>
>>>>>> The vote will be open for at least 72 hours.
>>>>>>
>>>>>> [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
>>>>>> [ ] +0
>>>>>> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
>>>>>>
>>>>>> [0] https://github.com/apache/arrow-nanoarrow
>>>>>> [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
>>>>>> [2]
>>>>>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
>>>>>> [3]
>>>>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
>>>>>> [4]
>>>>>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
>>>>>> [5] https://github.com/apache/arrow-site/pull/364
>>>>>> [6]
>>>>>> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Question about `minibatch`

2023-06-20 Thread Ruoxi Sun
Hi,

While looking at the Acero code, I became curious about the concept of
`minibatch` used in swiss join and grouper.
I wonder if its purpose is to proactively limit the memory size of the
working set? Or is it a consequence of the temp vector needing to be
fixed-size (to avoid costly memory allocation)? Additionally, what's the
impact of choosing the size of the minibatch?

I would really appreciate it if someone could help me clear this up.

Thanks.

*Rossi*


Re: [DISCUSS][C++] Can we require CMake 3.16+ since 13.0.0?

2023-06-20 Thread Sutou Kouhei
Hi,

> FYI3: We'll support Amazon Linux 2023 in Apache Arrow C++
> 13.0.0:
> 
> https://github.com/apache/arrow/pull/36081

Merged.

It seems that there is no objection to this.
I'll proceed with this next week.


Thanks,
-- 
kou

In <20230616.061904.689752341473121297@clear-code.com>
  "[DISCUSS][C++] Can we require CMake 3.16+ since 13.0.0?" on Fri, 16 Jun 2023 
06:19:04 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> We require CMake 3.5+ now because Ubuntu 18.04 ships 3.5.
> We dropped support for Ubuntu 18.04 because it reached EOL.
> 
> Can we require CMake 3.16+ in Apache Arrow C++ 13.0.0?
> 
> Here are CMake versions of our supported platforms:
> 
> * Ubuntu 20.04: CMake 3.16
> * CentOS 7: CMake 3.17
> * Debian GNU/Linux bullseye: 3.18
> * Amazon Linux 2: CMake 3.13!!!
> 
> See the Amazon Linux 2 item. It ships CMake 3.13 not CMake
> 3.16+. Can we drop support for Amazon Linux 2 in Apache
> Arrow C++ 13.0.0?
> 
> FYI1: Amazon Linux released a new version (Amazon Linux
> 2023) on 2023-03-15:
> 
> Amazon Linux 2023, a Cloud-Optimized Linux Distribution with
> Long-Term Support
> https://aws.amazon.com/blogs/aws/amazon-linux-2023-a-cloud-optimized-linux-distribution-with-long-term-support/
> 
> FYI2: Amazon Linux 2 was scheduled to reach EOL on
> 2023-06-30 but has been extended to 2025-06-30:
> 
> https://aws.amazon.com/amazon-linux-2/faqs/
> 
>> Amazon Linux 2 end of support date (End of Life, or EOL)
>> has been extended by two years from 2023-06-30 to
>> 2025-06-30 to provide customers with ample time to migrate
>> to the next version.
> 
> FYI3: We'll support Amazon Linux 2023 in Apache Arrow C++
> 13.0.0:
> 
> https://github.com/apache/arrow/pull/36081
> 
> 
> Related issue:
> 
> [C++] Require CMake 3.16 or later
> https://github.com/apache/arrow/issues/34921
> 
> 
> Thanks,
> -- 
> kou


Re: [DISCUSS][Format][Flight] Result set expiration support

2023-06-20 Thread Sutou Kouhei
Hi,

David provided the Java implementation. Thanks!

If anyone has any comments about this proposal, please share
them.


Thanks,
-- 
kou

In <20230619.151511.1159782462289578136@clear-code.com>
  "[DISCUSS][Format][Flight] Result set expiration support" on Mon, 19 Jun 2023 
15:15:11 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> I would like to propose adding support for result set
> expiration to Apache Arrow Flight. If anyone has comments
> on this proposal, please share them here or on the issue
> for this proposal:
> https://github.com/apache/arrow/issues/35500
> 
> This is one of the proposals in "[DISCUSS] Flight RPC/Flight
> SQL/ADBC enhancements":
> 
>   https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
> 
> See also the "Flight RPC: Result Set Expiration" section in
> the design document for the proposals:
> 
>   
> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
> 
> Changes since the original proposal:
> 
> * Pre-defined action names:
>   * CancelQuery -> CancelFlightInfo
>   * RefreshQuery -> RefreshFlightEndpoint
>   * CloseQuery -> CloseFlightInfo
>   See also the following discussions:
>   * Query -> FlightInfo:
> https://lists.apache.org/thread/71pp95q6yklodm6lfjttswr3slfowdrb
>   * RefreshQuery -> RefreshFlightEndpoint:
> https://github.com/apache/arrow/issues/35500#issuecomment-1578200076
> 
> Background:
> 
> Currently, it is undefined whether a client can call DoGet
> more than once. Clients may want to retry requests, and
> servers may not want to persist a query result forever.
> 
> Proposal:
> 
> Add an expiration time to FlightEndpoint. If present,
> clients may assume they can retry DoGet requests. Otherwise,
> clients should avoid retrying DoGet requests.
> 
> This proposal is "not" a full retry protocol.
> 
> Also, add "pre-defined" actions to Flight RPC for working
> with result sets. These are pre-defined Protobuf messages
> with standardized encodings for use with DoAction:
> 
>   * CancelFlightInfo: Asynchronously cancel the execution of
> a distributed query. (Replaces the equivalent Flight SQL
> action.)
>   * RefreshFlightEndpoint: Request an extension of the
> expiration of a FlightEndpoint.
>   * CloseFlightInfo: Close a FlightInfo so that the server
> can clean up resources early.
> 
> This lets the ADBC/JDBC/ODBC drivers for Flight SQL
> explicitly manage result set lifetimes. These can be used
> with Flight SQL as regular actions.
> 
> Implementation:
> 
> https://github.com/apache/arrow/pull/36009 is an
> implementation of this proposal. The pull request contains the
> following:
> 
> 1. Format changes:
>    * format/Flight.proto
>      https://github.com/apache/arrow/pull/36009/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
>    * format/FlightSql.proto
>      https://github.com/apache/arrow/pull/36009/files#diff-fd4e5266a841a2b4196aadca76a4563b6770c91d400ee53b6235b96da628a01e
>
> 2. Documentation changes:
>    * docs/source/format/Flight.rst
>      https://github.com/apache/arrow/pull/36009/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
>
> 3. The C++ implementation and an integration test:
>    * cpp/src/arrow/flight/
>
> 4. The Go implementation and an integration test:
>    * go/arrow/flight/
>    * go/arrow/internal/flight_integration/
> 
> The Java implementation may be added to this pull request.
> 
> Next:
> 
> I'll start a vote for this proposal after we reach a consensus
> on this proposal.
> 
> This is the standard process for format changes.
> See also:
> https://arrow.apache.org/docs/dev/format/Changing.html
> 
> 
> Thanks,
> -- 
> kou
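
The retry rule in the quoted proposal (an expiration time on a FlightEndpoint means DoGet may be retried; no expiration time means it should not be) can be sketched as follows. This is only an illustration in Python: `Endpoint` and `may_retry_do_get` are hypothetical names and not part of any Flight API, and treating an already-passed expiration time as "do not retry" is an assumption that the proposal text above does not spell out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical stand-in for a FlightEndpoint; not the real Flight API."""
    ticket: bytes
    # None means the server gave no expiration time.
    expiration_time: Optional[datetime] = None


def may_retry_do_get(endpoint: Endpoint, now: datetime) -> bool:
    """Decide whether a client should retry DoGet for this endpoint."""
    if endpoint.expiration_time is None:
        # No expiration time: clients should avoid retrying DoGet.
        return False
    # Assumption: once the expiration time has passed, a retry is no longer safe.
    return now < endpoint.expiration_time


now = datetime(2023, 6, 20, tzinfo=timezone.utc)
fresh = Endpoint(b"ticket", now + timedelta(minutes=5))
print(may_retry_do_get(fresh, now))
```

A client that wants to keep reading past the expiration time would, under the proposal, send the RefreshFlightEndpoint action before that time is reached.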


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Dane Pitkin
+1 (non-binding)

Verified on MacOS (M1) using conda.

A couple of nuances:
* Had to uninstall gnupg in conda and used brew's gnupg instead (same issue
Will found).
* I initially encountered some intermittent CMake build timeouts with
gtest, but haven't been able to reproduce.

On Tue, Jun 20, 2023 at 9:55 AM Antoine Pitrou  wrote:

>
> I don't have much time to investigate and I don't think it's a blocker
> either way. Perhaps there's room for improvement on the Arrow C++ side
> as well...
>
>
> On 20/06/2023 at 15:40, Dewey Dunnington wrote:
> > Thanks for verifying!
> >
> > I don't *think* there is anything non-standard about the
> > `find_package(Arrow)` / `target_link_libraries(..., arrow_shared)`
> > sequence used to link the tests (although clearly they aren't working
> > as intended!). You can pass extra arguments to CMake to help it find
> > the right Arrow using export NANOARROW_CMAKE_OPTIONS="-DArrow_DIR=..."
> > but here it sounds like it's finding the .so but failing to link the
> > dependencies. There are also instructions on creating a conda
> > environment with all required dependencies at [1].
> >
> > [1]
> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md#conda-linux-and-macos
> >
> > On Tue, Jun 20, 2023 at 9:32 AM Antoine Pitrou 
> wrote:
> >>
> >>
> >> Ok, now running from the right repo :-), I get linker errors against
> >> Arrow C++ dependencies:
> >>
> >> [ 44%] Linking CXX executable utils_test
> >>
> /home/antoine/mambaforge/envs/pyarrow/bin/../lib/gcc/x86_64-conda-linux-gnu/12.2.0/../../../../x86_64-conda-linux-gnu/bin/ld:
> >> warning: libcrypto.so.3, needed by
> >> /home/antoine/mambaforge/envs/pyarrow/lib/libarrow.so.1300.0.0, not
> >> found (try using -rpath or -rpath-link)
> >>
> >> (etc.)
> >>
> >> https://gist.github.com/pitrou/3e6e9621e3b6cc2aff932eafdafef82b
> >>
> >> Note that Arrow C++ is compiled by myself inside a conda environment
> >> (which is activated when running the verification script).
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >>
> >> On 20/06/2023 at 12:38, Raúl Cumplido wrote:
> >>> +1 (non-binding)
> >>>
> >>> I've run:
> >>> ./verify-release-candidate.sh 0.2.0 1
> >>>
> >>> on Ubuntu 22.04 with conda:
> >>> * arrow-cpp 12.0.0
> >>> * gcc (conda-forge gcc 11.4.0-0) 11.4.0
> >>> * r-base  4.2.3
> >>>
> >>> Thanks,
> >>> Raúl
> >>>
> >>> On Tue, Jun 20, 2023 at 1:55, Sutou Kouhei wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> I ran the following command line on Debian GNU/Linux sid:
> >>>>
> >>>> CMAKE_PREFIX_PATH=/tmp/local \
> >>>>   dev/release/verify-release-candidate.sh 0.2.0 1
> >>>>
> >>>> with:
> >>>>
> >>>> * Apache Arrow C++ main
> >>>> * gcc (Debian 12.2.0-14) 12.2.0
> >>>> * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
> >>>>
> >>>>
> >>>> Thanks,
> >>>> --
> >>>> kou
> >>>>
> >>>> In "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Mon, 19
> >>>> Jun 2023 15:58:45 -0300,
> >>>> Dewey Dunnington wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I would like to propose the following release candidate (RC1) of
> >>>>> Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> >>>>> resolved GitHub issues [1].
> >>>>>
> >>>>> This release candidate is based on commit:
> >>>>> f71063605e288d9a8dd73cfdd9578773519b6743 [2]
> >>>>>
> >>>>> The source release rc1 is hosted at [3].
> >>>>> The changelog is located at [4].
> >>>>> The draft release post is located at [5].
> >>>>>
> >>>>> Please download, verify checksums and signatures, run the unit tests,
> >>>>> and vote on the release. See [6] for how to validate a release
> >>>>> candidate.
> >>>>>
> >>>>> The vote will be open for at least 72 hours.
> >>>>>
> >>>>> [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> >>>>> [ ] +0
> >>>>> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> >>>>>
> >>>>> [0] https://github.com/apache/arrow-nanoarrow
> >>>>> [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> >>>>> [2]
> >>>>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
> >>>>> [3]
> >>>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
> >>>>> [4]
> >>>>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
> >>>>> [5] https://github.com/apache/arrow-site/pull/364
> >>>>> [6]
> >>>>> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
>


Re: [DISCUSS][Format] Draft implementation of string view array format

2023-06-20 Thread Weston Pace
Before I say anything else I'll say that I am in favor of this new layout.
There is some existing literature on the idea (e.g. umbra) and your
benchmarks show some nice improvements.

Compared to some of the other layouts we've discussed recently (REE, list
view) I do think this layout is more unique and fundamentally different.
Perhaps most fundamentally different:

 * This is the first layout where the number of buffers depends on the data
and not the schema.  I think this is the most architecturally significant
fact.  It does require a (backwards compatible) change to the IPC format
itself, beyond just adding new type codes.  It also poses challenges in
places where we've assumed there will be at most 3 buffers (e.g. in
ArraySpan, though, as you have shown, we can work around this using a raw
pointers representation internally in those spots).

I think you've done some great work to integrate this well with Arrow-C++
and I'm convinced it can work.

I would be interested in hearing some input from the Rust community.

Ben, at one point there was some discussion that this might be a c-data
only type.  However, I believe that was based on the raw pointers
representation.  What you've proposed here, if I understand correctly, is
an index + offsets representation and it is suitable for IPC, correct?
(e.g. I see that you have changes and examples in the IPC reader/writer)
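
To make the "index + offsets" idea concrete, here is a rough Python sketch of a string view encoder with a short (inline) form and a long (buffer index + offset) form. It is not taken from the proposal or the pull request: the 16-byte view size, the 12-byte inline limit, the 4-byte prefix, and the field order are assumptions based on the Umbra-style layout mentioned above, and `encode_views` / `decode_view` are hypothetical names.

```python
import struct
from typing import List, Tuple

# Assumed: strings of up to 12 bytes are stored inline in the 16-byte view.
INLINE_LIMIT = 12


def encode_views(strings: List[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Return (views, data_buffers); every view is exactly 16 bytes."""
    views: List[bytes] = []
    data = bytearray()  # one data buffer here; a real array may hold several
    for s in strings:
        if len(s) <= INLINE_LIMIT:
            # Short form: length, then the bytes themselves, zero padded.
            views.append(struct.pack("<i12s", len(s), s))
        else:
            # Long form: length, 4-byte prefix, buffer index, offset in buffer.
            views.append(struct.pack("<i4sii", len(s), s[:4], 0, len(data)))
            data += s
    # The number of data buffers depends on the data, not on the schema.
    buffers = [bytes(data)] if data else []
    return views, buffers


def decode_view(view: bytes, buffers: List[bytes]) -> bytes:
    """Recover the string that a single 16-byte view refers to."""
    (length,) = struct.unpack_from("<i", view)
    if length <= INLINE_LIMIT:
        return view[4:4 + length]
    _, _, index, offset = struct.unpack("<i4sii", view)
    return buffers[index][offset:offset + length]
```

Because the long form stores an index and an offset rather than a raw pointer, the views stay valid when the buffers are copied or written out, which is what makes such a representation usable over IPC.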

On Mon, Jun 19, 2023 at 7:17 AM Benjamin Kietzman 
wrote:

> Hi Gang,
>
> I'm not sure what you mean, sorry if my answers are off base:
>
> Parquet's ByteArray will be unaffected by the addition of the string view
> type;
> all arrow strings (arrow::Type::STRING, arrow::Type::LARGE_STRING, and
> with this patch arrow::Type::STRING_VIEW) are converted to ByteArrays
> during serialization to parquet [1].
>
> If you mean that encoding of arrow::Type::STRING_VIEW will not be as fast
> as encoding of equivalent arrow::Type::STRING, that's something I haven't
> benchmarked so I can't answer definitively. I would expect it to be faster
> than
> first converting STRING_VIEW->STRING then encoding to parquet; direct
> encoding avoids allocating and populating temporary buffers. Of course this
> only applies to cases where you need to encode an array of STRING_VIEW to
> parquet- encoding of STRING to parquet will be unaffected.
>
> Sincerely,
> Ben
>
> [1]
>
> https://github.com/bkietz/arrow/blob/46cf7e67766f0646760acefa4d2d01cdfead2d5d/cpp/src/parquet/encoding.cc#L166-L179
>
> On Thu, Jun 15, 2023 at 10:34 PM Gang Wu  wrote:
>
> > Hi Ben,
> >
> > The posted benchmark [1] looks pretty good to me. However, I want to
> > raise a possible issue from the perspective of parquet-cpp. Parquet-cpp
> > uses a customized parquet::ByteArray type [2] for string/binary, I would
> > expect some regression of conversions between parquet reader/writer
> > and the proposed string view array, especially when some strings use
> > short form and others use long form.
> >
> > [1] https://github.com/apache/arrow/pull/35628#issuecomment-1583218617
> > [2]
> > https://github.com/apache/arrow/blob/41309de8dd91a9821873fc5f94339f0542ca0108/cpp/src/parquet/types.h#L575
> >
> > Best,
> > Gang
> >
> > On Fri, Jun 16, 2023 at 3:58 AM Will Jones 
> > wrote:
> >
> > > Cool. Thanks for doing that!
> > >
> > > On Thu, Jun 15, 2023 at 12:40 Benjamin Kietzman 
> > > wrote:
> > >
> > > > I've added https://github.com/apache/arrow/issues/36112 to track
> > > > deduplication of buffers on write.
> > > > I don't think it would require modification of the IPC format.
> > > >
> > > > Ben
> > > >
> > > > On Thu, Jun 15, 2023 at 1:30 PM Matt Topol 
> > > wrote:
> > > >
> > > > > > Based on my understanding, in theory a buffer *could* be shared
> > > > > > within a batch since the flatbuffers message just uses an offset
> > > > > > and length to identify the buffers.
> > > > > >
> > > > > > That said, I don't believe any current implementation actually does
> > > > > > this or takes advantage of this in any meaningful way.
> > > > >
> > > > > --Matt
> > > > >
> > > > > On Thu, Jun 15, 2023 at 1:00 PM Will Jones <
> will.jones...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ben,
> > > > > >
> > > > > > It's exciting to see this move along.
> > > > > >
> > > > > > > > The buffers will be duplicated. If buffer duplication becomes a
> > > > > > > > concern, I'd prefer to handle that in the ipc writer. Then
> > > > > > > > buffers which are duplicated could be detected by checking
> > > > > > > > pointer identity and written only once.
> > > > > > >
> > > > > > >
> > > > > > > Question: to be able to write a buffer only once and reference it
> > > > > > > in multiple arrays, does that require a change to the IPC format?
> > > > > > > Or is sharing buffers within the same batch already allowed in the
> > > > > > > IPC format?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Will Jones
> > > > > >
> > > > > > On Thu, Jun 15, 2023 at 9:03 AM 

Re: [ANNOUNCE] New Arrow PMC member: Ben Baumgold,

2023-06-20 Thread Matt Topol
Congrats Ben!

On Tue, Jun 20, 2023, 11:00 AM Weston Pace  wrote:

> Congratulations Ben!
>
> On Tue, Jun 20, 2023 at 7:38 AM Jacob Quinn 
> wrote:
>
> > Yay! Congrats Ben! Love to see more Julia folks here!
> >
> > -Jacob
> >
> > On Tue, Jun 20, 2023 at 4:15 AM Andrew Lamb 
> wrote:
> >
> > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > Ben Baumgold, to become a PMC member and we are pleased to announce
> > > that Ben Baumgold has accepted.
> > >
> > > Congratulations and welcome!
> > >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Ben Baumgold,

2023-06-20 Thread Weston Pace
Congratulations Ben!

On Tue, Jun 20, 2023 at 7:38 AM Jacob Quinn  wrote:

> Yay! Congrats Ben! Love to see more Julia folks here!
>
> -Jacob
>
> On Tue, Jun 20, 2023 at 4:15 AM Andrew Lamb  wrote:
>
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Ben Baumgold, to become a PMC member and we are pleased to announce
> > that Ben Baumgold has accepted.
> >
> > Congratulations and welcome!
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Ben Baumgold,

2023-06-20 Thread Jacob Quinn
Yay! Congrats Ben! Love to see more Julia folks here!

-Jacob

On Tue, Jun 20, 2023 at 4:15 AM Andrew Lamb  wrote:

> The Project Management Committee (PMC) for Apache Arrow has invited
> Ben Baumgold, to become a PMC member and we are pleased to announce
> that Ben Baumgold has accepted.
>
> Congratulations and welcome!
>


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Antoine Pitrou



I don't have much time to investigate and I don't think it's a blocker 
either way. Perhaps there's room for improvement on the Arrow C++ side 
as well...



On 20/06/2023 at 15:40, Dewey Dunnington wrote:
> Thanks for verifying!
>
> I don't *think* there is anything non-standard about the
> `find_package(Arrow)` / `target_link_libraries(..., arrow_shared)`
> sequence used to link the tests (although clearly they aren't working
> as intended!). You can pass extra arguments to CMake to help it find
> the right Arrow using export NANOARROW_CMAKE_OPTIONS="-DArrow_DIR=..."
> but here it sounds like it's finding the .so but failing to link the
> dependencies. There are also instructions on creating a conda
> environment with all required dependencies at [1].
>
> [1]
> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md#conda-linux-and-macos
>
> On Tue, Jun 20, 2023 at 9:32 AM Antoine Pitrou wrote:
>>
>> Ok, now running from the right repo :-), I get linker errors against
>> Arrow C++ dependencies:
>>
>> [ 44%] Linking CXX executable utils_test
>> /home/antoine/mambaforge/envs/pyarrow/bin/../lib/gcc/x86_64-conda-linux-gnu/12.2.0/../../../../x86_64-conda-linux-gnu/bin/ld:
>> warning: libcrypto.so.3, needed by
>> /home/antoine/mambaforge/envs/pyarrow/lib/libarrow.so.1300.0.0, not
>> found (try using -rpath or -rpath-link)
>>
>> (etc.)
>>
>> https://gist.github.com/pitrou/3e6e9621e3b6cc2aff932eafdafef82b
>>
>> Note that Arrow C++ is compiled by myself inside a conda environment
>> (which is activated when running the verification script).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 20/06/2023 at 12:38, Raúl Cumplido wrote:
>>> +1 (non-binding)
>>>
>>> I've run:
>>> ./verify-release-candidate.sh 0.2.0 1
>>>
>>> on Ubuntu 22.04 with conda:
>>> * arrow-cpp 12.0.0
>>> * gcc (conda-forge gcc 11.4.0-0) 11.4.0
>>> * r-base  4.2.3
>>>
>>> Thanks,
>>> Raúl
>>>
>>> On Tue, Jun 20, 2023 at 1:55, Sutou Kouhei wrote:
>>>>
>>>> +1
>>>>
>>>> I ran the following command line on Debian GNU/Linux sid:
>>>>
>>>> CMAKE_PREFIX_PATH=/tmp/local \
>>>>   dev/release/verify-release-candidate.sh 0.2.0 1
>>>>
>>>> with:
>>>>
>>>> * Apache Arrow C++ main
>>>> * gcc (Debian 12.2.0-14) 12.2.0
>>>> * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
>>>>
>>>>
>>>> Thanks,
>>>> --
>>>> kou
>>>>
>>>> In "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Mon, 19 Jun 2023
>>>> 15:58:45 -0300,
>>>> Dewey Dunnington wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I would like to propose the following release candidate (RC1) of
>>>>> Apache Arrow nanoarrow version 0.2.0. This release consists of 17
>>>>> resolved GitHub issues [1].
>>>>>
>>>>> This release candidate is based on commit:
>>>>> f71063605e288d9a8dd73cfdd9578773519b6743 [2]
>>>>>
>>>>> The source release rc1 is hosted at [3].
>>>>> The changelog is located at [4].
>>>>> The draft release post is located at [5].
>>>>>
>>>>> Please download, verify checksums and signatures, run the unit tests,
>>>>> and vote on the release. See [6] for how to validate a release
>>>>> candidate.
>>>>>
>>>>> The vote will be open for at least 72 hours.
>>>>>
>>>>> [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
>>>>> [ ] +0
>>>>> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
>>>>>
>>>>> [0] https://github.com/apache/arrow-nanoarrow
>>>>> [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
>>>>> [2]
>>>>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
>>>>> [3]
>>>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
>>>>> [4]
>>>>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
>>>>> [5] https://github.com/apache/arrow-site/pull/364
>>>>> [6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Dewey Dunnington
Thanks for verifying!

I don't *think* there is anything non-standard about the
`find_package(Arrow)` / `target_link_libraries(..., arrow_shared)`
sequence used to link the tests (although clearly they aren't working
as intended!). You can pass extra arguments to CMake to help it find
the right Arrow using export NANOARROW_CMAKE_OPTIONS="-DArrow_DIR=..."
but here it sounds like it's finding the .so but failing to link the
dependencies. There are also instructions on creating a conda
environment with all required dependencies at [1].

[1] 
https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md#conda-linux-and-macos

On Tue, Jun 20, 2023 at 9:32 AM Antoine Pitrou  wrote:
>
>
> Ok, now running from the right repo :-), I get linker errors against
> Arrow C++ dependencies:
>
> [ 44%] Linking CXX executable utils_test
> /home/antoine/mambaforge/envs/pyarrow/bin/../lib/gcc/x86_64-conda-linux-gnu/12.2.0/../../../../x86_64-conda-linux-gnu/bin/ld:
> warning: libcrypto.so.3, needed by
> /home/antoine/mambaforge/envs/pyarrow/lib/libarrow.so.1300.0.0, not
> found (try using -rpath or -rpath-link)
>
> (etc.)
>
> https://gist.github.com/pitrou/3e6e9621e3b6cc2aff932eafdafef82b
>
> Note that Arrow C++ is compiled by myself inside a conda environment
> (which is activated when running the verification script).
>
> Regards
>
> Antoine.
>
>
>
> On 20/06/2023 at 12:38, Raúl Cumplido wrote:
> > +1 (non-binding)
> >
> > I've run:
> > ./verify-release-candidate.sh 0.2.0 1
> >
> > on Ubuntu 22.04 with conda:
> > * arrow-cpp 12.0.0
> > * gcc (conda-forge gcc 11.4.0-0) 11.4.0
> > * r-base  4.2.3
> >
> > Thanks,
> > Raúl
> >
> > On Tue, Jun 20, 2023 at 1:55, Sutou Kouhei wrote:
> >>
> >> +1
> >>
> >> I ran the following command line on Debian GNU/Linux sid:
> >>
> >>   CMAKE_PREFIX_PATH=/tmp/local \
> >>     dev/release/verify-release-candidate.sh 0.2.0 1
> >>
> >> with:
> >>
> >>   * Apache Arrow C++ main
> >>   * gcc (Debian 12.2.0-14) 12.2.0
> >>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Mon, 19 Jun 2023
> >> 15:58:45 -0300,
> >>   Dewey Dunnington wrote:
> >>
> >>> Hello,
> >>>
> >>> I would like to propose the following release candidate (RC1) of
> >>> Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> >>> resolved GitHub issues [1].
> >>>
> >>> This release candidate is based on commit:
> >>> f71063605e288d9a8dd73cfdd9578773519b6743 [2]
> >>>
> >>> The source release rc1 is hosted at [3].
> >>> The changelog is located at [4].
> >>> The draft release post is located at [5].
> >>>
> >>> Please download, verify checksums and signatures, run the unit tests,
> >>> and vote on the release. See [6] for how to validate a release
> >>> candidate.
> >>>
> >>> The vote will be open for at least 72 hours.
> >>>
> >>> [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> >>> [ ] +0
> >>> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> >>>
> >>> [0] https://github.com/apache/arrow-nanoarrow
> >>> [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> >>> [2] 
> >>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
> >>> [3] 
> >>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
> >>> [4] 
> >>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
> >>> [5] https://github.com/apache/arrow-site/pull/364
> >>> [6] 
> >>> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


[RESULT][VOTE][RUST] Release Apache Arrow Rust 42.0.0 RC1

2023-06-20 Thread Andrew Lamb
With 5 +1 votes (4 binding), the release is approved!

The release is available here:
  https://dist.apache.org/repos/dist/release/arrow/arrow-rs-42.0.0

As well as crates.io:
https://crates.io/crates/arrow/42.0.0 (and similar)

Thanks to everyone who contributed and voted on this release.

Andrew

On Sun, Jun 18, 2023 at 4:10 PM Will Jones  wrote:

> +1, verified on MacOS M1.
>
> Thanks Andrew!
>
> On Sun, Jun 18, 2023 at 7:02 AM Wayne Xia  wrote:
>
> > +1, verified on amd64 linux, thanks!
> >
> >
> > vin jake  :
> >
> > > +1 (binding)
> > >
> > > Verified on M1 macbook.
> > >
> > > Thanks Andrew.
> > >
> > > On Sat, Jun 17, 2023, 02:40 Andrew Lamb  wrote:
> > >
> > > > Hi,
> > > >
> > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > version 42.0.0.
> > > > >
> > > > > Please note that there is one known regression in this release related to
> > > > > parsing intervals like '.5 months' [5], but I do not believe it should
> > > > > block the release (see [6] for rationale). However, if others feel
> > > > > differently, there is a proposed fix [7] and once it is reviewed / merged I
> > > > > can create a new RC as well.
> > > >
> > > > This release candidate is based on commit:
> > > > 2c7b4efc1701d9db5a0cc6decacf1df22123645f [1]
> > > >
> > > > The proposed release tarball and signatures are hosted at [2].
> > > >
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > > > and vote on the release. There is a script [4] that automates some of
> > > > the verification.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Arrow Rust
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/tree/2c7b4efc1701d9db5a0cc6decacf1df22123645f
> > > > [2]:
> > > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-42.0.0-rc1
> > > > [3]:
> > > >
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/blob/2c7b4efc1701d9db5a0cc6decacf1df22123645f/CHANGELOG.md
> > > > [4]:
> > > >
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > [5]: https://github.com/apache/arrow-rs/issues/4424
> > > > [6]:
> > https://github.com/apache/arrow-rs/pull/4425#discussion_r1232573299
> > > > [6]: https://github.com/apache/arrow-rs/pull/4425
> > > >
> > >
> >
>


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Antoine Pitrou



Ok, now running from the right repo :-), I get linker errors against 
Arrow C++ dependencies:


[ 44%] Linking CXX executable utils_test
/home/antoine/mambaforge/envs/pyarrow/bin/../lib/gcc/x86_64-conda-linux-gnu/12.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: 
warning: libcrypto.so.3, needed by 
/home/antoine/mambaforge/envs/pyarrow/lib/libarrow.so.1300.0.0, not 
found (try using -rpath or -rpath-link)


(etc.)

https://gist.github.com/pitrou/3e6e9621e3b6cc2aff932eafdafef82b

Note that I compiled Arrow C++ myself inside a conda environment 
(which is activated when running the verification script).


Regards

Antoine.



On 20/06/2023 at 12:38, Raúl Cumplido wrote:

+1 (non-binding)

I've run:
./verify-release-candidate.sh 0.2.0 1

on Ubuntu 22.04 with conda:
* arrow-cpp 12.0.0
* gcc (conda-forge gcc 11.4.0-0) 11.4.0
* r-base  4.2.3

Thanks,
Raúl

On Tue, 20 Jun 2023 at 1:55, Sutou Kouhei wrote:


+1

I ran the following command line on Debian GNU/Linux sid:

   CMAKE_PREFIX_PATH=/tmp/local \
 dev/release/verify-release-candidate.sh 0.2.0 1

with:

   * Apache Arrow C++ main
   * gcc (Debian 12.2.0-14) 12.2.0
   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"


Thanks,
--
kou

In 
   "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Mon, 19 Jun 2023 
15:58:45 -0300,
   Dewey Dunnington  wrote:


Hello,

I would like to propose the following release candidate (RC1) of
Apache Arrow nanoarrow version 0.2.0. This release consists of 17
resolved GitHub issues [1].

This release candidate is based on commit:
f71063605e288d9a8dd73cfdd9578773519b6743 [2]

The source release rc1 is hosted at [3].
The changelog is located at [4].
The draft release post is located at [5].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [6] for how to validate a release
candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...

[0] https://github.com/apache/arrow-nanoarrow
[1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
[2] 
https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
[3] 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
[4] 
https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
[5] https://github.com/apache/arrow-site/pull/364
[6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Antoine Pitrou



Ouch, please disregard this message, I was running the script from the 
wrong repo :-(



On 20/06/2023 at 14:24, Antoine Pitrou wrote:


Hello,

I tried to run the verification script and got the following error:
https://gist.github.com/pitrou/b2c77f3d7836d92cb6d589c735f98d5d

"""
gpg: Total number processed: 18
gpg:  unchanged: 18
curl: (22) The requested URL returned error: 404
Failed to verify release candidate. See /tmp/arrow-adbc-0.2.0.rzYU7 for
details.
"""

The mentioned directory doesn't contain any details, especially about
the faulty URL.

Regards

Antoine.



On 19/06/2023 at 20:58, Dewey Dunnington wrote:

Hello,

I would like to propose the following release candidate (RC1) of
Apache Arrow nanoarrow version 0.2.0. This release consists of 17
resolved GitHub issues [1].

This release candidate is based on commit:
f71063605e288d9a8dd73cfdd9578773519b6743 [2]

The source release rc1 is hosted at [3].
The changelog is located at [4].
The draft release post is located at [5].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [6] for how to validate a release
candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...

[0] https://github.com/apache/arrow-nanoarrow
[1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
[2] 
https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
[3] 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
[4] 
https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
[5] https://github.com/apache/arrow-site/pull/364
[6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Antoine Pitrou



Hello,

I tried to run the verification script and got the following error:
https://gist.github.com/pitrou/b2c77f3d7836d92cb6d589c735f98d5d

"""
gpg: Total number processed: 18
gpg:  unchanged: 18
curl: (22) The requested URL returned error: 404
Failed to verify release candidate. See /tmp/arrow-adbc-0.2.0.rzYU7 for 
details.

"""

The mentioned directory doesn't contain any details, especially about 
the faulty URL.


Regards

Antoine.



On 19/06/2023 at 20:58, Dewey Dunnington wrote:

Hello,

I would like to propose the following release candidate (RC1) of
Apache Arrow nanoarrow version 0.2.0. This release consists of 17
resolved GitHub issues [1].

This release candidate is based on commit:
f71063605e288d9a8dd73cfdd9578773519b6743 [2]

The source release rc1 is hosted at [3].
The changelog is located at [4].
The draft release post is located at [5].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [6] for how to validate a release
candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...

[0] https://github.com/apache/arrow-nanoarrow
[1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
[2] 
https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
[3] 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
[4] 
https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
[5] https://github.com/apache/arrow-site/pull/364
[6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
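
For readers unfamiliar with the "verify checksums" step mentioned in the proposal above, the following is a minimal sketch of what a checksum comparison can look like. It is not taken from the nanoarrow verification script; the function names `sha512_of` and `checksum_matches` are hypothetical, and it assumes a SHA-512 checksum file whose first whitespace-separated field is the hex digest. Signature verification (GPG) is a separate step and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare an artifact against a checksum file (assumed layout: '<hex digest>  <name>')."""
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

A caller would download the artifact and its checksum file side by side and treat a `False` result as a failed verification.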


[ANNOUNCE] Apache Arrow ADBC 0.5.0 released

2023-06-20 Thread David Li
The Apache Arrow community is pleased to announce the 0.5.0 release of the 
Apache Arrow ADBC libraries. It includes 37 resolved GitHub issues ([1]). 

The release is available now from [2] and [3].

Release notes are available at: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0/CHANGELOG.md

What is Apache Arrow?
---------------------
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big 
data. It houses a set of canonical in-memory representations of flat and 
hierarchical data along with multiple language-bindings for structure 
manipulation. It also provides low-overhead streaming and batch messaging, 
zero-copy interprocess communication (IPC), and vectorized in-memory analytics 
libraries. Languages currently supported include C, C++, C#, Go, Java, 
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow ADBC?
--------------------------
ADBC is a database access abstraction for Arrow-based applications. It provides 
a cross-language API for working with databases while using Arrow data, 
providing an alternative to APIs like JDBC and ODBC for analytical 
applications. For more, see [4].

Please report any feedback to the mailing lists ([5], [6]).

Regards,
The Apache Arrow Community

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-0.5.0 
[3]: https://apache.jfrog.io/ui/native/arrow
[4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
[5]: https://lists.apache.org/list.html?u...@arrow.apache.org
[6]: https://lists.apache.org/list.html?dev@arrow.apache.org


Re: [ANNOUNCE] New Arrow PMC member: Ben Baumgold

2023-06-20 Thread Alenka Frim
Congratulations Ben!

On Tue, Jun 20, 2023 at 1:54 PM David Li  wrote:

> Welcome Ben!
>
> On Tue, Jun 20, 2023, at 06:14, Andrew Lamb wrote:
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Ben Baumgold to become a PMC member, and we are pleased to announce
> > that Ben Baumgold has accepted.
> >
> > Congratulations and welcome!
>


Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-20 Thread David Li
The vote passes with 7 +1 votes (3 binding, 4 non-binding). Thanks all!

Post-release tasks:

[x] Close the GitHub milestone/project
[ ] Add the new release to the Apache Reporter System
[ ] Upload source release artifacts to Subversion
[ ] Create the final GitHub release
[ ] Update website
[ ] Upload wheels/sdist to PyPI
[ ] Publish Maven packages
[ ] Update tags for Go modules
[ ] Deploy APT/Yum repositories
[ ] Upload Ruby packages to RubyGems
[ ] Update conda-forge packages
[ ] Announce the new release
[ ] Remove old artifacts
[ ] Bump versions
[ ] Publish release blog post

On Mon, Jun 19, 2023, at 22:11, Dewey Dunnington wrote:
> +1 (non-binding)
>
> I ran the following on MacOS M1:
>
> USE_CONDA=1 TEST_APT=0 TEST_YUM=0 ./verify-release-candidate.sh 0.5.0 0
>
> On Mon, Jun 19, 2023 at 12:12 PM Jean-Baptiste Onofré wrote:
>>
>> +1 (non-binding)
>>
>> Regards
>> JB
>>
>> On Fri, Jun 16, 2023 at 2:06 AM David Li  wrote:
>> >
>> > Hello,
>> >
>> > I would like to propose the following release candidate (RC0) of Apache 
>> > Arrow ADBC version 0.5.0. This is a release consisting of 36 resolved 
>> > GitHub issues [1].
>> >
>> > This release candidate is based on commit: 
>> > ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
>> >
>> > The source release rc0 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7][8].
>> > The changelog is located at [9].
>> >
>> > Please download, verify checksums and signatures, run the unit tests, and 
>> > vote on the release. See [10] for how to validate a release candidate.
>> >
>> > See also a verification result on GitHub Actions [11].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow ADBC 0.5.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
>> >
>> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>> > DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export 
>> > TEST_APT=0 TEST_YUM=0`.)
>> >
>> > [1]: 
>> > https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
>> > [2]: 
>> > https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
>> > [3]: 
>> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
>> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > [7]: 
>> > https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> > [8]: 
>> > https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
>> > [9]: 
>> > https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
>> > [10]: 
>> > https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> > [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862


Re: [ANNOUNCE] New Arrow PMC member: Ben Baumgold

2023-06-20 Thread David Li
Welcome Ben!

On Tue, Jun 20, 2023, at 06:14, Andrew Lamb wrote:
> The Project Management Committee (PMC) for Apache Arrow has invited
> Ben Baumgold to become a PMC member, and we are pleased to announce
> that Ben Baumgold has accepted.
>
> Congratulations and welcome!


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-20 Thread Raúl Cumplido
+1 (non-binding)

I've run:
./verify-release-candidate.sh 0.2.0 1

on Ubuntu 22.04 with conda:
* arrow-cpp 12.0.0
* gcc (conda-forge gcc 11.4.0-0) 11.4.0
* r-base  4.2.3

Thanks,
Raúl

On Tue, 20 Jun 2023 at 1:55, Sutou Kouhei wrote:
>
> +1
>
> I ran the following command line on Debian GNU/Linux sid:
>
>   CMAKE_PREFIX_PATH=/tmp/local \
> dev/release/verify-release-candidate.sh 0.2.0 1
>
> with:
>
>   * Apache Arrow C++ main
>   * gcc (Debian 12.2.0-14) 12.2.0
>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
>
>
> Thanks,
> --
> kou
>
> In 
>   "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Mon, 19 Jun 2023 
> 15:58:45 -0300,
>   Dewey Dunnington  wrote:
>
> > Hello,
> >
> > I would like to propose the following release candidate (RC1) of
> > Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> > resolved GitHub issues [1].
> >
> > This release candidate is based on commit:
> > f71063605e288d9a8dd73cfdd9578773519b6743 [2]
> >
> > The source release rc1 is hosted at [3].
> > The changelog is located at [4].
> > The draft release post is located at [5].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [6] for how to validate a release
> > candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> >
> > [0] https://github.com/apache/arrow-nanoarrow
> > [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> > [2] 
> > https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
> > [3] 
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
> > [4] 
> > https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
> > [5] https://github.com/apache/arrow-site/pull/364
> > [6] 
> > https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


[ANNOUNCE] New Arrow PMC member: Ben Baumgold

2023-06-20 Thread Andrew Lamb
The Project Management Committee (PMC) for Apache Arrow has invited
Ben Baumgold to become a PMC member, and we are pleased to announce
that Ben Baumgold has accepted.

Congratulations and welcome!