Re: [VOTE] Release Apache Arrow ADBC 0.5.1 - RC1
+1! I ran USE_CONDA=1 TEST_APT=0 TEST_YUM=0 ./verify-release-candidate.sh 0.5.1 1 on macOS M1.

On Fri, Jun 23, 2023 at 8:50 PM David Li wrote:
> My vote: +1 (Ubuntu Linux 20.04/x86_64; macOS 13.4/AArch64)
Re: [VOTE] Release Apache Arrow ADBC 0.5.1 - RC1
My vote: +1 (Ubuntu Linux 20.04/x86_64; macOS 13.4/AArch64)

On Fri, Jun 23, 2023, at 17:51, Matt Topol wrote:
> +1 tested on Pop!_OS 22.04 with Go 1.19
Re: [VOTE] Release Apache Arrow ADBC 0.5.1 - RC1
+1 tested on Pop!_OS 22.04 with Go 1.19

On Fri, Jun 23, 2023, 4:52 PM Sutou Kouhei wrote:
> +1
>
> I ran the following on Debian GNU/Linux sid:
>
>   JAVA_HOME=/usr/lib/jvm/default-java \
>     dev/release/verify-release-candidate.sh 0.5.1 1
>
> with:
>
>   * Python 3.11.4
>   * g++ (Debian 12.3.0-4) 12.3.0
>   * go version go1.20.5 linux/amd64
>   * openjdk version "17.0.7" 2023-04-18
>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>   * R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
>
> Thanks,
> --
> kou
Re: [VOTE] Release Apache Arrow ADBC 0.5.1 - RC1
+1

I ran the following on Debian GNU/Linux sid:

  JAVA_HOME=/usr/lib/jvm/default-java \
    dev/release/verify-release-candidate.sh 0.5.1 1

with:

  * Python 3.11.4
  * g++ (Debian 12.3.0-4) 12.3.0
  * go version go1.20.5 linux/amd64
  * openjdk version "17.0.7" 2023-04-18
  * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
  * R version 4.3.1 (2023-06-16) -- "Beagle Scouts"

Thanks,
--
kou

In <321c4e07-60a1-402d-9574-8437b462e...@app.fastmail.com>
  "[VOTE] Release Apache Arrow ADBC 0.5.1 - RC1" on Thu, 22 Jun 2023 22:08:56 -0400,
  "David Li" wrote:

> (I originally sent this with the wrong email, but it appears to have been
> swallowed. Apologies if this ends up being a duplicate.)
>
> I would like to propose the following release candidate (RC1) of Apache Arrow
> ADBC version 0.5.1. This is a release consisting of 8 resolved GitHub issues
> [1]. The main motivation is to release a fix in the Snowflake driver, as
> mentioned in an earlier thread.
>
> This release candidate is based on commit:
> 01c2f1eb281e8fb003f2d32096a6b0fe336128a9 [2]
> (Note I had to manually patch one script; this will be resolved in future
> releases.)
>
> The source release rc1 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7][8].
> The changelog is located at [9].
>
> Please download, verify checksums and signatures, run the unit tests, and
> vote on the release. See [10] for how to validate a release candidate.
>
> See also a verification result on GitHub Actions [11].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow ADBC 0.5.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow ADBC 0.5.1 because...
>
> Note: to verify APT/YUM packages on macOS/AArch64, you must `export
> DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export
> TEST_APT=0 TEST_YUM=0`.)
>
> [1]: https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.1%22+is%3Aclosed
> [2]: https://github.com/apache/arrow-adbc/commit/01c2f1eb281e8fb003f2d32096a6b0fe336128a9
> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.1-rc1/
> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> [7]: https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> [8]: https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.1-rc1
> [9]: https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.1-rc1/CHANGELOG.md
> [10]: https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> [11]: https://github.com/apache/arrow-adbc/actions/runs/5351160439
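As a rough illustration of the "verify checksums" step requested in the vote email above, the following Python sketch compares a downloaded artifact against a SHA-512 checksum file. It is only a sketch under stated assumptions: the helper names (sha512_of, checksum_matches) are made up for this example, and it assumes the checksum file begins with the hex digest, as sha512sum output does. It does not check GPG signatures and is not a substitute for running dev/release/verify-release-candidate.sh.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact against a checksum file.

    Assumption: the first whitespace-separated token of the checksum file
    is the expected hex digest (the layout written by `sha512sum`).
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

For example, calling checksum_matches(Path("artifact.tar.gz"), Path("artifact.tar.gz.sha512")) on a downloaded pair (hypothetical file names) returns True only when the digests agree.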
Re: [Python][Discuss] PyArrow Dataset as a Python protocol
> The trouble is that Dataset was not designed to serve as a
> general-purpose unmaterialized dataframe. For example, the PyArrow
> Dataset constructor [5] exposes options for specifying a list of
> source files and a partitioning scheme, which are irrelevant for many
> of the applications that Will anticipates. And some work is needed to
> reconcile the methods of the PyArrow Dataset object [6] with the
> methods of the Table object. Some methods like filter() are exposed by
> both and behave lazily on Datasets and eagerly on Tables, as a user
> might expect. But many other Table methods are not implemented for
> Dataset though they potentially could be, and it is unclear where we
> should draw the line between adding methods to Dataset vs. encouraging
> new scanner implementations to expose options controlling what lazy
> operations should be performed as they see fit.

In my mind there is a distinction between the "compute domain" (e.g. a pandas dataframe or something like ibis or SQL) and the "data domain" (e.g. pyarrow datasets). I think, in a perfect world, you could push any and all compute up and down the chain as far as possible. However, in practice, I think there is a healthy set of tools and libraries that say "simple column projection and filtering is good enough". I would argue that there is room for both APIs, and while the temptation is always present to "shove as much compute as you can", I think pyarrow datasets seem to have found a balance between the two that users like. So I would argue that this protocol may never become a general-purpose unmaterialized dataframe, and that isn't necessarily a bad thing.

> they are splittable and serializable, so that fragments can be distributed
> amongst processes / workers.

Just to clarify, the proposal currently only requires the fragments to be serializable, correct?

On Fri, Jun 23, 2023 at 11:48 AM Will Jones wrote:
> Thanks Ian for your extensive feedback.
Re: [Python][Discuss] PyArrow Dataset as a Python protocol
Thanks Ian for your extensive feedback.

> I strongly agree with the comments made by David,
> Weston, and Dewey arguing that we should avoid any use of PyArrow
> expressions in this API. Expressions are an implementation detail of
> PyArrow, not a part of the Arrow standard. It would be much safer for
> the initial version of this protocol to not define *any*
> methods/arguments that take expressions.

I would agree with this point, if we were starting from scratch. But one of my goals is for this protocol to be descriptive of the existing dataset integrations in the ecosystem, which all currently rely on PyArrow expressions. For example, you'll notice in the PR that there are unit tests to verify the current PyArrow Dataset classes conform to this protocol, without changes.

I think there are three routes we can go here:

1. We keep PyArrow expressions in the API initially, but once we have Substrait-based alternatives we deprecate the PyArrow expression support. This is what I intended with the current design, and I think it provides the most obvious migration paths for existing producers and consumers.
2. We keep the overall dataset API, but don't introduce the filter and projection arguments until we have Substrait support. I'm not sure what the migration path looks like for producers and consumers, but I think this just implicitly becomes the same as (1), but with worse documentation.
3. We write a protocol completely from scratch, that doesn't try to describe the existing dataset API. Producers and consumers would then migrate to use the new protocol and deprecate their existing dataset integrations. We could introduce a dunder method in that API (sort of like __arrow_array__) that would make the migration seamless from the end-user perspective.

*Which do you all think is the best path forward?*

> Another concern I have is that we have not fully explained why we want
> to use Dataset instead of RecordBatchReader [9] as the basis of this
> protocol. I would like to see an explanation of why RecordBatchReader
> is not sufficient for this. RecordBatchReader seems like another
> possible way to represent "unmaterialized dataframes" and there are
> some parallels between RecordBatch/RecordBatchReader and
> Fragment/Dataset.

This is a good point. I can add a section describing the differences. The main ones I can think of are that: (1) Datasets are "pruneable": one can select a subset of columns and apply a filter on rows to avoid IO and (2) they are splittable and serializable, so that fragments can be distributed amongst processes / workers.

Best,

Will Jones

On Fri, Jun 23, 2023 at 10:48 AM Ian Cook wrote:
> Thanks Will for this proposal!
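To make the two properties Will lists more concrete (pruning by column selection and row filter, and fragments that can be serialized and handed to separate workers), here is a small, self-contained Python sketch. Everything in it is hypothetical and chosen only for illustration: the names FragmentLike, InMemoryFragment, InMemoryDataset, GreaterEqual, and worker are not from the PR, the data lives in plain Python dicts rather than Arrow memory, and JSON stands in for whatever serialization a real implementation would use. It does not use PyArrow and should not be read as the proposed protocol.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol, Sequence


@dataclass(frozen=True)
class GreaterEqual:
    """Toy row filter (hypothetical stand-in for a real expression)."""
    column: str
    value: int

    def matches(self, row: dict) -> bool:
        return row[self.column] >= self.value


class FragmentLike(Protocol):
    """Hypothetical shape of a fragment: prunable scan plus serialization."""

    def scan(self, columns: Optional[Sequence[str]] = None,
             row_filter: Optional[GreaterEqual] = None) -> Iterator[dict]: ...

    def to_json(self) -> str: ...


@dataclass(frozen=True)
class InMemoryFragment:
    """Toy fragment holding rows in memory; a real one would read from storage."""
    rows: tuple

    def scan(self, columns=None, row_filter=None):
        for row in self.rows:
            # Row filter: skip rows the consumer does not need.
            if row_filter is not None and not row_filter.matches(row):
                continue
            # Column projection: emit only the requested columns.
            yield row if columns is None else {c: row[c] for c in columns}

    def to_json(self) -> str:
        return json.dumps(list(self.rows))

    @classmethod
    def from_json(cls, payload: str) -> "InMemoryFragment":
        return cls(tuple(json.loads(payload)))


@dataclass(frozen=True)
class InMemoryDataset:
    """Toy dataset: nothing more than a collection of fragments."""
    fragments: tuple

    def get_fragments(self) -> Sequence[FragmentLike]:
        return self.fragments


def worker(payload: str, columns, row_filter) -> list:
    """Simulates a separate process: rebuild the fragment, then scan it."""
    fragment = InMemoryFragment.from_json(payload)
    return list(fragment.scan(columns=columns, row_filter=row_filter))


dataset = InMemoryDataset((
    InMemoryFragment(({"id": 1, "x": 10}, {"id": 2, "x": 20})),
    InMemoryFragment(({"id": 3, "x": 30},)),
))
# Split the dataset, ship each fragment as a string, scan with pruning.
payloads = [f.to_json() for f in dataset.get_fragments()]
results = [worker(p, ["id"], GreaterEqual("x", 20)) for p in payloads]
print(results)  # [[{'id': 2}], [{'id': 3}]]
```

A plain record batch stream, by contrast, would hand the consumer one sequence of batches with no built-in way to split the work or to push the column list and filter down to the producer, which is the contrast the reply above is drawing.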
[DISCUSS] Possibility of 12.0.2 release
Hi All,

I recently became aware of a CVE (https://github.com/advisories/GHSA-6mjq-h674-j845) affecting the Java Netty libraries. Moving to the fixed Netty release, 4.1.94.Final, required a patch to Arrow, which has already been merged (https://github.com/apache/arrow/issues/36209).

I know the freeze for 13.0.0 is not too far away, but I wanted to check whether there is any interest in a 12.0.2 release in the meantime, and whether there are any other pending issues that might make a patch release worthwhile.

Thanks,
Bryan
Re: [RESULT][VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1
Ok! Post-release tasks are complete. Thank you all!

[x] Closed GitHub milestone
[x] Added release to Apache Reporter System
[x] Uploaded artifacts to Subversion
[x] Created GitHub release
[x] Submitted R package to CRAN
[x] Sent announcement to annou...@apache.org
[x] Release blog post [1]
[x] Removed old artifacts from SVN
[x] Bumped versions on main

[1] https://arrow.apache.org/blog/2023/06/22/nanoarrow-0.2.0-release/

On Fri, Jun 23, 2023 at 9:28 AM Dewey Dunnington wrote:
>
> Thanks for offering! Sorry for being slow to update the thread...David
> Li ran the upload script yesterday.
>
> -dewey
>
> On Thu, Jun 22, 2023 at 11:59 PM Sutou Kouhei wrote:
> >
> > Hi,
> >
> > > I believe the upload step requires a PMC member to run the script
> >
> > I can do it. Can I run
> > https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/post-01-upload.sh
> > ?
> >
> > Thanks,
> > --
> > kou
> >
> > In
> > "[RESULT][VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Thu, 22
> > Jun 2023 16:05:50 -0300,
> > Dewey Dunnington wrote:
> >
> > > Thank you everybody for verifying and voting! With 3 binding +1s and 3
> > > non-binding +1s, the vote passes! I have opened a PR to improve the
> > > verification instructions (particularly on conda where most problems
> > > occurred) [1].
> > >
> > > Apache Arrow nanoarrow 0.2.0 has the following post-release tasks. I
> > > believe the upload step requires a PMC member to run the script but
> > > the rest I'm happy to take care of!
> > >
> > > [x] Closed GitHub milestone
> > > [ ] Added release to Apache Reporter System
> > > [ ] Uploaded artifacts to Subversion
> > > [ ] Created GitHub release
> > > [ ] Submit R package to CRAN
> > > [ ] Sent announcement to annou...@apache.org
> > > [ ] Release blog post [2]
> > > [ ] Removed old artifacts from SVN
> > > [ ] Bumped versions on main
> > >
> > > [1] https://github.com/apache/arrow-nanoarrow/pull/243
> > > [2] https://github.com/apache/arrow-site/pull/364
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Thank you everybody for the welcome! I'm honoured!

On Fri, Jun 23, 2023 at 2:41 PM David Li wrote:
>
> Welcome Dewey!
[ANNOUNCE] Apache Arrow nanoarrow 0.2.0 Released
The Apache Arrow community is pleased to announce the 0.2.0 release of Apache Arrow nanoarrow. This release covers 19 resolved issues from 6 contributors [1].

The release is available now from [2].

Release notes are available at:
https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0/CHANGELOG.md

What is Apache Arrow?
---------------------
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big data. It houses a set of canonical in-memory representations of flat and hierarchical data along with multiple language-bindings for structure manipulation. It also provides low-overhead streaming and batch messaging, zero-copy interprocess communication (IPC), and vectorized in-memory analytics libraries. Languages currently supported include C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow nanoarrow?
-------------------------------
Apache Arrow nanoarrow is a small C library for building and interpreting Arrow C Data interface structures with bindings for users of the R programming language. The vision of nanoarrow is that it should be trivial for a library or application to implement an Arrow-based interface. The library provides helpers to create types, schemas, and metadata, an API for building arrays element-wise, and an API to extract elements element-wise from an array. For a more detailed description of the features nanoarrow provides and motivation for its development, see [3].

Please report any feedback to the mailing lists ([4], [5]).

Regards,
The Apache Arrow Community

[1]: https://github.com/apache/arrow-nanoarrow/issues?q=is%3Aissue+milestone%3A%22nanoarrow+0.2.0%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-nanoarrow-0.2.0
[3]: https://github.com/apache/arrow-nanoarrow
[4]: https://lists.apache.org/list.html?u...@arrow.apache.org
[5]: https://lists.apache.org/list.html?dev@arrow.apache.org
Re: [Python][Discuss] PyArrow Dataset as a Python protocol
Thanks Will for this proposal! For anyone familiar with PyArrow, this idea has a clear intuitive logic to it. It provides an expedient solution to the current lack of a practical means for interchanging "unmaterialized dataframes" between different Python libraries. To elaborate on that: If you look at how people use the Arrow Dataset API—which is implemented in the Arrow C++ library [1] and has bindings not just for Python [2] but also for Java [3] and R [4]—you'll see that Dataset is often used simply as a "virtual" variant of Table. It is used in cases when the data is larger than memory or when it is desirable to defer reading (materializing) the data into memory. So we can think of a Table as a materialized dataframe and a Dataset as an unmaterialized dataframe. That aspect of Dataset is I think what makes it most attractive as a protocol for enabling interoperability: it allows libraries to easily "speak Arrow" in cases where materializing the full data in memory upfront is impossible or undesirable. The trouble is that Dataset was not designed to serve as a general-purpose unmaterialized dataframe. For example, the PyArrow Dataset constructor [5] exposes options for specifying a list of source files and a partitioning scheme, which are irrelevant for many of the applications that Will anticipates. And some work is needed to reconcile the methods of the PyArrow Dataset object [6] with the methods of the Table object. Some methods like filter() are exposed by both and behave lazily on Datasets and eagerly on Tables, as a user might expect. But many other Table methods are not implemented for Dataset though they potentially could be, and it is unclear where we should draw the line between adding methods to Dataset vs. encouraging new scanner implementations to expose options controlling what lazy operations should be performed as they see fit. Will, I see that you've already addressed this issue to some extent in your proposal. 
For example, you mention that we should initially define this protocol to include only a minimal subset of the Dataset API. I agree, but I think there are some loose ends we should be careful to tie up. I strongly agree with the comments made by David, Weston, and Dewey arguing that we should avoid any use of PyArrow expressions in this API. Expressions are an implementation detail of PyArrow, not a part of the Arrow standard. It would be much safer for the initial version of this protocol to not define *any* methods/arguments that take expressions. This will allow us to take some more time to finish up the Substrait expression implementation work that is underway [7][8], then introduce Substrait-based expressions in a latter version of this protocol. This approach will better position this protocol to be implemented in other languages besides Python. Another concern I have is that we have not fully explained why we want to use Dataset instead of RecordBatchReader [9] as the basis of this protocol. I would like to see an explanation of why RecordBatchReader is not sufficient for this. RecordBatchReader seems like another possible way to represent "unmaterialized dataframes" and there are some parallels between RecordBatch/RecordBatchReader and Fragment/Dataset. We should help developers and users understand why Arrow needs both of these. Thanks Will for your thoughtful prose explanations about this proposed API. After we arrive at a decision about this, I think we should reproduce some of these explanations in docs, blog posts, cookbook recipes, etc. because there is some important nuance here that will be important for integrators of this API to understand. 
Ian

[1] https://arrow.apache.org/docs/cpp/api/dataset.html
[2] https://arrow.apache.org/docs/python/dataset.html
[3] https://arrow.apache.org/docs/java/dataset.html
[4] https://arrow.apache.org/docs/r/articles/dataset.html
[5] https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html#pyarrow.dataset.dataset
[6] https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html
[7] https://github.com/apache/arrow/issues/33985
[8] https://github.com/apache/arrow/issues/34252
[9] https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html

On Wed, Jun 21, 2023 at 2:09 PM Will Jones wrote:
>
> Hello Arrow devs,
>
> I have drafted a PR defining an experimental protocol which would allow
> third-party libraries to imitate the PyArrow Dataset API [5]. This protocol
> is intended to endorse an integration pattern that is starting to be used
> in the Python ecosystem, where some libraries are providing their own
> scanners with this API, while query engines are accepting these as
> duck-typed objects.
>
> To give some background: back at the end of 2021, we collaborated with
> DuckDB to be able to read datasets (an Arrow C++ concept), supporting
> column selection and filter pushdown. This was accomplished by having
> DuckDB manipulating Python (or R) objects to get a
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Welcome Dewey!

On Fri, Jun 23, 2023, at 13:37, Weston Pace wrote:
> Congrats Dewey!
>
> On Fri, Jun 23, 2023 at 9:00 AM Antoine Pitrou wrote:
>> Welcome to the PMC Dewey!
>>
>> On 23/06/2023 at 16:59, Joris Van den Bossche wrote:
>>> Congrats Dewey!
>>>
>>> On Fri, 23 Jun 2023 at 16:54, Jacob Wujciak-Jens wrote:
>>>> Well deserved! Congratulations Dewey!
>>>>
>>>> On Fri, 23 June 2023, 16:32, Ian Cook wrote:
>>>>> Congratulations Dewey!
>>>>>
>>>>> On Fri, Jun 23, 2023 at 10:03 AM Matt Topol wrote:
>>>>>> Congrats Dewey!!
>>>>>>
>>>>>> On Fri, Jun 23, 2023, 9:35 AM Dane Pitkin wrote:
>>>>>>> Congrats Dewey!
>>>>>>>
>>>>>>> On Fri, Jun 23, 2023 at 9:15 AM Nic Crane wrote:
>>>>>>>> Well-deserved Dewey, congratulations!
>>>>>>>>
>>>>>>>> On Fri, 23 Jun 2023 at 11:53, Vibhatha Abeykoon wrote:
>>>>>>>>> Congratulations Dewey!
>>>>>>>>>
>>>>>>>>> On Fri, Jun 23, 2023 at 4:16 PM Alenka Frim <ale...@voltrondata.com.invalid> wrote:
>>>>>>>>>> Congratulations Dewey!!
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 23, 2023 at 12:10 PM Raúl Cumplido <raulcumpl...@gmail.com> wrote:
>>>>>>>>>>> Congratulations Dewey!
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 23 Jun 2023, 11:55, Andrew Lamb wrote:
>>>>>>>>>>>> The Project Management Committee (PMC) for Apache Arrow has invited
>>>>>>>>>>>> Dewey Dunnington (paleolimbot) to become a PMC member and we are
>>>>>>>>>>>> pleased to announce that Dewey Dunnington has accepted.
>>>>>>>>>>>>
>>>>>>>>>>>> Congratulations and welcome!
Re: [DISCUSS][Format][Flight] Result set expiration support
That sort of thing can be handled by the client, though (and note it says that the error is if the statement is closed, not finished). So it doesn't seem strictly necessary, though it would allow a client to express intent.

On Fri, Jun 23, 2023, at 13:25, Weston Pace wrote:
> One small difference seems to be that Close is idempotent and Cancel is not.
>
>> void cancel()
>>     throws SQLException
>>
>> Cancels this Statement object if both the DBMS and driver support
>> aborting an SQL statement. This method can be used by one thread to cancel
>> a statement that is being executed by another thread.
>>
>> Throws:
>> SQLException - if a database access error occurs or this method is
>> called on a closed Statement
>
> In other words, with cancel, you can display an error to the user if the
> statement is already finished (and thus was not able to be canceled).
> However, I don't know if that is significant at all.
>
> On Fri, Jun 23, 2023 at 12:17 AM Sutou Kouhei wrote:
>
>> Hi,
>>
>> Thanks for sharing your thoughts.
>>
>> OK. I'll change the current specifications/implementations
>> as follows:
>>
>> * Remove CloseFlightInfo (if nobody objects to it)
>> * RefreshFlightEndpoint ->
>>   RenewFlightEndpoint
>> * RenewFlightEndpoint(FlightEndpoint) ->
>>   RenewFlightEndpoint(RenewFlightEndpointRequest)
>> * CancelFlightInfo(FlightInfo) ->
>>   CancelFlightInfo(CancelFlightInfoRequest)
>>
>> Thanks,
>> --
>> kou
>>
>> In "Re: [DISCUSS][Format][Flight] Result set expiration support" on Thu, 22 Jun 2023 12:51:55 -0400,
>> Matt Topol wrote:
>>
>>>> That said, I think it's reasonable to only have Cancel at the protocol
>>>> level.
>>>
>>> I'd be in favor of only having Cancel too. In theory calling Cancel on
>>> something that has already completed should just be equivalent to calling
>>> Close anyways rather than requiring a client to guess and call Close if
>>> Cancel errors or something.
>>>
>>>> So this may not be needed for now. How about accepting a
>>>> specific request message instead of FlightEndpoint directly
>>>> as "PersistFlightEndpoint" input?
>>>
>>> I'm also in favor of this.
>>>
>>>> I think Refresh was fine, but if there's confusion, I like Kou's
>>>> suggestion of Renew the best.
>>>
>>> I'm in the same boat as David here, I think Refresh was fine but like the
>>> suggestion of Renew best if we want to avoid any confusion.
>>>
>>> On Thu, Jun 22, 2023 at 2:55 AM Antoine Pitrou wrote:
>>>
>>>> Doesn't protobuf ensure forwards compatibility? Why would it break?
>>>>
>>>> At worst, you can include the changes necessary for it to compile
>>>> cleanly, without adding support for the new fields/methods?
>>>>
>>>> On 22/06/2023 at 02:16, Sutou Kouhei wrote:
>>>>> Hi,
>>>>>
>>>>> The following part in the original e-mail is the one:
>>>>>
>>>>>> https://github.com/apache/arrow/pull/36009 is an
>>>>>> implementation of this proposal. The pull request has the
>>>>>> following:
>>>>>>
>>>>>> 1. Format changes:
>>>>>>    * format/Flight.proto
>>>>>>      https://github.com/apache/arrow/pull/36009/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
>>>>>>    * format/FlightSql.proto
>>>>>>      https://github.com/apache/arrow/pull/36009/files#diff-fd4e5266a841a2b4196aadca76a4563b6770c91d400ee53b6235b96da628a01e
>>>>>>
>>>>>> 2. Documentation changes:
>>>>>>    docs/source/format/Flight.rst
>>>>>>    https://github.com/apache/arrow/pull/36009/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
>>>>>
>>>>> We can split the part to a separated pull request. But if we
>>>>> split the part and merge the pull requests for format
>>>>> related changes and implementation related changes
>>>>> separately, our CI will be temporarily broken, because our
>>>>> implementations use auto-generated sources that are based on
>>>>> *.proto.
>>>>>
>>>>> Thanks,
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congrats Dewey!
Re: [DISCUSS][Format][Flight] Result set expiration support
One small difference seems to be that Close is idempotent and Cancel is not.

> void cancel()
>     throws SQLException
>
> Cancels this Statement object if both the DBMS and driver support aborting an SQL statement. This method can be used by one thread to cancel a statement that is being executed by another thread.
>
> Throws:
> SQLException - if a database access error occurs or this method is called on a closed Statement

In other words, with cancel, you can display an error to the user if the statement is already finished (and thus was not able to be canceled). However, I don't know if that is significant at all.
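The asymmetry between the two operations can be sketched in a few lines of Python. This is not JDBC or Flight code: `Statement`, `StatementClosedError`, and `release` are hypothetical names chosen for illustration. The sketch models only what the quoted Javadoc states (cancel raises when called on a closed statement) together with the documented JDBC behaviour that closing an already-closed statement has no effect. It does not model a "finished but not yet closed" state.

```python
class StatementClosedError(Exception):
    """Hypothetical stand-in for the SQLException in the quoted Javadoc."""


class Statement:
    """Toy model of a statement handle with close/cancel semantics."""

    def __init__(self) -> None:
        self.closed = False
        self.cancelled = False

    def close(self) -> None:
        # Idempotent: closing an already-closed statement has no effect.
        self.closed = True

    def cancel(self) -> None:
        # Not safe to repeat after close: raises once the statement is closed.
        if self.closed:
            raise StatementClosedError("cancel() called on a closed statement")
        self.cancelled = True


def release(stmt: Statement) -> str:
    """Client-side handling: try to cancel, then always close."""
    try:
        stmt.cancel()
        outcome = "cancelled"
    except StatementClosedError:
        outcome = "already closed"
    stmt.close()
    return outcome
```

With a helper like `release`, the client absorbs the difference itself, which is one way a protocol could get by with a single Cancel operation.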
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Welcome to the PMC Dewey!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congrats Dewey!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Well deserved! Congratulations Dewey!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congratulations Dewey!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congrats Dewey!!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congrats Dewey!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Well-deserved Dewey, congratulations!
Re: [RESULT][VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1
Thanks for offering! Sorry for being slow to update the thread... David Li ran the upload script yesterday.

-dewey

On Thu, Jun 22, 2023 at 11:59 PM Sutou Kouhei wrote:
>
> Hi,
>
>> I believe the upload step requires a PMC member to run the script
>
> I can do it. Can I run
> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/post-01-upload.sh
> ?
>
> Thanks,
> --
> kou
>
> In "[RESULT][VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1" on Thu, 22 Jun 2023 16:05:50 -0300,
> Dewey Dunnington wrote:
>
>> Thank you everybody for verifying and voting! With 3 binding +1s and 3
>> non-binding +1s, the vote passes! I have opened a PR to improve the
>> verification instructions (particularly on conda where most problems
>> occurred) [1].
>>
>> Apache Arrow nanoarrow 0.2.0 has the following post-release tasks. I
>> believe the upload step requires a PMC member to run the script but
>> the rest I'm happy to take care of!
>>
>> [x] Closed GitHub milestone
>> [ ] Added release to Apache Reporter System
>> [ ] Uploaded artifacts to Subversion
>> [ ] Created GitHub release
>> [ ] Submit R package to CRAN
>> [ ] Sent announcement to annou...@apache.org
>> [ ] Release blog post [2]
>> [ ] Removed old artifacts from SVN
>> [ ] Bumped versions on main
>>
>> [1] https://github.com/apache/arrow-nanoarrow/pull/243
>> [2] https://github.com/apache/arrow-site/pull/364
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congratulations Dewey!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congratulations Dewey!!
Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington
Congratulations Dewey!
[ANNOUNCE] New Arrow PMC member: Dewey Dunnington
The Project Management Committee (PMC) for Apache Arrow has invited Dewey Dunnington (paleolimbot) to become a PMC member and we are pleased to announce that Dewey Dunnington has accepted. Congratulations and welcome!
Re: [DISCUSS][Format][Flight] Result set expiration support
Hi,

Thanks for sharing your thoughts.

OK. I'll change the current specifications/implementations as follows:

* Remove CloseFlightInfo (if nobody objects to it)
* RefreshFlightEndpoint -> RenewFlightEndpoint
* RenewFlightEndpoint(FlightEndpoint) -> RenewFlightEndpoint(RenewFlightEndpointRequest)
* CancelFlightInfo(FlightInfo) -> CancelFlightInfo(CancelFlightInfoRequest)

Thanks,
--
kou
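The move from `RenewFlightEndpoint(FlightEndpoint)` to `RenewFlightEndpoint(RenewFlightEndpointRequest)` can be illustrated with a small Python sketch. The dataclasses below are hypothetical stand-ins for the protobuf messages, not the generated Flight classes; the `ticket` and `expiration_time` fields, the `renew_flight_endpoint` handler, and the five-minute extension are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FlightEndpoint:
    """Hypothetical, simplified endpoint: a ticket plus an expiration time."""

    ticket: bytes
    expiration_time: Optional[datetime] = None


@dataclass(frozen=True)
class RenewFlightEndpointRequest:
    """Wrapper message: the RPC takes this instead of a bare endpoint.

    Optional fields can be added here later without changing the
    signature of the RPC that accepts the request.
    """

    endpoint: FlightEndpoint


def renew_flight_endpoint(
    request: RenewFlightEndpointRequest,
    now: datetime,
    extension: timedelta = timedelta(minutes=5),  # assumed server policy
) -> FlightEndpoint:
    """Return a copy of the endpoint with a later expiration time."""
    return replace(request.endpoint, expiration_time=now + extension)
```

Because the handler reads only `request.endpoint`, a later revision of the request message can grow new optional fields while existing callers and handlers keep working.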
Re: [DISCUSS][Format][Flight] Result set expiration support
Hi,

Could someone who is familiar with JDBC explain the behavior of cancel/close for the UPDATE/INSERT case?

If we can't find a useful use case for providing both cancel and close, we'll provide only cancel in this proposal. If we find a useful use case for it, we can add close later.

Thanks,
--
kou

In "Re: [DISCUSS][Format][Flight] Result set expiration support" on Wed, 21 Jun 2023 15:53:54 +0200,
Antoine Pitrou wrote:

> Ah... in JDBC, if the statement is something like an UPDATE or INSERT,
> then cancelling the statement is not the same thing as closing the
> result set? The latter would probably just discard the result set but
> still commit the results?
>
> The problem is that Flight RPC doesn't have separate notions of
> queries and result sets...
>
> On 21/06/2023 at 15:49, David Li wrote:
>> There is a PR linked in the original message, but here it is again:
>> https://github.com/apache/arrow/pull/36009
>>
>> Cancel and Close are close semantically, but Cancel is meant for when
>> the (client thinks that) computation is still ongoing, while Close is
>> meant to free server resources after reading a result set. (For
>> example, JDBC has Statement#cancel [1] and ResultSet#close [2].)
>>
>> That said, I think it's reasonable to only have Cancel at the protocol
>> level.
>>
>> [1]: https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#cancel--
>> [2]: https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html#close--
>>
>> On Wed, Jun 21, 2023, at 09:35, Antoine Pitrou wrote:
>>> Hi Kou,
>>>
>>> Can we have an actual PR with the proposed gRPC field, method and
>>> docstring additions?
>>>
>>> Regardless, I have some comments and questions:
>>>
>>> * "RefreshFlightEndpoint" suggests the server will recompute (refresh)
>>>   the results; instead I would suggest "PersistFlightEndpoint"
>>>
>>> * Perhaps "PersistFlightEndpoint" can take an optional
>>>   "suggested_expiration" timestamp, which the server is free to ignore
>>>   (some clients may only need to extend the expiration by two minutes,
>>>   others by two days...)
>>>
>>> * Does the client potentially have to call "PersistFlightEndpoint" on
>>>   each returned endpoint? Can it pass several endpoints at once?
>>>
>>> * What is the expected difference between "CancelFlightInfo" and
>>>   "CloseFlightInfo"? Both seem to have a similar effect, and the exact
>>>   behaviour will probably be server-dependent anyway ("cancel" and
>>>   "close" may have meaningful differences when putting/uploading data,
>>>   not so much when getting/downloading data, IMHO?).
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>> On 21/06/2023 at 02:28, Sutou Kouhei wrote:
>>>> Hi,
>>>>
>>>> David provided the Java implementation. Thanks!
>>>>
>>>> If anyone has any comments about this proposal, please share them.
>>>>
>>>> Thanks,
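The optional "suggested_expiration" idea from the quoted comments, a client hint that the server is free to ignore, could look roughly like the Python sketch below. The function name `choose_expiration`, its parameters, and the default limits (two minutes and two days, borrowed from the example in the quoted text) are assumptions for illustration only; the thread does not specify any server policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def choose_expiration(
    now: datetime,
    suggested: Optional[datetime],
    default_ttl: timedelta = timedelta(minutes=2),  # assumed server default
    max_ttl: timedelta = timedelta(days=2),  # assumed server upper bound
) -> datetime:
    """Pick a new expiration time, treating the client's value as a hint."""
    if suggested is None or suggested <= now:
        # No usable hint: fall back to the server's own default.
        return now + default_ttl
    # Honour the hint, but never beyond what the server allows.
    return min(suggested, now + max_ttl)
```

Clamping is only one possible policy. A server could equally ignore the hint entirely and always return its default, which the "free to ignore" wording permits.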
Re: [DISCUSS][Format][Flight] Result set expiration support
Hi,

Sorry. I was wrong. I tried it locally and got no build error. We added "deprecated" metadata in this case, so I thought that we would get deprecation warnings and that they would be treated as errors in CI.

> At worse, you can include the changes necessary for it to compile
> cleanly, without adding support for the new fields/methods?

Why do we want to split format/ changes even when we require additional changes? Easy to review?

I can understand that we can review specification changes without implementations. But some problems may be found by implementing the specification changes. I think that this is the reason why we require at least two reference implementations to change our specifications. So I think that we should not split specification changes and their implementations without a good reason.

If we should review/merge specification changes and then review/merge their implementations, how about updating our change process?

https://arrow.apache.org/docs/dev/format/Changing.html

Thanks,
--
kou