Re: Unsupported/Other Type

2024-04-10 Thread Wes McKinney
In the past we have discussed adding a canonical type for UUID and JSON. I still think this is a good idea and could improve ergonomics in downstream language bindings (e.g. by exposing JSON querying functions or automatically boxing UUIDs in built-in UUID types, like the Python uuid library). Has
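As a rough illustration of the "boxing" idea mentioned above, here is a minimal sketch using only the Python standard library. It assumes a UUID column is stored as 16-byte values with None for nulls; the helper name `box_uuids` is hypothetical and not part of any Arrow API.

```python
import uuid


def box_uuids(raw_values):
    """Convert 16-byte values (None for nulls) into uuid.UUID objects."""
    boxed = []
    for raw in raw_values:
        if raw is None:
            # Preserve nulls as-is.
            boxed.append(None)
        elif len(raw) != 16:
            raise ValueError(f"expected 16 bytes, got {len(raw)}")
        else:
            boxed.append(uuid.UUID(bytes=raw))
    return boxed


# A toy "column": one UUID stored as raw bytes, followed by a null.
example = uuid.UUID("12345678-1234-5678-1234-567812345678")
column = [example.bytes, None]
print(box_uuids(column))
```

A language binding with a canonical UUID type could perform this conversion automatically, so users would see `uuid.UUID` objects rather than raw bytes.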

Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-01 Thread Wes McKinney
Congrats! On Mon, Apr 1, 2024 at 11:01 AM Andrew Lamb wrote: > Congratulations Joel. > > On Mon, Apr 1, 2024 at 11:53 AM Raúl Cumplido > wrote: > > > Congratulations and welcome Joel! > > > > > > On Mon, Apr 1, 2024, 17:18, Kevin Gurney > > wrote: > > > > > Congratulations, Joel! > > > > >

Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-18 Thread Wes McKinney
Congrats! On Mon, Mar 18, 2024 at 12:15 PM James Duong wrote: > Congratulations Bryce! > > From: Dane Pitkin > Date: Monday, March 18, 2024 at 7:28 AM > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow committer: Bryce Mecum > Congratulations, Bryce!! > > On Mon, Mar 18, 2024 at

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Wes McKinney
at the office of "Vice President, Apache DataFusion" be > > >> > and hereby is created, the person holding such office to > > >> > serve at the direction of the Board of Directors as the chair > > >> > of the Apache DataFusion Project, and to have primary respons

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-02-28 Thread Wes McKinney
I'd be happy to help. I think we will have to participate in PMC matters infrequently (should there be a difficult issue in the future, we could offer some perspective from cases in the past). On Wed, Feb 28, 2024 at 2:13 PM Andrew Lamb wrote: > Wes brought up a great point on the document[1]

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-02-27 Thread Wes McKinney
Have there been efforts to proactively reach out to other third parties that might have an interest in this or be a potential user at some point? There are a lot of interested parties in Arrow that may not actively follow the mailing list. Seems like folks from the Dask, Ray, RAPIDS (especially

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-11 Thread Wes McKinney
Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in this way and to bring the work under the ASF umbrella. On Sun, Feb 11, 2024 at 5:02 AM Andrew Lamb wrote: > As a follow up here the acceptance vote [1] has passed, the IP Clearance > Process is complete [2] and the code PR

Re: [DISCUSS] Status and future of @ApacheArrow Twitter account

2024-01-29 Thread Wes McKinney
Is there a different tool other than TweetDeck available that can synchronize posts that go out on different social channels (LinkedIn, Twitter, Mastodon, etc.)? I've heard of things like Hootsuite but that's pretty expensive and definitely overkill for an open source project, but perhaps there is

Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-27 Thread Wes McKinney
+1 (binding) On Sat, Jan 27, 2024 at 12:26 PM Micah Kornfield wrote: > +1 Binding > > On Sat, Jan 27, 2024 at 10:21 AM David Li wrote: > > > +1 (binding) > > > > On Sat, Jan 27, 2024, at 13:03, L. C. Hsieh wrote: > > > +1 (binding) > > > > > > On Sat, Jan 27, 2024 at 8:10 AM Andrew Lamb > >

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-01-08 Thread Wes McKinney
hi all — I was just catching up on e-mail threads and wanted to give a few historical comments on this. When we were assembling the Arrow PMC and committing to do the project in 2015, standardizing Arrow-over-REST was always something that was on the TODO list — at that time we didn't have the

Re: CIDR 2024

2023-12-05 Thread Wes McKinney
I will also be there. On Mon, Dec 4, 2023 at 12:58 PM Tony Wang wrote: > I am > > Get Outlook for Android > > From: Curt Hagenlocher > Sent: Monday, December 4, 2023 12:53:00 PM > To: dev@arrow.apache.org > Subject: CIDR 2024 > > Who's

Re: [DISCUSS][C++] Raw pointer string views

2023-09-28 Thread Wes McKinney
hi all, I'm just catching up on this thread after having taken a look at the format PRs, the C++ implementation PR, and this e-mail thread. So only my $0.02 from having spent a great deal less time on this project than others. The original motivation I had for bringing up the idea of adding the

Re: Apache Arrow filesystem question

2022-10-27 Thread Wes McKinney
I definitely think it would be a good thing to have a C++ ADLS filesystem interface that is on par in quality with our S3 and GCS C++ interfaces — these should also provide material performance benefits to Python users over a pure-Python interface (I'm not sure if pyarrow's S3 interface via C++

Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-27 Thread Wes McKinney
Congratulations! On Wed, Oct 26, 2022 at 4:56 PM Ian Joiner wrote: > > Congrats Nic! > > Ian > > On Tuesday, October 25, 2022, Sutou Kouhei wrote: > > > The Project Management Committee (PMC) for Apache Arrow has invited > > Nicola Crane to become a PMC member and we are pleased to announce > >

Re: [DISCUSS] Python Wheel Size

2022-10-10 Thread Wes McKinney
We've discussed this in the past, I think. In addition to having many optional components enabled, the pyarrow wheel also includes the unit tests directory which is of growing size. I think if we made a pyarrow-slim wheel with support only for core Arrow (IPC, etc.) and Parquet file reading, it

Re: [Discuss] Deprecating Plasma

2022-09-26 Thread Wes McKinney
+1 On Thu, Sep 22, 2022 at 11:59 PM Sutou Kouhei wrote: > > +1 > > In > "[Discuss] Deprecating Plasma" on Thu, 22 Sep 2022 17:38:27 +0200, > Antoine Pitrou wrote: > > > > > Hello, > > > > The Plasma object store (*) hasn't received significant maintenance > > since at least 2020. The

Re: [ANNOUNCE] New Arrow PMC member: Raphael Taylor-Davies

2022-09-20 Thread Wes McKinney
Congratulations! On Tue, Sep 20, 2022 at 12:37 PM Ashish wrote: > > Congratulations !! > > On Tue, Sep 20, 2022 at 10:17 AM Ian Joiner wrote: > > > Congrats Raphael! > > > > On Mon, Sep 19, 2022 at 9:56 PM Sutou Kouhei wrote: > > > > > The Project Management Committee (PMC) for Apache Arrow

Re: [ANNOUNCE] New Arrow committer: Remzi Yang

2022-09-10 Thread Wes McKinney
Congratulations! On Sat, Sep 10, 2022 at 7:12 AM Andrew Lamb wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Remzi Yang > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > Andrew

Re: [VOTE] Substrait for Flight SQL

2022-09-09 Thread Wes McKinney
+1 (binding) On Thu, Sep 8, 2022 at 9:12 PM Jacques Nadeau wrote: > > My vote continues to be +1 > > On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson > wrote: > > > +1 > > > > Neal > > > > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote: > > > > > +1 (non-binding) > > > > > > On Thu, Sep 8, 2022 at

Re: DISCUSS: [Format] Rules and procedures for Canonical extension types

2022-09-08 Thread Wes McKinney
+1 to this proposal. It would be great to use the JSON type as a crash dummy to work out the kinks in the process, but I think there are meaningful benefits (Parquet round-tripping) to getting this work under way. On Wed, Aug 24, 2022 at 11:22 AM Antoine Pitrou wrote: > > > Le 17/08/2022 à

Re: Arrow Flight usage with graph databases

2022-09-08 Thread Wes McKinney
hi Bill, you can unsubscribe by e-mailing dev-unsubscr...@arrow.apache.org On Tue, Sep 6, 2022 at 2:40 PM Bill Zhao wrote: > > unsubscribe > > On Mon, Jul 18, 2022 at 16:56, Valentyn Kahamlyk wrote: > > > > Hi All, > > > > I'm investigating the possibility of using Arrow Flight with graph > > databases, and

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-08 Thread Wes McKinney
Congrats Weston!! On Tue, Sep 6, 2022 at 8:21 PM Krisztián Szűcs wrote: > > Congrats Weston! > > On Wed, Sep 7, 2022 at 1:41 AM Percy Camilo Triveño Aucahuasi > wrote: > > > > Great news! Congratulations Weston! > > > > On Tue, Sep 6, 2022 at 1:42 PM Andy Grove wrote: > > > > > Congrats

Re: [ANNOUNCE] New Arrow PMC member: L. C. Hsieh

2022-09-07 Thread Wes McKinney
Congrats! On Mon, Sep 5, 2022 at 2:05 PM Raul Cumplido Dominguez wrote: > > Congratulations! > > On Mon, Sep 5, 2022, 20:05, Ian Joiner wrote: > > > Congrats L.C.! > > > > On Sat, Sep 3, 2022 at 5:39 PM Sutou Kouhei wrote: > > > > > The Project Management Committee (PMC) for Apache Arrow

Re: [C++] Read Flight data source into Acero

2022-09-07 Thread Wes McKinney
This seems like something where there should be ready-to-go code in the Arrow codebase to feed any RecordBatchReader into Acero On Thu, Aug 18, 2022 at 12:15 PM Li Jin wrote: > > Thanks all. I will try this out. > > On Thu, Aug 18, 2022 at 9:06 AM Rok Mihevc wrote: > > > +1 for adding this

Re: Apache Software Foundation community survey 2022

2022-09-06 Thread Wes McKinney
hi Antoine — thank you for circulating this survey. Even though it takes a few minutes to complete I encourage community members to take the time to participate since data about community participation helps the ASF do better in the future. Thanks, Wes On Thu, Aug 25, 2022 at 2:10 AM Antoine

Re: [C++] Purpose of C++ bundled dependencies

2022-08-05 Thread Wes McKinney
The current libarrow_bundled_dependencies.a was created to address the problem of libarrow.a being "useless" (unable to be used to link with application code) if any dependencies were built by the Arrow build system (notably, this is the case when using the default allocator jemalloc). I'm not sure

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-05 Thread Wes McKinney
e at the very least some intermediate copies can be > skipped. > > Thanks, > Gosh > > On Tue, Aug 2, 2022, 2:49 PM Wes McKinney wrote: > > > On Tue, Aug 2, 2022 at 1:02 AM Antoine Pitrou wrote: > > > > > > > > > Le 01/08/2022 à 19:13, Wes McKinney a é

Re: [FlightSQL][JDBC] Additional changes to the JDBC driver

2022-08-05 Thread Wes McKinney
If you want to merge the cleared IP into a new branch rather than master, that is fine, too. It's not necessary to land it in the main branch On Tue, Aug 2, 2022 at 4:18 PM David Li wrote: > > Would it be OK to get what's there into the main branch first? i.e., open a > PR from the

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-02 Thread Wes McKinney
On Tue, Aug 2, 2022 at 1:02 AM Antoine Pitrou wrote: > > > On Aug 1, 2022 at 19:13, Wes McKinney wrote: > > > > If we start placing restrictions on how the out-of-line string buffers > > are managed and externalized, it risks undermining the zero-copy > > int

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-02 Thread Wes McKinney
I should add that Parquet has JSON, BSON, and UUID types, so even though UUID is just a simple fixed-size binary, having the extension types so that the metadata flows through accurately to Parquet would be net beneficial:

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-01 Thread Wes McKinney
On Sun, Jul 31, 2022 at 8:05 AM Antoine Pitrou wrote: > > > Hi Wes, > > On Jul 31, 2022 at 00:02, Wes McKinney wrote: > > > > I understand there are still some aspects of this project that cause > > some squeamishness (like having arbitrary memory addresses embed

[DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-07-30 Thread Wes McKinney
hi folks, I'm interested to start doing some work to implement the "StringView" memory layout that we previously discussed late last year [1] with supporting document [2]. Since there's quite a few details to work out, my objective would be to do the work in a feature branch focused on a few

Re: [DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-07-30 Thread Wes McKinney
, so if we added a new batch type allowing for encodings, sparseness, etc., then we would need to bump the MetadataVersion to V6, but libraries implementing V6 metadata should be able to operate in V5 compatibility mode (sending non-encoded data in the current IPC format). > > [1] > ht

[DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-07-29 Thread Wes McKinney
hi all, Since we've been recently discussing adding new data types, memory formats, or data encodings to Arrow, I wanted to bring up a more "big picture" question around how we could support data whose encodings may change throughout the lifetime of a data stream sent via the IPC format (e.g.

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Wes McKinney
a > > colleague of Padeep's) > > > > [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types > > > > > > On Fri, Jul 29, 2022 at 3:19 PM Wes McKinney wrote: > > > > > This seems like a common-enough data type that having a f

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Wes McKinney
This seems like a common-enough data type that having a first-class logical type would be a good idea (perhaps even more so than UUID!). Compute engines would be able to implement kernels that provide manipulations of JSON data similar to what you can do with jq or GraphQL. On Fri, Jul 29, 2022
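To make the idea of a JSON "kernel" concrete, here is a small sketch in plain Python (standard library only). It treats a column as a list of JSON strings with None for nulls and applies a simple key/index path to every element, loosely in the spirit of a jq path expression. The function name `json_extract` and the path representation are illustrative assumptions, not an existing Arrow compute function.

```python
import json


def json_extract(column, path):
    """Apply a key/index path to each JSON string in a column.

    Returns None for null inputs and for rows where the path does not resolve.
    """
    results = []
    for text in column:
        if text is None:
            results.append(None)
            continue
        value = json.loads(text)
        for key in path:
            try:
                value = value[key]
            except (KeyError, IndexError, TypeError):
                # Path does not exist in this document.
                value = None
                break
        results.append(value)
    return results


column = ['{"a": {"b": [1, 2]}}', None, '{"a": 1}']
print(json_extract(column, ["a", "b", 1]))
```

A real compute-engine kernel would operate on columnar buffers rather than Python lists, but the element-wise contract (one output per input, nulls preserved) would be similar.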

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Wes McKinney
We had an e-mail thread about this in 2018 https://lists.apache.org/thread/35pn7s8yzxozqmgx53ympxg63vjvggvm I still think having a canonical in-memory row format (and libraries to transform to and from Arrow columnar format) is a good idea — but there is the risk of ending up in the tar pit of

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-25 Thread Wes McKinney
Suppressing the warnings on 32-bit MSVC sounds like a reasonable compromise. Is there an open PR for this (and what is the corresponding Jira issue so we don't lose track of it)? On Fri, Jul 22, 2022 at 1:23 PM Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote: > > Or live with the warnings. Or cast

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-25 Thread Wes McKinney
hi all, Do you think we could make a push to make this happen after the 9.0.0 release goes out? Thanks Wes On Tue, Feb 15, 2022 at 2:32 PM Fiona La wrote: > > Thank you Antoine for bringing up the engineering work that is required to > enable this. And thank you Neal for sharing the link to

Re: [C++] Moving from -O3 to -O2 optimization level in release builds

2022-07-21 Thread Wes McKinney
vely that > can be demonstrated to benefit from it (if anyone actually spends the time to > look into it). > > Sasha > > > On Jul 20, 2022, at 2:10 PM, Wes McKinney wrote: > > > > hi all, > > > > Antoine and I were digging into a weird issue where gcc i

[C++] Moving from -O3 to -O2 optimization level in release builds

2022-07-20 Thread Wes McKinney
hi all, Antoine and I were digging into a weird issue where gcc in -O3 generated ~40KB of optimized code for a function which was less than 2KB in -O2, and where a "leaner" implementation (in PR 13654) was yet faster and smaller. You can see some of the discussion at

Re: [C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-18 Thread Wes McKinney
On Mon, Jul 18, 2022 at 2:35 AM Antoine Pitrou wrote: > > > On Jul 18, 2022 at 03:54, Wes McKinney wrote: > > This patch caused Parquet files written with 2.0.0 to be unreadable in > > 3.0.0 onward > > > > https://github.com/apache/arrow/commit/ef0feb2

Re: Problem reading parquet written with pyarrow=2.0.0 using pyarrow=8.0.0 (when using use_dictionary with ParquetWriter)

2022-07-17 Thread Wes McKinney
hi -- I git-bisected and found the backwards-compat regression, and reported here https://issues.apache.org/jira/browse/ARROW-17100 On Wed, Jul 6, 2022 at 4:16 PM Wes McKinney wrote: > > hi — did you ever resolve this issue? We should try to identify what > is causing this failur

Re: [C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-17 Thread Wes McKinney
Jira issue for this: https://issues.apache.org/jira/browse/ARROW-17100 On Sun, Jul 17, 2022 at 8:54 PM Wes McKinney wrote: > > This patch caused Parquet files written with 2.0.0 to be unreadable in > 3.0.0 onward > > https://github.com/apache/arrow/commit/ef0feb2c9c959681d8a105cba

[C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-17 Thread Wes McKinney
This patch caused Parquet files written with 2.0.0 to be unreadable in 3.0.0 onward https://github.com/apache/arrow/commit/ef0feb2c9c959681d8a105cbadc1ae6580789e69 This was reported on June 14 on dev@ and I git-bisected to the root cause:

Re: [C++] Adding Run-Length Encoding to Arrow

2022-07-08 Thread Wes McKinney
l/13330 > >> Encode/Decode functions for (currently fixed width types only) > >> > >> - https://github.com/apache/arrow/pull/1 > >> For updating docs > >> > >> Best, > >> Tobias > >> > >> Am Dienstag, dem 31.05

Re: Problem reading parquet written with pyarrow=2.0.0 using pyarrow=8.0.0 (when using use_dictionary with ParquetWriter)

2022-07-06 Thread Wes McKinney
hi — did you ever resolve this issue? We should try to identify what is causing this failure and see if it can be fixed for the 9.0.0 release. On Tue, Jun 14, 2022 at 8:18 AM Niklas Bivald wrote: > > Hi, > > I’m experiencing problem reading parquet files written with the > `use_dictionary=[]`

Re: Existence/name/scope for minimal C/C++ Arrow C Data interface helpers

2022-07-06 Thread Wes McKinney
pendency-free library to help > >>>> constructing those would certainly be appreciated. What would also help > >>>> a > >>>> lot is validation code, Arrow structures are very delicate and one wrong > >>>> pointer can lead to disaster

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Wes McKinney
> > > > > > Does boxing a scalar into an array actually build a buffer with the > > repeated value, or is it more efficient than that? > > > > > > On Jun 29, 2022 at 17:57, Wes McKinney wrote: > > > I'm working on my next PR which addresses the "

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Wes McKinney
ss follow-on improvements like rewriting expression evaluation to utilize the span data structures to yield performance gains. On Mon, Jun 13, 2022 at 12:37 PM Wes McKinney wrote: > > I merged the PR a little while ago — thanks for David, Sasha for > helping review. If you have more comments

Re: [C++] Kernel function registry evolution

2022-06-13 Thread Wes McKinney
delete a lot of code I'll attach related Jiras to this umbrella issue: https://issues.apache.org/jira/browse/ARROW-16755 On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney wrote: > > PR is up: https://github.com/apache/arrow/pull/13364 > > Look forward to getting this in since there's a bun

Re: [C++] Kernel function registry evolution

2022-06-10 Thread Wes McKinney
PR is up: https://github.com/apache/arrow/pull/13364 Look forward to getting this in since there's a bunch of follow on work that I'd like to get started on ASAP! On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney wrote: > > I'm making good progress getting my branch PR-ready -- working t

Re: [C++] Kernel function registry evolution

2022-06-09 Thread Wes McKinney
I'm making good progress getting my branch PR-ready -- working through the compute-scalar-test suite and fixing the little things I broke. I hope I'll have it done by the end of the week. On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney wrote: > > I created https://issues.apache.org/jira/browse

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Wes McKinney
as I can to have my initial patch ARROW-16756 ready which will unblock the next few projects here On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney wrote: > > This is definitely only the first stage of cleanup and streamlining — > I anticipate multiple rounds of refactoring (maybe not as

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Wes McKinney
This is definitely only the first stage of cleanup and streamlining; I anticipate multiple rounds of refactoring (maybe not as invasive and painful as this one), and I'm not sure this patch will do a lot to alleviate bottom-line expression evaluation overhead, but it creates the environment (i.e.

Re: [C++] Kernel function registry evolution

2022-06-05 Thread Wes McKinney
ay from being review-ready: https://github.com/apache/arrow/compare/master...wesm:lightweight-exec-batch I'll post a PR when I have something closer to a green build. We probably won't want to let this PR linger since it will cause conflicts with any PRs touching scalar kernels. On Fri, Jun 3,

Re: RecordBatchFileWriter with DictionaryType: Making sure the dictionary stays the same

2022-06-03 Thread Wes McKinney
There's a relevant Jira issue here (maybe some others), if someone wants to pick it up and write a kernel for it https://issues.apache.org/jira/browse/ARROW-4097 I think having an improved experience around this dictionary conformance/normalization problem would be valuable. On Tue, May 31,

Re: [C++] Kernel function registry evolution

2022-06-03 Thread Wes McKinney
want to tweak the output > > interface (I don't know for sure if we will) then maybe it makes sense > > to pick a small set of kernels and incrementally improve that small > > set until we think we've made all the changes we are going to want for > > the near future, and then

Re: [C++] Kernel function registry evolution

2022-06-02 Thread Wes McKinney
On this topic, I actually have started prototyping a new ScalarKernel exec interface that uses a non-owning, shared_ptr-free "ArraySpan" data structure based on some prior conversations https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163

Re: [Dev] Switch to token authentication for archery & merge script

2022-06-01 Thread Wes McKinney
hi Jacob — this sounds very reasonable and fixes a rough edge for maintainers running into captcha issues. Thanks Wes On Wed, Jun 1, 2022 at 6:44 AM Jacob Wujciak wrote: > > Hello Everyone, > > I would like to propose that we switch from basic authentication with JIRA > in the merge script and

Re: [DISC] Improving Arrow's database support

2022-06-01 Thread Wes McKinney
I went ahead and created https://github.com/apache/arrow-adbc I directed issue comments / PRs to issues@ On Tue, May 31, 2022 at 8:49 PM Wes McKinney wrote: > > I think spinning up a new repository while this exploratory work > progresses is a fine idea — perhaps apache/arrow-dbc / a

Re: [DISC] Improving Arrow's database support

2022-05-31 Thread Wes McKinney
age the Arrow libraries). Of course, maintaining a parallel > build system, setting up releases, etc. is also a lot of work. > > -David > > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote: > > I don't have major new things to add on this topic except that I've > > long had th

Re: [C++] Adding Run-Length Encoding to Arrow

2022-05-31 Thread Wes McKinney
I haven't had a chance to look at the branch in detail, but if you can provide a pointer to a specification or other details about the proposed memory format for RLE (basically: what would be added to the columnar documentation as well as the Flatbuffers schema files), it would be helpful so it

Re: Existence/name/scope for minimal C/C++ Arrow C Data interface helpers

2022-05-31 Thread Wes McKinney
I'm also supportive of having a small vendorable C/C++ "Arrow middleware" that provides: * Schemas and types * Columnar data structures and minimal APIs to build them and iterate over them * C data interface * Minimal validation (at the level of Validate but not ValidateFull) I don't think it's

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-19 Thread Wes McKinney
> > > "Acero" has a nice ring to it. Almost as if you said "ACE Arrow" really > > > fast. And maybe the steel / iron meaning gives a sort of close-to-metal > > > vibes (similar to what Rust's name invokes), though I'm not a Spanish > > > spe

Re: Merge a pull request with GitHub API

2022-05-18 Thread Wes McKinney
One of the benefits of the current merge script is that the PR description is preserved (maybe this could be possible with this method) — authors and co-authors are preserved by the explicit by-lines, e.g. Lead-authored-by: Nic Crane Co-authored-by: Ian Cook Signed-off-by: Ian Cook I assume

Re: [VOTE] [Rust] Move Ballista to new arrow-ballista repository

2022-05-17 Thread Wes McKinney
+1 (binding) On Tue, May 17, 2022 at 4:10 AM vin jake wrote: > > +1, It's reasonable > > On Mon, May 16, 2022 at 9:56 PM Andy Grove wrote: > > > I would like to propose that we move the Ballista project to a new > > top-level *arrow-ballista* repository. > > > > The rationale for this (copied

June 23 virtual conference to highlight work in the Arrow ecosystem

2022-05-13 Thread Wes McKinney
hi all, My employer (Voltron Data) is organizing a free virtual conference on June 23 to highlight development work and usage of Apache Arrow — you can register for this or apply to give a talk here: https://thedatathread.com/ We are especially interested in hearing from users (as opposed to

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-12 Thread Wes McKinney
> Discussion about whether the community around Arrow would like to have > DataFrame-like APIs for Arrow in more languages, for example C++ We've discussed this a bit on the mailing list in the past, see

Re: [C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-11 Thread Wes McKinney
ub.com/apache/arrow/pull/12894 > [2] https://lists.apache.org/thread/mp68ofm2hnvs2v2oz276rvw7y5kwqoyd > [3] https://github.com/apache/arrow/pull/12755 > On Mon, May 2, 2022 at 1:20 PM Wes McKinney wrote: > > > > hi all, > > > > I've been catching up on

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-10 Thread Wes McKinney
erm in query engines already. > > >>>>> > > >>>>> > > >>>>> On Tue, Mar 29, 2022 at 10:07 AM Andy Grove > > >>> wrote: > > >>>>> > > >>>>>> Just my 2 cents on this. If you were t

[C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-02 Thread Wes McKinney
hi all, I've been catching up on the C++ execution engine codebase after a fairly long development hiatus. I have several questions / comments about the current design of the ExecNode and their implementations (currently: source / scan, filter, project, union, aggregate, sink, hash join). My

Re: [DISC] Improving Arrow's database support

2022-04-26 Thread Wes McKinney
I don't have major new things to add on this topic except that I've long had the aspiration of creating something like Python's DBAPI 2.0 [1] at the C or C++ level to enable a measure of API standardization for Arrow-native read/write interfaces with database drivers. It seems like a natural

Designing standards for "sandboxed" Arrow user-defined functions [was Re: User defined "Arrow Compute Function"]

2022-04-25 Thread Wes McKinney
I was going to reply to this e-mail thread on user@ but thought I would start a new thread on dev@. Executing user-defined functions in memory, especially untrusted functions, in general is unsafe. For "trusted" functions, having an in-memory API for writing them in user languages is very useful.

Re: [VOTE] Extend Arrow Flight SQL with more SQL type info in schemas

2022-04-25 Thread Wes McKinney
+1 (binding) I agree with the comments on the PR that it would be good to better explain what the "type name" is or give an example or reference in the code comments On Thu, Apr 21, 2022 at 11:49 AM José Almeida wrote: > > +1 (non binding) > > On Thu, Apr 21, 2022 at 1:49 PM Rafael Telles

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-25 Thread Wes McKinney
I think there are a couple of embedded / entangled questions here: * Should Arrow be able to be used to *transport* narrow decimals, for the (now very abundant) use cases where Arrow is being used as an internal wire protocol or client/server interface * Should *compute engines*
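For context on what "narrow" decimals can hold, the following sketch computes the largest decimal precision whose every unscaled value fits in a signed integer of a given bit width. It is plain arithmetic and assumes the usual representation of a decimal as a two's-complement unscaled integer plus a scale; the function name is hypothetical.

```python
def max_decimal_precision(bit_width):
    """Largest p such that every p-digit unscaled value fits in a signed
    two's-complement integer of the given bit width."""
    limit = 2 ** (bit_width - 1) - 1  # largest positive signed value
    precision = 0
    # 10**(p+1) - 1 is the largest value with p+1 decimal digits.
    while 10 ** (precision + 1) - 1 <= limit:
        precision += 1
    return precision


for width in (32, 64, 128, 256):
    print(width, max_decimal_precision(width))
```

Under this reasoning a 32-bit decimal covers up to 9 digits and a 64-bit decimal up to 18 digits, which is why narrow decimals are attractive for transport when values are known to be small.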

Re: PyArrow / Arrow questions about the time and date types

2022-04-04 Thread Wes McKinney
On Fri, Apr 1, 2022 at 2:00 PM Weston Pace wrote: > > > *Question 1*: For my own understanding: what purpose does the > > millisecond date64 type serve? > > I don't actually know the answer to this one. The rationale IIRC was that some systems represent dates this way, and so the purpose was to
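To illustrate the difference between the two date representations discussed here, the sketch below converts a calendar date to a day count since the UNIX epoch (the date32 style) and to a millisecond count since the epoch (the date64 style). It uses only the standard library; the helper names are illustrative and are not pyarrow functions.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MS_PER_DAY = 86_400_000


def to_date32(d):
    """Days elapsed since the UNIX epoch."""
    return (d - EPOCH).days


def to_date64(d):
    """Milliseconds elapsed since the UNIX epoch, always a whole number of days."""
    return to_date32(d) * MS_PER_DAY


def from_date64(ms):
    """Convert a date64-style value back to a date, rejecting partial days."""
    if ms % MS_PER_DAY != 0:
        raise ValueError("date64 values must be a multiple of 86400000 ms")
    return EPOCH + timedelta(days=ms // MS_PER_DAY)


d = date(2022, 4, 4)
print(to_date32(d), to_date64(d), from_date64(to_date64(d)))
```

Both encodings carry the same information; the millisecond form mainly helps interoperate with systems that already store dates as millisecond timestamps.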

Re: [Flight][Java][JDBC] IP clearance of Flight JDBC Driver

2022-04-04 Thread Wes McKinney
A corporate CLA is not required. Individual CLAs are fine. Since Dremio is a US corporation and the IP for the JDBC driver is owned by Dremio (I assume that the contributors all have IP assignment agreements where their contributions are assigned to the corporation), it would be best to have a

[DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-03-28 Thread Wes McKinney
hi all, There has been a steady stream of work over the last year and a half or so to create a set of query engine building blocks in C++ to evaluate queries against Arrow Datasets and input streams, which can be of use to applications that are already building on top of the Arrow C++ project.

Re: [VOTE] Extend Arrow Flight SQL with GetXdbcTypeInfo, SQL type info in schemas

2022-03-27 Thread Wes McKinney
Adding my +1 (binding) vote (technically votes need 3 binding +1's so this will pass) On Fri, Mar 25, 2022 at 4:12 PM David Li wrote: > > The vote has been open for a while now without objection, so the vote passes > with 2 +1 votes (binding), 4 +1 votes (non-binding). > > Thanks to all the

Adding Apache Arrow to the registry of Digital Public Goods

2022-03-25 Thread Wes McKinney
As some research groups, e.g. at public universities, are doing work that involves Apache Arrow, I have learned that it would be beneficial in terms of access to funding if Arrow were registered as a Digital Public Good. Here is an example of another Apache project, Fineract, which is listed as

Re: [Vote][Datafusion][Python] Release Python binding of Apache Arrow Datafusion 0.5.0 RC1

2022-03-25 Thread Wes McKinney
Since https://github.com/datafusion-contrib/datafusion-python is not an official Apache Arrow project, the software is not under ASF governance procedures and so there is no need for the Arrow PMC to vote on releases. On Thu, Mar 10, 2022 at 7:56 AM Matthew Turner wrote: > > Thanks for driving

[ANNOUNCE] New Arrow committer: Jacob Quinn

2022-02-24 Thread Wes McKinney
On behalf of the Arrow PMC, I'm happy to announce that Jacob Quinn has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! Wes

Re: Managing usage of the @ApacheArrow Twitter handle and other social media

2022-02-16 Thread Wes McKinney
uld we request access? I'd like to tweet out the > Flight SQL blog post that just landed. > > On Wed, Feb 9, 2022, at 17:22, Wes McKinney wrote: > >> In my opinion, any PMC member should be allowed to use the Twitter account > >> without any other checks, balan

Re: Managing usage of the @ApacheArrow Twitter handle and other social media

2022-02-09 Thread Wes McKinney
, Feb 1, 2022 at 12:43 AM QP Hou wrote: > >>> > >>>> I don't know how other projects manage this, but one solution we could > >>>> evaluate is using github PRs to manage the twitter account. For > >>>> example, here is a github action that do

Re: [DISCUSS] Further proposals for Flight SQL

2022-02-04 Thread Wes McKinney
n tests) It > > > > > > > > would be > > > > > nice > > > > > > > > to have it on hand for reference. > > > > > > > > > > > > > > > > [1]: https://github.com/apache/arrow/tree/flight-sql >

Managing usage of the @ApacheArrow Twitter handle and other social media

2022-01-31 Thread Wes McKinney
hi all, The project is approaching its 6th birthday and we have come a long way! We have a relatively seldom-used Twitter handle twitter.com/ApacheArrow and only a handful of people in the community have access to it. I know that Jacques and I do, but I am not sure who else. I wanted to

[ANNOUNCE] New Arrow PMC chair: Kouhei Sutou

2022-01-25 Thread Wes McKinney
I am pleased to announce that we have a new PMC chair and VP as per our newly started tradition of rotating the chair once a year. I have resigned and Kouhei was duly elected by the PMC and approved unanimously by the board. Please join me in congratulating Kou! Thanks, Wes

Re: [RUST][DataFusion][Arrow] Switching DataFusion to use arrow2 implementation and the future of arrow

2022-01-17 Thread Wes McKinney
seen, the > > > project is still undergoing major API changes on a monthly basis, so > > > quick releases and fast user feedback is quite valuable. But let's > > > hear Jorge's point of view on this first. > > > > > > On Sun, Jan 16, 2022 at 2:42 PM Wes

Re: [RUST][DataFusion][Arrow] Switching DataFusion to use arrow2 implementation and the future of arrow

2022-01-16 Thread Wes McKinney
Is there a possibility of donating arrow2 to the Arrow project (at some point)? The main impact to development would be holding votes on releases, but this is probably a good thing long term from a governance standpoint. The answer may be "not right now" and that's fine. Having many of the same

Re: Help drafting Arrow 2022-01 board report

2022-01-12 Thread Wes McKinney
hi Kou, Yes, I will submit it. Thanks for collecting the information Wes On Tue, Jan 11, 2022 at 7:13 PM Sutou Kouhei wrote: > > Hi Wes, > > Could you submit the draft to ASF? > Thanks to everyone who help drafting this report. > > Thanks, > -- > kou > > In

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-01-11 Thread Wes McKinney
hi all, Thank you for all the comments on this mailing list thread and in the Google document. There is definitely a lot of work to take some next steps from here, so I think it would make sense to fork off each of the proposed additions into dedicated discussions. The most contentious issue, it

Re: [DRAFT] Arrow Jan 2022 ASF board report

2022-01-06 Thread Wes McKinney
hi folks, I apologize that I missed Kou's e-mail about this same subject about 14 hours ago. I'll withdraw this document, let's go with the one he started! Thanks On Thu, Jan 6, 2022 at 4:04 PM Wes McKinney wrote: > > Hi folks, it's time for our quarterly board report. Please suggest >

[DRAFT] Arrow Jan 2022 ASF board report

2022-01-06 Thread Wes McKinney
Hi folks, it's time for our quarterly board report. Please suggest additions on the following Google document or write them in responses to this e-mail. Thanks for your assistance! https://docs.google.com/document/d/1180DOLzfTunphh91WVCxhXW5c_fGxTKXWEfZm6Iu9Wc/edit?usp=sharing

Re: [DISCUSS] Annual rotation of Arrow PMC chair

2022-01-06 Thread Wes McKinney
Correction: the discussion was on dev@ and the vote (like all votes) was on private@ On Thu, Jan 6, 2022 at 3:37 PM Wes McKinney wrote: > > Thanks Kou for accepting the nomination. I will start a vote shortly. > > @Antoine -- the last time we rotated PMC chairs the discussio

Re: [DISCUSS] Annual rotation of Arrow PMC chair

2022-01-06 Thread Wes McKinney
Thanks Kou for accepting the nomination. I will start a vote shortly. @Antoine -- the last time we rotated PMC chairs the discussion and vote were conducted on dev@ so I'm just following the precedent On Thu, Jan 6, 2022 at 4:47 AM Benson Muite wrote: > > Congratulations! > On 1/6/22 12:54 AM,

[ANNOUNCE] New Arrow committer: Alessandro Molina

2022-01-05 Thread Wes McKinney
On behalf of the Arrow PMC, I'm happy to announce that Alessandro Molina has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! Wes

Re: [VOTE] Release Apache Arrow JS 6.0.2

2022-01-04 Thread Wes McKinney
nd uploaded. > >> > >> @Dominik - was not aware of arrow-wasm, thanks. > >> > >> Arrow rust implementation is in another repository and has support for > >> Javascript/Webassembly : > >> > >> https://github.com/apache/arrow-rs/tree/m

[ANNOUNCE] New Arrow PMC member: Yibo Cai

2022-01-04 Thread Wes McKinney
The Project Management Committee (PMC) for Apache Arrow has invited Yibo Cai to become a PMC member and we are pleased to announce that Yibo has accepted. Congratulations and welcome!

[DISCUSS] Annual rotation of Arrow PMC chair

2022-01-04 Thread Wes McKinney
hello all, As we discussed at the end of 2020 [1], we would like to have a roughly annual rotation of the Apache Arrow PMC chair. The responsibilities of the PMC chair are mainly bureaucratic: the submission of quarterly board reports on reporter.apache.org and managing the PMC roster on
