Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-12 Thread QP Hou
Good call Ruihang. I remember we used to have this toolchain file when we were still in the main arrow repo. I will take a look into that. On Wed, Aug 11, 2021 at 5:36 PM Wayne Xia wrote: > > Hi QP, > > When running this script I noticed that this might be because I was not > using a stable

Re: [VOTE][RUST] Release Apache Arrow Rust 5.2.0 RC1

2021-08-12 Thread QP Hou
+1 (non-binding). I ran the verification script on Linux 5.4.0 x86_64. On Thu, Aug 12, 2021 at 12:44 PM Andrew Lamb wrote: > > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 5.2.0. > > This release candidate is based on commit: >

Re: [VOTE][RUST] Release Apache Arrow Rust 5.2.0 RC1

2021-08-12 Thread Andy Grove
+1 (binding). I checked signatures and ran the verification script. On Thu, Aug 12, 2021 at 1:44 PM Andrew Lamb wrote: > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 5.2.0. > > This release candidate is based on commit: >

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Phillip Cloud
On Thu, Aug 12, 2021 at 1:03 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > I agree with Antoine that we should weigh the pros and cons of flatbuffers > (or protobuf or thrift for that matter) over a more human-friendly, > simpler, format like JSON or MsgPack. I also struggle a bit

Re: [Rust] Integration tests for recursive nested data?

2021-08-12 Thread Jorge Cardoso Leitão
Hi, The checkout of arrow-rs on the failed build is over fa5acd971c97, which was master until 3 hours or so ago, so I think it is picking the right code. I did a quick investigation: * The integration tests on arrow-rs have not been running since June 30th. They stopped running after the merge of

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-12 Thread Julian Hyde
> Wes wrote: > > Supporting this kind of intra-application engine > heterogeneity is one of the motivations for the project. +1 The data format is the natural interface between tasks. (Defining “task” here as “something that is programmed using the IR”.) That is Arrow’s strength. So I think

[VOTE][RUST] Release Apache Arrow Rust 5.2.0 RC1

2021-08-12 Thread Andrew Lamb
Hi, I would like to propose a release of Apache Arrow Rust Implementation, version 5.2.0. This release candidate is based on commit: 7c98c4c60bc776acd09bd3568c6630d360e8d652 [1] The proposed release tarball and signatures are hosted at [2]. The changelog is located at [3]. Please download,

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Jorge Cardoso Leitão
I agree with Antoine that we should weigh the pros and cons of flatbuffers (or protobuf or thrift for that matter) over a more human-friendly, simpler, format like JSON or MsgPack. I also struggle a bit to reason with the complexity of using flatbuffers for this. E.g. there is no async support

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Antoine Pitrou
On 12/08/2021 at 15:05, Wes McKinney wrote: It seems that one adjacent problem here is how to make it simpler for third parties (especially ones that act as front end interfaces) to build and serialize/deserialize the IR structures with some kind of ready-to-go middleware library, written in

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Wes McKinney
On Thu, Aug 12, 2021 at 3:16 PM Neal Richardson wrote: > > > Maintain this "Arrow types and ComputeIR library" as an always > zero-dependency library to facilitate vendoring > > Would/should this hypothetical zero-dep, vendorable library also include > the IPC format? Or if you want to interact

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Neal Richardson
> Maintain this "Arrow types and ComputeIR library" as an always zero-dependency library to facilitate vendoring Would/should this hypothetical zero-dep, vendorable library also include the IPC format? Or if you want to interact with IPC in that case, the C data interface is the best/only option?

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Phillip Cloud
On Thu, Aug 12, 2021 at 9:06 AM Wes McKinney wrote: > It seems that one adjacent problem here is how to make it simpler for > third parties (especially ones that act as front end interfaces) to > build and serialize/deserialize the IR structures with some kind of > ready-to-go middleware

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Wes McKinney
It seems that one adjacent problem here is how to make it simpler for third parties (especially ones that act as front end interfaces) to build and serialize/deserialize the IR structures with some kind of ready-to-go middleware library, written in a language like C++. To do that, one would need

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-12 Thread Wes McKinney
On Wed, Aug 11, 2021 at 11:22 PM Phillip Cloud wrote: > > On Wed, Aug 11, 2021 at 4:48 PM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > Couple of questions > > > > 1. Is the goal that IRs have equal semantics, i.e. given (IR,data), the > > operation "(IR,data) - engine ->

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-12 Thread Andrew Lamb
I support the idea of an independent repo that has the arrow flatbuffers format definition files. My rationale is that the Rust implementation has a copy of the `format` directory [1] and potential drift worries me (a bit). Having a single source of truth for the format that is not part of the

Re: [Rust] Integration tests for recursive nested data?

2021-08-12 Thread Andrew Lamb
Hi Micah, There is no open issue that I know of, and while I may be mistaken it looks like the most recent run of the Integration Test on apache/master [3] passed successfully. The code [1] that is asserting on your branch doesn't look like it has been touched for a while (last touched in March