[DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-02 Thread Wes McKinney
hi folks, This idea came up in passing in the past -- given that there are multiple independent efforts to develop Arrow-native query engines (and surely many more to come), it seems like it would be valuable to have a way to enable user languages (like Java, Python, R, or Rust, for example) to co
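To make the decoupling idea concrete, here is a minimal sketch under stated assumptions. Front ends only build a language-neutral plan tree, and an engine consumes that tree without knowing which language produced it. The names PlanNode, MakeNode and Explain are hypothetical and invented for illustration; they are not part of any Arrow API. A real IR would also need a serialized, versioned form so that other languages could produce it. This sketch stays in memory.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IR node: a language-neutral description of one operation.
struct PlanNode {
  std::string op;                                 // e.g. "scan", "filter", "project"
  std::string arg;                                // operator argument
  std::vector<std::shared_ptr<PlanNode>> inputs;  // child operators
};

// Front-end side: a language binding would only need to build this tree.
std::shared_ptr<PlanNode> MakeNode(std::string op, std::string arg,
                                   std::vector<std::shared_ptr<PlanNode>> inputs = {}) {
  return std::make_shared<PlanNode>(
      PlanNode{std::move(op), std::move(arg), std::move(inputs)});
}

// Engine side: a stand-in "engine" that renders the plan (pre-order, two
// spaces of indentation per level) instead of executing it.
std::string Explain(const PlanNode& node, int depth = 0) {
  assert(depth >= 0);
  std::string out = std::string(depth * 2, ' ') + node.op + "(" + node.arg + ")\n";
  for (const auto& child : node.inputs) out += Explain(*child, depth + 1);
  return out;
}
```

For example, a project over a filter over a scan renders as three lines, root first, each child indented one level deeper. Any engine that understands the node vocabulary could walk the same tree.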

Re: Recent Flatbuffers warns about non-snake-case field names

2021-08-02 Thread Max Burke
Is it something that can be done in a major version release? On our project we extensively use flatbuffer schema definitions and have incorporated the Arrow ones so our builds are littered with these warnings and it would be nice to see them disappear! On Mon, Aug 2, 2021 at 9:07 AM Micah Kornfiel

Re: [Discuss] [Rust] Arrow2/parquet2 going forward

2021-08-02 Thread Jorge Cardoso Leitão
Hi, Sorry for the delay. If there is a path towards an official release under a <1.0.0 versioning schema aligned with the rest of the Rust ecosystem and in line with the stability of the API, then IMO we should move all development to within Apache experimental asap (I can handle this and the lik

Re: Recent Flatbuffers warns about non-snake-case field names

2021-08-02 Thread Micah Kornfield
I'm -0.5. We never got it merged but I think at least some people might be relying on JSON serialized versions of flatbuffers schemas (and I would guess this would break that, or at least it is something we should test). While I like consistency and the warnings might be annoying, I'd rather keep

Re: [C++] Unable to getMutableValues from ArrayData

2021-08-02 Thread Rares Vernica
Thanks all! I ended up with this and it worked fine: std::shared_ptr<arrow::ArrayData> delta_data = _arrowBatch->column_data(nAtts); // COPY delta to pos std::vector<std::shared_ptr<arrow::Buffer>> pos_buffers(2); pos_buffers[0] = NULL; // No nulls in the array ASSIGN_OR_THROW(pos_buffers[1],

Re: [C++] Unable to getMutableValues from ArrayData

2021-08-02 Thread Antoine Pitrou
On Fri, 30 Jul 2021 18:55:33 +0200 Rares Vernica wrote: > Hello, > > I have a RecordBatch that I read from an IPC file. I need to run a > cumulative sum on one of the int64 arrays in the batch. I tried to do: The ArrayData contents are semantically immutable. You may want to grab mutable pointe
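Because ArrayData contents are semantically immutable, a cumulative sum should write into a fresh buffer rather than modify the one read from the IPC file. This follows the approach Rares reports above of copying delta into a new pos buffer. The sketch below shows that copy-then-compute pattern with standard-library stand-ins. Int64Buffer and CumulativeSum are hypothetical names, and real code would produce an arrow::Buffer instead of a std::vector.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an immutable, possibly shared buffer of int64 values.
using Int64Buffer = std::shared_ptr<const std::vector<int64_t>>;

// Compute a cumulative sum without touching the input. The input may be shared
// with other arrays or batches, so the result goes into a newly allocated buffer.
Int64Buffer CumulativeSum(const Int64Buffer& input) {
  assert(input != nullptr);
  auto output = std::make_shared<std::vector<int64_t>>(input->size());
  std::partial_sum(input->begin(), input->end(), output->begin());
  return output;  // converts to a shared_ptr to const
}
```

The input keeps its original values, so any other holder of the same buffer is unaffected.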

Re: [C++][Discuss] Representing null union scalars

2021-08-02 Thread Antoine Pitrou
Do other C++ developers have an opinion here? On Thu, 29 Jul 2021 12:58:08 +0200 Antoine Pitrou wrote: > Hello, > > The Scalar base class has a `bool is_valid` member that is used to > represent null scalars for all types (including the null type). > > A UnionScalar, since it inherits from
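One way to see why the question arises is to note that a union scalar inherits an is_valid flag from the base class and also wraps a child scalar that carries its own flag, so "null" can be spelled in more than one way. The sketch below uses simplified, hypothetical mirror types, not Arrow's actual classes. The IsNullUnion convention, which treats the union as null if either level says so, is an assumption made for illustration and is not the outcome of this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified scalar hierarchy: every scalar carries a validity flag.
struct Scalar {
  bool is_valid = false;
  virtual ~Scalar() = default;
};

struct Int64Scalar : Scalar {
  int64_t value = 0;
};

// The union scalar inherits is_valid and also wraps a child scalar that has
// its own is_valid, so the two flags can disagree.
struct UnionScalar : Scalar {
  int8_t type_code = 0;
  std::shared_ptr<Scalar> value;  // may be empty, or point to a null child
};

// Assumed convention for illustration: null if the outer flag is false, the
// child is missing, or the child itself is null.
bool IsNullUnion(const UnionScalar& s) {
  return !s.is_valid || s.value == nullptr || !s.value->is_valid;
}
```

Whichever convention is chosen, the point is that code consuming union scalars has to agree on a single one.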

[RESULT] [VOTE][RUST] Release Apache Arrow Rust 5.1.0 RC1

2021-08-02 Thread Andrew Lamb
The vote passes with 3 +1 (binding) and a +1 (non-binding) Thanks to everyone who helped verify and contribute to this release The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-5.1.0 The release has also been published to crates.io: https://crates.io/cra

Re: Recent Flatbuffers warns about non-snake-case field names

2021-08-02 Thread Antoine Pitrou
While I'm not fond of changing naming conventions because of a third-party project, in this case we have inconsistent naming in our .fbs files, so this could be an opportunity to clean it up. On 02/08/2021 at 15:00, Wes McKinney wrote: While doing some Flatbuffers work, I noticed that re

Recent Flatbuffers warns about non-snake-case field names

2021-08-02 Thread Wes McKinney
While doing some Flatbuffers work, I noticed that recent compiler versions now warn about non-snake-case field names: https://github.com/google/flatbuffers/pull/6005 It seems that the intent is for the compiler to generate "language-friendly" code (e.g. camelCase for Java) from snake_case schemas
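The idea is that schemas use snake_case and each language generator maps names to its own convention. The sketch below shows that mapping for a Java-style lowerCamelCase target. It is illustrative only, and SnakeToLowerCamel is a hypothetical helper, not flatc's actual implementation. It also shows why the JSON concern raised in this thread matters: a field renamed in the schema changes the key that any name-based serialization sees.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrative only: convert a snake_case field name into lowerCamelCase,
// the style a Java-oriented code generator might emit.
std::string SnakeToLowerCamel(const std::string& name) {
  std::string out;
  bool upper_next = false;
  for (char c : name) {
    if (c == '_') {
      upper_next = !out.empty();  // leading underscores do not capitalize
      continue;
    }
    out += upper_next
               ? static_cast<char>(std::toupper(static_cast<unsigned char>(c)))
               : c;
    upper_next = false;
  }
  return out;
}
```

For example, "bit_width" maps to "bitWidth", while a name that is already a single word, such as "length", is unchanged.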

Re: [C++] Use RecordBatch::AddColumn to update RecordBatch

2021-08-02 Thread Wes McKinney
hi Rares -- since AddColumn appends to the data in the existing batch, all of the Buffer shared_ptrs that were there before should persist after the operation. So no memory should be freed. Certainly nothing is copied or allocated during the operation. On Mon, Aug 2, 2021 at 6:17 AM Rares Vernica
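The answer rests on reference counting: the new batch holds shared_ptrs to the same buffers, so reassigning rb releases only the old batch object, not the column data. The sketch below demonstrates this with hypothetical stand-in types (Buffer, Batch and its AddColumn are mock-ups, not Arrow's classes). Unlike the real method, which takes a column position, this mock simply appends.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column buffer.
struct Buffer {
  std::vector<int64_t> data;
};

// Hypothetical stand-in for a record batch that only holds shared_ptrs.
struct Batch {
  std::vector<std::shared_ptr<Buffer>> columns;

  // Returns a new batch. Existing columns are shared (refcount bump), not copied.
  std::shared_ptr<Batch> AddColumn(std::shared_ptr<Buffer> column) const {
    auto result = std::make_shared<Batch>(*this);  // copies the shared_ptrs only
    result->columns.push_back(std::move(column));
    return result;
  }
};
```

After rb = rb->AddColumn(...), the old batch is destroyed because nothing references it any more. Its first column survives with the same address, now owned by the new batch.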

Re: [VOTE][RUST] Release Apache Arrow Rust 5.1.0 RC1

2021-08-02 Thread Neville Dipale
+1 Ran ./dev/release/verify-release-candidate.sh 5.1.0 1 on aarch64-apple-darwin Got + TEST_SUCCESS=yes + echo 'Release candidate looks good!' Release candidate looks good! + exit 0 + cleanup On Fri, 30 Jul 2021 at 15:53, Wayne Xia wrote: > +1 > > I ran this on Intel macOS Catalina: > ./dev/r

[C++] Use RecordBatch::AddColumn to update RecordBatch

2021-08-02 Thread Rares Vernica
Hello, I'm using RecordBatch::AddColumn to update a RecordBatch. Something like this: std::shared_ptr<arrow::RecordBatch> rb; ... rb = rb->AddColumn(...) Since AddColumn creates a new RecordBatch, is the memory taken by rb before assignment being freed as expected? Thanks! Rares

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

2021-08-02 Thread Andrew Lamb
I also think it sounds like a good process to get all the various packages released in a timely manner. Thank you for taking point on this issue Andrew On Sun, Aug 1, 2021 at 11:24 PM Andy Grove wrote: > Thanks QP. This seems reasonable to me. > > On Sun, Aug 1, 2021, 3:24 PM QP Hou wrote: > >