Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-12 Thread Micah Kornfield
My thoughts: 1. I've lost track of the higher level encryption implementation in C++. I think we were trying to come to a consensus on the threading/thread safety model? 2. I'm open to exposing the lower level encryption libraries in Python (with appropriate namespacing/communication). It se

Re: [C++] adopting an SIMD library - xsimd

2021-02-12 Thread Micah Kornfield
That is unfortunate, like I said if the consensus is xsimd, let's move forward with that. On Fri, Feb 12, 2021 at 2:45 AM Antoine Pitrou wrote: > > There is an std::simd being envisioned. > https://en.cppreference.com/w/cpp/experimental/simd/simd > > The problem is that we need an implementation

Re: Array offset in IPC

2021-02-12 Thread Micah Kornfield
Hi Jorge, This is correct; to my knowledge, offsets are not modelled in IPC. There was a lot of debate on whether to include them in the C data interface. Cheers, Micah On Friday, February 12, 2021, Jorge Cardoso Leitão wrote: > Hi, > > I am going through the Rust implementation of the IPC, and

Array offset in IPC

2021-02-12 Thread Jorge Cardoso Leitão
Hi, I am going through the Rust implementation of the IPC, and I am trying to understand how we share ArrayData offsets. Specifically, our C data interface supports the notion of an offset, measured in slots, that denotes how many slots ahead of the buffer pointers we read from. This enables us t

Documenting the dataset/compute/expression APIs

2021-02-12 Thread Aldrin
Hello! I am interested in exploring the compute and expression APIs for pushdown filters, and I expect some of the use cases to overlap with Gandiva and with efforts towards FlightSQL. I feel like the design and API documentation for these is sparse, or I am simply bad at finding it, and wanted to

Re: [Python] Python based Query Engine for Arrow

2021-02-12 Thread Wes McKinney
I'm actively building an engineering team to work on this -- so anyone who would like to work on this as part of their day job can reach out to me to discuss. We are doing some research about what aspects of prior art in columnar database systems we can pull into the Arrow C++ project (to make sure

Re: [Python] Python based Query Engine for Arrow

2021-02-12 Thread Jorge Cardoso Leitão
Hi Tom, This does not address the question directly, but for what it's worth, I had the same issue and thus released a Python binding for DataFusion. It allows you, e.g., to create a pyarrow RecordBatch by reading from S3 (via pyarrow), and use it as a source to Dat

Re: [Python] Python based Query Engine for Arrow

2021-02-12 Thread Micah Kornfield
Welcome Tom, > Is there already something like DataFusion on the roadmap for C++ (and thus > Python)? Yes, it is [1], and the components are being developed. In terms of contributions, others might have a better idea, but I think the two big pieces of functionality missing from a kernel/operator pe

[Python] Python based Query Engine for Arrow

2021-02-12 Thread Tom Scheffers
Dear devs, I am really interested in an in-memory query interface to Arrow tables (like DataFusion is for Rust), preferably in Python. In my opinion, there are three routes: 1. create a wrapper/interface to DataFusion directly, 2. copy Arrow to pandas and use an existing framework (like Ibis) and

Re: Arrow Rust sync call February 10 at 12:00 US/Eastern, 17:00 UTC

2021-02-12 Thread Benjamin Blodgett
Hi Andy, Is Ballista or the Rust meeting still looking for sponsors? Please send me some details, as I'm still trying to get C-level buy-in, but I think it's moving along :) I'd like to do something similar to Two Sigma's BeakerX or Flint at my work, while especially focusing on low level op

Re: [NIGHTLY] Arrow Build Report for Job nightly-2021-02-12-0

2021-02-12 Thread Joris Van den Bossche
I opened an issue for the turbodbc failures: https://issues.apache.org/jira/browse/ARROW-11608 (it seems to be a build issue: Unknown CMake command "pybind11_add_module") On Fri, 12 Feb 2021 at 11:13, Crossbow wrote: > > Arrow Build Report for Job nightly-2021-02-12-0 > > All tasks: > https://github.c

Re: [C++] adopting an SIMD library - xsimd

2021-02-12 Thread Antoine Pitrou
There is an std::simd being envisioned. https://en.cppreference.com/w/cpp/experimental/simd/simd The problem is that we need an implementation that's C++11- or C++14-compliant, that works on major compilers, and that provides accelerations for common instruction sets. It doesn't seem to be the

[NIGHTLY] Arrow Build Report for Job nightly-2021-02-12-0

2021-02-12 Thread Crossbow
Arrow Build Report for Job nightly-2021-02-12-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-12-0 Failed Tasks: - conda-linux-gcc-py36-aarch64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-12-0-drone-conda-linux