Re: [C++] Type_codes and child_ids for Unions & test time concerns

2020-11-08 Thread Micah Kornfield
> First of all I would like to ask why we use both type_codes and child_ids for Union types. It seems that we can already cover the logical types a union has using type_codes alone. What’s the point of using child_ids?
The two are inverses of each other:
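A minimal sketch of that inverse relationship, not Arrow's actual implementation: assuming `type_codes[i]` holds the (non-negative, 8-bit) type code of child `i`, a `child_ids` table maps each code back to its child index. The names `BuildChildIds` and `kNoChild` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel for type codes that no child uses (name assumed for this sketch).
constexpr int kNoChild = -1;

// Given type_codes, where type_codes[i] is the type code of child i,
// build the inverse table: child_ids[code] is the index of the child
// carrying that code, or kNoChild if the code is unused.
std::vector<int> BuildChildIds(const std::vector<std::int8_t>& type_codes) {
  // Non-negative int8_t codes lie in [0, 127], so 128 slots cover them all.
  std::vector<int> child_ids(128, kNoChild);
  for (std::size_t i = 0; i < type_codes.size(); ++i) {
    assert(type_codes[i] >= 0);  // this sketch only handles non-negative codes
    child_ids[static_cast<std::size_t>(type_codes[i])] = static_cast<int>(i);
  }
  return child_ids;
}
```

With type codes `{5, 10, 7}`, the table sends code 5 to child 0, code 10 to child 1, and code 7 to child 2. The forward list answers "which code does child `i` use?", while the inverse table answers "which child does this code refer to?" without a linear search.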

[C++] Type_codes and child_ids for Unions & test time concerns

2020-11-08 Thread Ying Zhou
The work of converting Arrow Arrays, ChunkedArrays, RecordBatches and Tables to ORC files is about 50% done. Now I have two questions. First of all I would like to ask why we use both type_codes and child_ids for Union types. It seems that we can already cover the logical types a union has

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Wes McKinney
I'm not opposed to installing headers that provide access to some of the kernel implementation internals (with the caveat that changes won't go through a deprecation cycle, so caveat emptor). It might be more sustainable to think about what kind of stable-ish public API could be exported to

Pandas Block Manager

2020-11-08 Thread Nicholas White
Hi - I've been looking through the Arrow specification format to look for ways to allow zero-copy creation of Pandas DataFrames (beyond `split_blocks`). Am I right in thinking that if you created an Arrow file (let's say of `m` rows and `n` columns of `float64`s for now) as a single RecordBatch

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Ben Kietzman
Hi Niranda, SumImpl is a subclass of KernelState. Given a SumAggregateKernel, one can produce zeroed KernelState using the `init` member, then operate on data using the `consume`, `merge`, and `finalize` members. You can look at ScalarAggExecutor for an example of how to get from a compute
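The init / consume / merge / finalize lifecycle described above can be sketched in a standalone form. This is only an illustration of the pattern under stated assumptions: `SumState` and `InitSumState` are hypothetical names, and none of this is the Arrow `SumImpl` or `KernelState` API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an aggregation state; not Arrow's SumImpl.
struct SumState {
  std::int64_t sum = 0;
  std::int64_t count = 0;

  // consume: fold one batch of input values into this state.
  void Consume(const std::vector<std::int64_t>& batch) {
    for (std::int64_t v : batch) {
      sum += v;
      ++count;
    }
  }

  // merge: combine a state computed elsewhere (e.g. on another thread).
  void Merge(const SumState& other) {
    sum += other.sum;
    count += other.count;
  }

  // finalize: produce the result from the accumulated state.
  std::int64_t Finalize() const { return sum; }
};

// init: produce a zeroed state.
SumState InitSumState() { return SumState{}; }
```

Two states can consume different batches independently and then be merged before finalizing, which is what makes this shape convenient for parallel or distributed aggregation.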

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Niranda Perera
Hi Ben, We are building a distributed table abstraction on top of Arrow dataframes called Cylon (https://github.com/cylondata/cylon). Currently we have a simple aggregation and group-by operation implementation. But we felt like we can give more functionality if we can import arrow kernels and

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Ben Kietzman
Hi Niranda, What is the context of your work? If you're working inside the arrow repository you shouldn't need to install headers before using them, and we welcome PRs for new kernels. Otherwise, could you provide some details about how your work is using Arrow as a dependency? Ben Kietzman On

Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Niranda Perera
Hi, I was wondering if I could use the arrow/compute/kernels/*internal.h headers in my work? I would like to reuse some of the kernel implementations and kernel states. With -DARROW_COMPUTE=ON, those headers are not added into the include dir. I see that the *internal.h headers are skipped from

[NIGHTLY] Arrow Build Report for Job nightly-2020-11-08-0

2020-11-08 Thread Crossbow
Arrow Build Report for Job nightly-2020-11-08-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-08-0 Failed Tasks: - conda-win-vs2017-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-08-0-azure-conda-win-vs2017-py36 -