[jira] [Created] (ARROW-8359) [C++/Python] Enable aarch64/ppc64le build in conda recipes

2020-04-06 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8359: --- Summary: [C++/Python] Enable aarch64/ppc64le build in conda recipes Key: ARROW-8359 URL: https://issues.apache.org/jira/browse/ARROW-8359 Project: Apache Arrow Issue

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-04-06 Thread Wes McKinney
I updated the Format proposal again, please have a look https://github.com/apache/arrow/pull/6707 On Wed, Apr 1, 2020 at 10:15 AM Wes McKinney wrote: > > For uncompressed, memory mapping is disabled, so all of the bytes are > being read into RAM. I wanted to show that even when your IO pipe is

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Andy Grove
There are two trivial Rust PRs pending that I would like to see merged for the release. ARROW-7794: [Rust] Support releasing arrow-flight https://github.com/apache/arrow/pull/6858 ARROW-8357: [Rust] [DataFusion] Dockerfile for CLI is missing format dir https://github.com/apache/arrow/pull/6860

[jira] [Created] (ARROW-8358) [C++] Fix -Wrange-loop-construct warnings in clang-11

2020-04-06 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8358: --- Summary: [C++] Fix -Wrange-loop-construct warnings in clang-11 Key: ARROW-8358 URL: https://issues.apache.org/jira/browse/ARROW-8358 Project: Apache Arrow

[jira] [Created] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir

2020-04-06 Thread Andy Grove (Jira)
Andy Grove created ARROW-8357: - Summary: [Rust] [DataFusion] Dockerfile for CLI is missing format dir Key: ARROW-8357 URL: https://issues.apache.org/jira/browse/ARROW-8357 Project: Apache Arrow

[jira] [Created] (ARROW-8356) [Developer] Support * wildcards with "crossbow submit" via GitHub actions

2020-04-06 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8356: --- Summary: [Developer] Support * wildcards with "crossbow submit" via GitHub actions Key: ARROW-8356 URL: https://issues.apache.org/jira/browse/ARROW-8356 Project:

[jira] [Created] (ARROW-8355) [Python] Reduce the number of pandas dependent test cases in test_feather

2020-04-06 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8355: -- Summary: [Python] Reduce the number of pandas dependent test cases in test_feather Key: ARROW-8355 URL: https://issues.apache.org/jira/browse/ARROW-8355 Project:

[jira] [Created] (ARROW-8353) [C++] is_nullable maybe not initialized in parquet writer

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8353: -- Summary: [C++] is_nullable maybe not initialized in parquet writer Key: ARROW-8353 URL: https://issues.apache.org/jira/browse/ARROW-8353 Project: Apache Arrow

[jira] [Created] (ARROW-8352) [R] Add install_pyarrow()

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8352: -- Summary: [R] Add install_pyarrow() Key: ARROW-8352 URL: https://issues.apache.org/jira/browse/ARROW-8352 Project: Apache Arrow Issue Type: New Feature

Re: building arrow with CMake 2.8 on CentOS

2020-04-06 Thread Wes McKinney
Newer versions of CMake are also available from PyPI pip install cmake https://pypi.org/project/cmake/ On Mon, Apr 6, 2020 at 1:11 AM Sutou Kouhei wrote: > > Hi, > > We don't support CMake 2.8. Please use CMake 3.2 or later. > > Are you using CentOS 6? You can install CMake 3.6 with EPEL > on

Re: C interface clarifications

2020-04-06 Thread Wes McKinney
On Mon, Apr 6, 2020 at 12:22 PM Todd Lipcon wrote: > > On Mon, Apr 6, 2020 at 9:57 AM Antoine Pitrou wrote: > > > > > Hello Todd, > > > > Le 06/04/2020 à 18:18, Todd Lipcon a écrit : > > > > > > I had a couple questions / items that should be clarified in the spec. > > Wes > > > suggested I

Re: Attn: Wes, Re: Masked Arrays

2020-04-06 Thread Wes McKinney
For the sake of others reading, this discussion might be a bit confusing to happen upon because the scope isn't clear. It seems that we are discussing the C++ implementation and not the columnar format, is that right? Adding any additional metadata about this to the columnar format / Flatbuffers

[jira] [Created] (ARROW-8351) [R][CI] Store the Rtools-built Arrow C++ library as a build artifact

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8351: -- Summary: [R][CI] Store the Rtools-built Arrow C++ library as a build artifact Key: ARROW-8351 URL: https://issues.apache.org/jira/browse/ARROW-8351 Project:

[jira] [Created] (ARROW-8350) [Python] Implement to_numpy on ChunkedArray

2020-04-06 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8350: --- Summary: [Python] Implement to_numpy on ChunkedArray Key: ARROW-8350 URL: https://issues.apache.org/jira/browse/ARROW-8350 Project: Apache Arrow Issue Type:

Re: C interface clarifications

2020-04-06 Thread Todd Lipcon
On Mon, Apr 6, 2020 at 9:57 AM Antoine Pitrou wrote: > > Hello Todd, > > Le 06/04/2020 à 18:18, Todd Lipcon a écrit : > > > > I had a couple questions / items that should be clarified in the spec. > Wes > > suggested I raise them here on dev@: > > > > *1) Should producers expect callers to

Re: C interface clarifications

2020-04-06 Thread Antoine Pitrou
Hello Todd, Le 06/04/2020 à 18:18, Todd Lipcon a écrit : > > I had a couple questions / items that should be clarified in the spec. Wes > suggested I raise them here on dev@: > > *1) Should producers expect callers to zero-init structs?* IMO, they shouldn't. They should fill the structure

Re: Attn: Wes, Re: Masked Arrays

2020-04-06 Thread Felix Benning
In that case it is probably necessary to have a "has_sentinel" flag and a "sentinel_value" variable, since other algorithms might benefit from not having to set these values to zero, which is probably the reason why the value "underneath" was set to unspecified in the first place. Alternatively a

C interface clarifications

2020-04-06 Thread Todd Lipcon
Hey folks, I've started working on a patch to make Apache Kudu's C++ client able to expose batches of data in Arrow's new C-style interface ( https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst ) I had a couple questions / items that should be clarified in the spec.

[jira] [Created] (ARROW-8349) [CI][NIGHTLY:gandiva-jar-osx] Use latest pygit2

2020-04-06 Thread Prudhvi Porandla (Jira)
Prudhvi Porandla created ARROW-8349: --- Summary: [CI][NIGHTLY:gandiva-jar-osx] Use latest pygit2 Key: ARROW-8349 URL: https://issues.apache.org/jira/browse/ARROW-8349 Project: Apache Arrow

Re: Attn: Wes, Re: Masked Arrays

2020-04-06 Thread Francois Saint-Jacques
It does make sense. I would go a little further and make this field/property a single value of the same type as the array. This would allow using any arbitrary sentinel value for unknown values (0 in your suggested case). The end result is zero-copy for R bindings (if stars are aligned). I
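The sentinel idea in this message can be sketched roughly as follows. This is an illustration only, not Arrow's API: `PrimitiveArray`, `fill_sentinel`, and `is_null` are hypothetical names, the array is modelled as plain Python lists, and the sketch assumes the sentinel never occurs as a valid value. The constant reflects R's convention of representing integer NA as the minimum 32-bit integer, which is why a matching sentinel could make a hand-off to R cheap.

```python
from dataclasses import dataclass
from typing import List, Optional

# R represents integer NA as the minimum 32-bit signed integer.
R_NA_INTEGER = -2**31


@dataclass
class PrimitiveArray:
    """Hypothetical stand-in for a primitive array with a validity mask."""
    values: List[int]
    valid: List[bool]
    # Hypothetical optional property: value stored in every null slot.
    sentinel: Optional[int] = None


def fill_sentinel(arr: PrimitiveArray, sentinel: int) -> PrimitiveArray:
    """Return a copy whose null slots all hold `sentinel`."""
    filled = [v if ok else sentinel for v, ok in zip(arr.values, arr.valid)]
    return PrimitiveArray(filled, list(arr.valid), sentinel)


def is_null(arr: PrimitiveArray, i: int) -> bool:
    """Use the sentinel when one is declared, else fall back to the mask."""
    if arr.sentinel is not None:
        return arr.values[i] == arr.sentinel
    return not arr.valid[i]


arr = PrimitiveArray(values=[1, 7, 3], valid=[True, False, True])
as_r = fill_sentinel(arr, R_NA_INTEGER)
```

With a sentinel of 0 this reduces to the "zeros underneath the nulls" case from the earlier message in the thread.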

[jira] [Created] (ARROW-8348) [C++] Support optional sentinel values in primitive Array for nulls

2020-04-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8348: - Summary: [C++] Support optional sentinel values in primitive Array for nulls Key: ARROW-8348 URL: https://issues.apache.org/jira/browse/ARROW-8348

[jira] [Created] (ARROW-8347) [C++] Add Result to Array methods

2020-04-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8347: - Summary: [C++] Add Result to Array methods Key: ARROW-8347 URL: https://issues.apache.org/jira/browse/ARROW-8347 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-8346) [CI][Ruby] GLib/Ruby macOS build fails on zlib

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8346: -- Summary: [CI][Ruby] GLib/Ruby macOS build fails on zlib Key: ARROW-8346 URL: https://issues.apache.org/jira/browse/ARROW-8346 Project: Apache Arrow

Fwd: Attn: Wes, Re: Masked Arrays

2020-04-06 Thread Felix Benning
Would it make sense to have an `na_are_zero` flag? Since null checking is not without cost, it might be helpful to some algorithms, if the content "underneath" the nulls is zero. For example in means, or scalar products and thus matrix multiplication, knowing that the array has zeros where the
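A small sketch may help show why such a flag could matter. It assumes a hypothetical representation of an array as a list of values plus a list of validity flags, and it assumes nulls should contribute nothing to the result; the function names are illustrative and are not Arrow APIs. If the slots under the nulls are known to be zero, the per-element validity check can be dropped from a dot product.

```python
def dot_checked(values_a, valid_a, values_b, valid_b):
    """Dot product that tests validity for every element pair."""
    total = 0.0
    for a, ok_a, b, ok_b in zip(values_a, valid_a, values_b, valid_b):
        if ok_a and ok_b:
            total += a * b
    return total


def dot_unchecked(values_a, values_b):
    """Dot product with no null checks; only correct if nulls hold zero."""
    return sum(a * b for a, b in zip(values_a, values_b))


def zero_nulls(values, valid):
    """Write zero into every null slot (what the flag would guarantee)."""
    return [v if ok else 0.0 for v, ok in zip(values, valid)]


a_values, a_valid = [1.0, 99.0, 3.0], [True, False, True]
b_values, b_valid = [2.0, 4.0, 5.0], [True, True, True]

checked = dot_checked(a_values, a_valid, b_values, b_valid)
fast = dot_unchecked(zero_nulls(a_values, a_valid), b_values)
```

Both paths give the same result here, but the unchecked loop has no branch on validity, which is the saving the message is pointing at.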

[jira] [Created] (ARROW-8345) [Python] feather.read_table should not require pandas

2020-04-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8345: Summary: [Python] feather.read_table should not require pandas Key: ARROW-8345 URL: https://issues.apache.org/jira/browse/ARROW-8345 Project: Apache

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Antoine Pitrou
Also nice to have perhaps (PR available and several back-and-forths already): * ARROW-7610: [Java] Finish support for 64 bit int allocations Needs a Java committer to decide... Regards Antoine. Le 06/04/2020 à 00:24, Wes McKinney a écrit : > We are getting close to the 0.17.0 endgame. > >

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Antoine Pitrou
Hi, I added the following issue to the cpp-1.6.0 milestone: * PARQUET-1835 [C++] Fix crashes on invalid input (OSS-Fuzz) There's a PR up for it and it's simple enough to be validated quickly, IMHO. Regards Antoine. Le 06/04/2020 à 00:24, Wes McKinney a écrit : > We are getting close to

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Wes McKinney
That may be so. If we do partially revert it (the dict return value is the only thing probably that needs to be changed), we need to get the downstream libraries to make changes to allow us to make this change. Another option is returning the KV wrapper via another attribute. On Mon, Apr 6, 2020,

[jira] [Created] (ARROW-8344) [C#] StringArray.Builder.Clear() corrupts subsequent array contents

2020-04-06 Thread Adam Szmigin (Jira)
Adam Szmigin created ARROW-8344: --- Summary: [C#] StringArray.Builder.Clear() corrupts subsequent array contents Key: ARROW-8344 URL: https://issues.apache.org/jira/browse/ARROW-8344 Project: Apache

[jira] [Created] (ARROW-8343) [GLib] Add GArrowRecordBatchIterator

2020-04-06 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-8343: --- Summary: [GLib] Add GArrowRecordBatchIterator Key: ARROW-8343 URL: https://issues.apache.org/jira/browse/ARROW-8343 Project: Apache Arrow Issue Type: New

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Antoine Pitrou
Hmm, if downstream libraries were expecting a dict, perhaps we'll need to revert that change... Regards Antoine. Le 06/04/2020 à 08:50, Joris Van den Bossche a écrit : > We also have a recent regression related to the KeyValueMetadata wrapping > python that is causing failures in downstream

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Joris Van den Bossche
We also have a recent regression related to the KeyValueMetadata wrapping python that is causing failures in downstream libraries, that seems a blocker for the release: https://issues.apache.org/jira/browse/ARROW-8342 On Mon, 6 Apr 2020 at 00:25, Wes McKinney wrote: > We are getting close to

[jira] [Created] (ARROW-8342) [Python] dask and kartothek integration tests are failing

2020-04-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8342: Summary: [Python] dask and kartothek integration tests are failing Key: ARROW-8342 URL: https://issues.apache.org/jira/browse/ARROW-8342 Project:

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-05-0

2020-04-06 Thread Joris Van den Bossche
I opened https://issues.apache.org/jira/browse/ARROW-8342 for the dask/kartothek integration failures. On Mon, 6 Apr 2020 at 02:54, Crossbow wrote: > > Arrow Build Report for Job nightly-2020-04-05-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-05-0

Re: building arrow with CMake 2.8 on CentOS

2020-04-06 Thread Sutou Kouhei
Hi, We don't support CMake 2.8. Please use CMake 3.2 or later. Are you using CentOS 6? You can install CMake 3.6 with EPEL on CentOS 6:
  % sudo yum install -y epel-release
  % sudo yum install -y cmake3
  % cmake3 --version
  cmake3 version 3.6.1
  CMake suite maintained and supported by
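A build script could guard against an unsupported CMake along these lines. This is a minimal sketch, not part of Arrow's build system: `parse_cmake_version` and `is_supported` are hypothetical helpers, and the 3.2 minimum is taken from the message above. The functions work on the text printed by `cmake --version` (or `cmake3 --version`), so they can be tried without CMake installed.

```python
import re

# Minimum version stated in the message above.
MIN_CMAKE = (3, 2)


def parse_cmake_version(output):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"version\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("no CMake version found in: %r" % output)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups("0"))


def is_supported(output, minimum=MIN_CMAKE):
    """True if the reported version is at least `minimum`."""
    return parse_cmake_version(output)[:len(minimum)] >= minimum


epel_ok = is_supported("cmake3 version 3.6.1")
old_ok = is_supported("cmake version 2.8.12.2")
```

In a real script the string would come from something like `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.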