Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Micah Kornfield
Hi Wes, I agree that for third parties "A" (Field data structures) is the most useful. At least in my mind, the discussion was for both first and third parties. I was trying to point out that "A" is less necessary as a first step for first-party integrations and could potentially require more effort

Re: [DISCUSS][Java] Design of the algorithm module

2019-10-02 Thread Micah Kornfield
Hi Liya Fan, Thanks again for writing this up. I think it provides a road-map for intended features. I commented on the document, but I wanted to raise a few high-level concerns here as well to get more feedback from the community. 1. It isn't clear to me who the users of this will be. My

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Wes McKinney
On Wed, Oct 2, 2019 at 11:05 PM Micah Kornfield wrote: > > I've tried to summarize my understanding of the debate so far and give some > initial thoughts. I think there are two potentially different sets of users > that we are targeting with a stable C API/ABI: ourselves and external > parties. >

Re: Clarifying interpretation of Buffer "length" field in Arrow protocol

2019-10-02 Thread Micah Kornfield
Hi Wes, It seems fine to be flexible here. However: > This could have implications for hashing or > comparisons, for example, so I think that having the flexibility to do > either is a good idea. This statement of use-cases makes me a little nervous. It seems like it could lead to bugs if a

Re: [DISCUSS] Result vs Status

2019-10-02 Thread Micah Kornfield
Hi Ben, > From the discussion in the sync call, it seems reasonable to require that: > Public APIs which are likely to be directly wrapped in a binding should not > use Result<> to the exclusion of Status. An equivalent Status API should > always be provided for ease of binding. Along with

[jira] [Created] (ARROW-6777) [GLib][CI] Unpin gobject-introspection gem

2019-10-02 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-6777: Summary: [GLib][CI] Unpin gobject-introspection gem Key: ARROW-6777 URL: https://issues.apache.org/jira/browse/ARROW-6777 Project: Apache Arrow Issue Type:

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Micah Kornfield
I've tried to summarize my understanding of the debate so far and give some initial thoughts. I think there are two potentially different sets of users that we are targeting with a stable C API/ABI: ourselves and external parties. 1. Different language implementations within the Arrow project

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Wes McKinney
On Wed, Oct 2, 2019 at 10:19 PM Wes McKinney wrote: > > On Wed, Oct 2, 2019 at 7:46 PM Jacques Nadeau wrote: > > > > I'd like to hear more opinions from others on this topic. This conversation > > seems mostly dominated by comments from myself, Wes and Antoine. > > > > I think it is reasonable

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Wes McKinney
On Wed, Oct 2, 2019 at 7:46 PM Jacques Nadeau wrote: > > I'd like to hear more opinions from others on this topic. This conversation > seems mostly dominated by comments from myself, Wes and Antoine. > > I think it is reasonable to argue that keeping any ABI (or header/struct > pattern) as narrow

[jira] [Created] (ARROW-6776) [Python] Need a lite version of pyarrow

2019-10-02 Thread Haowei Yu (Jira)
Haowei Yu created ARROW-6776: Summary: [Python] Need a lite version of pyarrow Key: ARROW-6776 URL: https://issues.apache.org/jira/browse/ARROW-6776 Project: Apache Arrow Issue Type: Improvement

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Jacques Nadeau
I'd like to hear more opinions from others on this topic. This conversation seems mostly dominated by comments from myself, Wes and Antoine. I think it is reasonable to argue that keeping any ABI (or header/struct pattern) as narrow as possible would allow us to minimize overlap with the existing

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-02 Thread Bryan Cutler
+1 (non-binding) I ran the following on Ubuntu 16.04 4.15.0-64-generic: > dev/release/verify-release-candidate.sh binaries 0.15.0 2 > ARROW_CUDA=OFF \ TEST_DEFAULT=0 \ TEST_SOURCE=1 \ TEST_CPP=1 \ TEST_PYTHON=1 \ TEST_JAVA=1 \ TEST_INTEGRATION=1 \ dev/release/verify-release-candidate.sh source

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-02 Thread Bryan Cutler
Accidentally sent too soon. The ORC build error I got was probably just an env issue for me, but here it is in case anyone else had the same issue: In file included from

[jira] [Created] (ARROW-6775) Proposal for several Array utility functions

2019-10-02 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-6775: Summary: Proposal for several Array utility functions Key: ARROW-6775 URL: https://issues.apache.org/jira/browse/ARROW-6775 Project: Apache Arrow Issue Type: Wish

[jira] [Created] (ARROW-6774) Reading parquet file is slow

2019-10-02 Thread Adam Lippai (Jira)
Adam Lippai created ARROW-6774: Summary: Reading parquet file is slow Key: ARROW-6774 URL: https://issues.apache.org/jira/browse/ARROW-6774 Project: Apache Arrow Issue Type: Improvement

Re: [DISCUSS] Understanding Arrow's CI problems and needs

2019-10-02 Thread Krisztián Szűcs
The current document summarizes the current situation well, but in order to properly compare and eventually select a solution, we need a detailed list of explicit features with some sort of classification, like should-have/must-have. For example, our future CI system must support "PRs from forks".

[jira] [Created] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice

2019-10-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6773: Summary: [C++] Filter kernel returns invalid data when filtering with an Array slice Key: ARROW-6773 URL: https://issues.apache.org/jira/browse/ARROW-6773

[jira] [Created] (ARROW-6772) [C++] Add operator== for interfaces with an Equals() method

2019-10-02 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-6772: Summary: [C++] Add operator== for interfaces with an Equals() method Key: ARROW-6772 URL: https://issues.apache.org/jira/browse/ARROW-6772 Project: Apache Arrow

[DISCUSS] Result vs Status

2019-10-02 Thread Ben Kietzman
The C++ library has two classes which serve mostly the same function. Both Status and Result<> are used to express a recoverable error in lieu of exceptions. Result<> is slightly more ergonomic in C++, but our binding infrastructures assume Status-based APIs. From the discussion in the sync call,

[jira] [Created] (ARROW-6771) [Packaging][Python] Missing pytest dependency from conda and wheel builds

2019-10-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6771: Summary: [Packaging][Python] Missing pytest dependency from conda and wheel builds Key: ARROW-6771 URL: https://issues.apache.org/jira/browse/ARROW-6771 Project:

Arrow sync call October 2 at 12:00 US/Eastern, 16:00 UTC

2019-10-02 Thread Neal Richardson
Hi all, our biweekly call is about to begin at https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will be sent out to the mailing list afterwards. Neal

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-02-0

2019-10-02 Thread Krisztián Szűcs
pytest-lazy-fixture is a new dependency I've introduced for testing the filesystems. I'm going to update the packaging builds to ensure it's installed. On Wed, Oct 2, 2019 at 4:17 PM Wes McKinney wrote: > A lot of builds seem to have failed due to a pytest-related thing. > Seems likely related

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-02 Thread Andy Grove
+1 (binding) On Mon, Sep 30, 2019 at 11:57 PM Krisztián Szűcs wrote: > Hi, > > I would like to propose the following release candidate (RC2) of Apache > Arrow version 0.15.0. This is a release consisting of 697 > resolved JIRA issues[1]. > > This release candidate is based on commit: >

[jira] [Created] (ARROW-6770) [CI][Travis] Download Minio quietly

2019-10-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6770: Summary: [CI][Travis] Download Minio quietly Key: ARROW-6770 URL: https://issues.apache.org/jira/browse/ARROW-6770 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6769) [C++][Dataset] End to End dataset integration test case

2019-10-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6769: Summary: [C++][Dataset] End to End dataset integration test case Key: ARROW-6769 URL: https://issues.apache.org/jira/browse/ARROW-6769 Project:

[jira] [Created] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse

2019-10-02 Thread Taylor Baldwin (Jira)
Taylor Baldwin created ARROW-6767: Summary: [JS] lazily bind batches in scan/scanReverse Key: ARROW-6767 URL: https://issues.apache.org/jira/browse/ARROW-6767 Project: Apache Arrow Issue

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-02 Thread Francois Saint-Jacques
+1 (non-binding) Source release verified. ARROW_FLIGHT=OFF due to system protobuf. Binary release verified. Ubuntu 18.04 François On Wed, Oct 2, 2019 at 1:18 AM Micah Kornfield wrote: > > +1 (binding) > > On Debian Stretch I ran: dev/release/verify-release-candidate.sh binaries > 0.15.0 2 and

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-02-0

2019-10-02 Thread Wes McKinney
A lot of builds seem to have failed due to a pytest-related thing. Seems likely related to the 5.2.0 pytest release on 9/29 + pytest -m 'not requires_testing_data' --pyargs pyarrow = test session starts == platform linux -- Python 3.7.3,

[jira] [Created] (ARROW-6766) libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)
Tarek Allam created ARROW-6766: Summary: libarrow_python..dylib does not exist Key: ARROW-6766 URL: https://issues.apache.org/jira/browse/ARROW-6766 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS] Understanding Arrow's CI problems and needs

2019-10-02 Thread Wes McKinney
I reviewed the document, thanks for putting it together! I think it captures most of the requirements and the challenges that we are currently facing. I think that anyone who is actively contributing to the project or merging pull requests should read this document since this affects all of us.

[NIGHTLY] Arrow Build Report for Job nightly-2019-10-02-0

2019-10-02 Thread Crossbow
Arrow Build Report for Job nightly-2019-10-02-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0 Failed Tasks: - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py36 -

[jira] [Created] (ARROW-6764) [C++] Simplify readahead implementation

2019-10-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6764: Summary: [C++] Simplify readahead implementation Key: ARROW-6764 URL: https://issues.apache.org/jira/browse/ARROW-6764 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6763) [Python] Parquet s3 tests are skipped because dependencies are not installed

2019-10-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6763: Summary: [Python] Parquet s3 tests are skipped because dependencies are not installed Key: ARROW-6763 URL: https://issues.apache.org/jira/browse/ARROW-6763