Re: Timeline for 0.15.0 release

2019-09-24 Thread Andy Grove
I found a last minute issue with DataFusion (Rust) and would appreciate it if we could merge ARROW-6086 (PR is https://github.com/apache/arrow/pull/5494) before cutting the RC. Thanks, Andy. On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield wrote: > OK, I'm going to postpone cutting a release

[jira] [Created] (ARROW-6683) [Python] Add unit tests that validate cross-compatibility with pyarrow.parquet when fastparquet is installed

2019-09-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6683: --- Summary: [Python] Add unit tests that validate cross-compatibility with pyarrow.parquet when fastparquet is installed Key: ARROW-6683 URL:

[jira] [Created] (ARROW-6682) Arrow Hangs on Large Files (10-12gb)

2019-09-24 Thread Anthony Abate (Jira)
Anthony Abate created ARROW-6682: Summary: Arrow Hangs on Large Files (10-12gb) Key: ARROW-6682 URL: https://issues.apache.org/jira/browse/ARROW-6682 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-6681) [C# -> R] - Record Batches in reverse order?

2019-09-24 Thread Anthony Abate (Jira)
Anthony Abate created ARROW-6681: Summary: [C# -> R] - Record Batches in reverse order? Key: ARROW-6681 URL: https://issues.apache.org/jira/browse/ARROW-6681 Project: Apache Arrow Issue

[jira] [Created] (ARROW-6680) [Python] Add Array ctor microbenchmarks

2019-09-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6680: --- Summary: [Python] Add Array ctor microbenchmarks Key: ARROW-6680 URL: https://issues.apache.org/jira/browse/ARROW-6680 Project: Apache Arrow Issue Type:

Re: Timeline for 0.15.0 release

2019-09-24 Thread Micah Kornfield
OK, I'm going to postpone cutting a release until tomorrow (hoping we can get the issues resolved by then). I'll also try to review the third-party additions since 0.14.x. On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney wrote: > I found a licensing issue > >

Re: Timeline for 0.15.0 release

2019-09-24 Thread Wes McKinney
I found a licensing issue https://issues.apache.org/jira/browse/ARROW-6679 It might be worth examining third party code added to the project since 0.14.x to make sure there are no other such issues. On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney wrote: > > I have diagnosed the problem (Thrift

[jira] [Created] (ARROW-6679) [RELEASE] autobrew license in LICENSE.txt is not acceptable

2019-09-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6679: --- Summary: [RELEASE] autobrew license in LICENSE.txt is not acceptable Key: ARROW-6679 URL: https://issues.apache.org/jira/browse/ARROW-6679 Project: Apache Arrow

Re: Timeline for 0.15.0 release

2019-09-24 Thread Wes McKinney
I have diagnosed the problem (Thrift "string" data must be UTF-8, cannot be arbitrary binary) and am working on a patch right now On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney wrote: > > I just opened > > https://issues.apache.org/jira/browse/ARROW-6678 > > Please don't cut an RC until I have an
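
Note on the diagnosis above (Thrift "string" data must be UTF-8, cannot be arbitrary binary): a minimal sketch of that constraint, written in plain Python rather than against the Thrift or Parquet libraries. The helper name is_valid_utf8 and both sample values are hypothetical and chosen only for illustration; they are not taken from ARROW-6678.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample values: one encoded text string, one arbitrary byte string.
text_value = "min=é".encode("utf-8")
binary_value = bytes([0xFF, 0xFE, 0x00, 0x80])  # 0xFF never occurs in valid UTF-8

for label, value in (("text", text_value), ("binary", binary_value)):
    print(label, is_valid_utf8(value))
```

A field that may carry arbitrary bytes would fail a check like this whenever a reader insists on decoding it as text, which is the kind of incompatibility the message describes.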

Re: Timeline for 0.15.0 release

2019-09-24 Thread Wes McKinney
I just opened https://issues.apache.org/jira/browse/ARROW-6678 Please don't cut an RC until I have an opportunity to diagnose this, will report back. On Tue, Sep 24, 2019 at 5:51 PM Wes McKinney wrote: > > I'm investigating a possible Parquet-related compatibility bug that I > encountered

[jira] [Created] (ARROW-6678) [C++] Regression in Parquet file compatibility introduced by ARROW-3246

2019-09-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6678: --- Summary: [C++] Regression in Parquet file compatibility introduced by ARROW-3246 Key: ARROW-6678 URL: https://issues.apache.org/jira/browse/ARROW-6678 Project: Apache

Re: Timeline for 0.15.0 release

2019-09-24 Thread Wes McKinney
I'm investigating a possible Parquet-related compatibility bug that I encountered through some routine testing / benchmarking. I'll report back once I figure out what is going on (if anything) On Sun, Sep 22, 2019 at 11:51 PM Micah Kornfield wrote: >> >> It's ideal if your GPG key is in the web

Re: Parquet file reading performance

2019-09-24 Thread Maarten Ballintijn
Hi, The code to show the performance issue with DatetimeIndex is at: https://gist.github.com/maartenb/256556bcd6d7c7636d400f3b464db18c It shows three cases: 0) int index, 1) datetime index, 2) datetime index created in a slightly roundabout way. I’m a little confused by the two

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-24-0

2019-09-24 Thread Bryan Cutler
I'm able to pass Spark integration tests locally with the build patch from https://github.com/apache/arrow/pull/5465, so I'm reasonably confident all the issues have been resolved and it's just flaky timeouts now. We are trying some things to fix the timeouts, but nothing to hold up the release

[jira] [Created] (ARROW-6677) [FlightRPC][C++] Document using Flight in C++

2019-09-24 Thread lidavidm (Jira)
lidavidm created ARROW-6677: --- Summary: [FlightRPC][C++] Document using Flight in C++ Key: ARROW-6677 URL: https://issues.apache.org/jira/browse/ARROW-6677 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-6676) [C++] [Parquet] Refactor encoding/decoding APIs for clarity

2019-09-24 Thread Benjamin Kietzman (Jira)
Benjamin Kietzman created ARROW-6676: Summary: [C++] [Parquet] Refactor encoding/decoding APIs for clarity Key: ARROW-6676 URL: https://issues.apache.org/jira/browse/ARROW-6676 Project: Apache

[jira] [Created] (ARROW-6675) Add scanReverse function

2019-09-24 Thread Malcolm MacLachlan (Jira)
Malcolm MacLachlan created ARROW-6675: - Summary: Add scanReverse function Key: ARROW-6675 URL: https://issues.apache.org/jira/browse/ARROW-6675 Project: Apache Arrow Issue Type: New

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-24-0

2019-09-24 Thread Micah Kornfield
Hi Wes, Thanks, that makes sense, I'll pick a commit in a little bit to get started with. Somehow I thought we had done so in the past. Thanks, Micah On Tue, Sep 24, 2019 at 7:59 AM Wes McKinney wrote: > hi Micah -- we should not stop merging PRs. That's been our policy > with past releases.

Re: Parquet file reading performance

2019-09-24 Thread Wes McKinney
hi On Tue, Sep 24, 2019 at 9:26 AM Maarten Ballintijn wrote: > > Hi Wes, > > Thanks for your quick response. > > Yes, we’re using Python 3.7.4, from miniconda and conda-forge, and: > > numpy: 1.16.5 > pandas: 0.25.1 > pyarrow: 0.14.1 > > It looks like 0.15 is close, so

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-24-0

2019-09-24 Thread Micah Kornfield
OK at least Spark and Wheel builds look like they might just be flaky timeouts. I agree with Fuzzit not being a blocker. Are there any other blockers I should be aware of? Otherwise, I will try to start the build process later today. On Tue, Sep 24, 2019 at 8:33 AM Antoine Pitrou wrote: > >

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-24-0

2019-09-24 Thread Micah Kornfield
Have the failures already been fixed (i.e., is this a timing issue)? If not, could people chime in if they are looking at some of them? I assume these are blockers until 0.15.0? If people are OK with it, it might make sense to stop merging non-blocking PRs until 0.15.0 is out the door.

Re: Parquet file reading performance

2019-09-24 Thread Maarten Ballintijn
Hi Wes, Thanks for your quick response. Yes, we’re using Python 3.7.4, from miniconda and conda-forge, and: numpy: 1.16.5 pandas: 0.25.1 pyarrow: 0.14.1 It looks like 0.15 is close, so I can wait for that. Theoretically I see three components driving the

[jira] [Created] (ARROW-6673) [Python] Consider separating libarrow.pxd into multiple definition files

2019-09-24 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6673: -- Summary: [Python] Consider separating libarrow.pxd into multiple definition files Key: ARROW-6673 URL: https://issues.apache.org/jira/browse/ARROW-6673 Project:

[NIGHTLY] Arrow Build Report for Job nightly-2019-09-24-0

2019-09-24 Thread Crossbow
Arrow Build Report for Job nightly-2019-09-24-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0 Failed Tasks: - docker-cpp-fuzzit: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-cpp-fuzzit -

Re: [DISCUSS][Java] Design of the algorithm module

2019-09-24 Thread Fan Liya
Hi Micah, Thanks for your effort and precious time. Looking forward to receiving more valuable feedback from you. Best, Liya Fan On Tue, Sep 24, 2019 at 2:12 PM Micah Kornfield wrote: > Hi Liya Fan, > I started reviewing but haven't gotten all the way through it. I will try > to leave more

[jira] [Created] (ARROW-6672) [Java] Extract a common interface for dictionary builders

2019-09-24 Thread Liya Fan (Jira)
Liya Fan created ARROW-6672: --- Summary: [Java] Extract a common interface for dictionary builders Key: ARROW-6672 URL: https://issues.apache.org/jira/browse/ARROW-6672 Project: Apache Arrow Issue

[jira] [Created] (ARROW-6671) [C++] Sparse tensor naming

2019-09-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6671: - Summary: [C++] Sparse tensor naming Key: ARROW-6671 URL: https://issues.apache.org/jira/browse/ARROW-6671 Project: Apache Arrow Issue Type: Wish

Re: [DISCUSS][Java] Design of the algorithm module

2019-09-24 Thread Micah Kornfield
Hi Liya Fan, I started reviewing but haven't gotten all the way through it. I will try to leave more comments over the next few days. Thanks again for the write-up; I think it will help frame a productive conversation. -Micah On Tue, Sep 17, 2019 at 1:47 AM Fan Liya wrote: > Hi Micah, > >

Re: [Discuss] [Java] DateMilliVector.getObject() return type (LocalDateTime vs LocalDate)

2019-09-24 Thread Micah Kornfield
Hi David, Is the suggestion to add something like a LocalDate getDate method? Thanks, Micah On Tue, Sep 17, 2019 at 7:39 AM David Li wrote: > Maybe a utility method to get a date instead of a datetime at least would > be useful? And/or documentation of the fact that the default behavior is >
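
Note on the LocalDateTime vs LocalDate question above: a language-neutral sketch in Python, not the Arrow Java API. It assumes the stored value is a count of milliseconds since the UNIX epoch in UTC, and shows the difference between returning a full date-time and returning only the calendar date. The function names millis_to_datetime and millis_to_date are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: values are milliseconds since 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_datetime(ms: int) -> datetime:
    """Interpret epoch milliseconds as a UTC date-time (the 'datetime' view)."""
    return EPOCH + timedelta(milliseconds=ms)


def millis_to_date(ms: int) -> date:
    """Interpret epoch milliseconds as a calendar date, dropping the time part."""
    return millis_to_datetime(ms).date()


# 18163 whole days after the epoch, expressed in milliseconds.
sample_ms = 18163 * 86_400_000
print(millis_to_datetime(sample_ms))
print(millis_to_date(sample_ms))
```

A date-only accessor of this kind would be the Python analogue of the utility method suggested in the thread; whether such a method belongs on the vector class itself is the open question in the discussion.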