Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Hi Jacques, I think our e-mails might have crossed, so I'm consolidating my responses from the previous e-mail as well. > I don't think most of this should be targeted for 1.0. It is a lot of > change/enhancement and seems like it would likely substantially delay 1.0. I agree it shouldn't block

[jira] [Created] (ARROW-5866) [C++] Remove duplicate library in cpp/Brewfile

2019-07-05 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5866: --- Summary: [C++] Remove duplicate library in cpp/Brewfile Key: ARROW-5866 URL: https://issues.apache.org/jira/browse/ARROW-5866 Project: Apache Arrow Issue

Re: [Discuss][Java] Make the semantics of lastSet consistent

2019-07-05 Thread Jacques Nadeau
Ravindra, Praveen and Prudhvi, can you confirm the ramifications of this change and what impact this inconsistency has had downstream? On Thu, Jul 4, 2019 at 7:32 PM Fan Liya wrote: > There are two lastSet member variables in the code. One is in > BaseVariableWidthVector and the other is in

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Jacques Nadeau
One question and a random thought: What is the driving force for transport compression? Are you seeing that as a major bottleneck in particular circumstances? (I'm not disagreeing, just want to clearly define the particular problem you're worried about.) Random thought: what do you think of

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Hi Jacques, Thanks for the quick response. > I don't think most of this should be targeted for 1.0. It is a lot of > change/enhancement and seems like it would likely substantially delay 1.0. I agree it shouldn't block 1.0. I think time based releases are working well for the community. But

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Strange, I've pasted the contents into a Google document at [1] [1] https://docs.google.com/document/d/1uJzWh63Iqk7FRbElHPhHrsmlfe0NIJ6M8-0kejPmwIw/edit On Fri, Jul 5, 2019 at 12:32 PM Jacques Nadeau wrote: > Hey Micah, your formatting seems to be messed up on this mail. Some kind > of

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Jacques Nadeau
Initial thought: I don't think most of this should be targeted for 1.0. It is a lot of change/enhancement and seems like it would likely substantially delay 1.0. The one piece that seems least disruptive would be basic on the wire compression. You suggested that this be done on the buffer level

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Jacques Nadeau
Hey Micah, your formatting seems to be messed up on this mail. Some kind of copy/paste error? On Fri, Jul 5, 2019 at 11:54 AM Micah Kornfield wrote: > Hi Arrow-dev, > > I’d like to make a straw-man proposal to cover some features that I think > would be useful to Arrow, and that I would like

[Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Hi Arrow-dev, I’d like to make a straw-man proposal to cover some features that I think would be useful to Arrow, and that I would like to make a proof-of-concept implementation for in Java and C++. In particular, the proposal covers allowing for smaller data sizes via compression and encoding

[jira] [Created] (ARROW-5865) [Release] Helper script for rebasing open pull requests on master

2019-07-05 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5865: -- Summary: [Release] Helper script for rebasing open pull requests on master Key: ARROW-5865 URL: https://issues.apache.org/jira/browse/ARROW-5865 Project: Apache

flatbuffers vectors and --gen-object-api

2019-07-05 Thread John Muehlhausen
It seems as if Arrow expects some vectors to be empty rather than null. (Examples: Footer.dictionaries, Field.children) Anyone using --gen-object-api with flatc will get code that writes null when (e.g.) _o->children.size() is zero in CreateField(). I may be missing something but I don't
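
To illustrate the empty-versus-null distinction raised in this message, here is a minimal sketch in plain Python. It does not use the flatbuffers library or any Arrow API; the names write_children and read_children are hypothetical and exist only to model a writer that emits null for an empty vector and a reader that tolerates it.

```python
from typing import List, Optional


def write_children(children: List[str]) -> Optional[List[str]]:
    # Hypothetical model of the behaviour described in the message:
    # an empty vector is written as null (None) instead of an empty list.
    return children if children else None


def read_children(stored: Optional[List[str]]) -> List[str]:
    # Hypothetical defensive reader: treat a null vector as an empty one,
    # so callers can iterate without a separate None check.
    return [] if stored is None else list(stored)
```

A reader written this way accepts both encodings. Whether Arrow readers should be tolerant, or writers should always emit an empty vector, is the open question in the thread.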

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-05 Thread John Muehlhausen
This seems to help... still testing it though. Status GetFieldMetadata(int field_index, ArrayData* out) { auto nodes = metadata_->nodes(); // pop off a field if (field_index >= static_cast<int>(nodes->size())) { return Status::Invalid("Ran out of field metadata, likely malformed");

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-05 Thread John Muehlhausen
So far it seems as if pyarrow is completely ignoring the RecordBatch.length field. More info to follow... On Tue, Jul 2, 2019 at 3:02 PM John Muehlhausen wrote: > Crikey! I'll do some testing around that and suggest some test cases to > ensure it continues to work, assuming that it does. > >

[jira] [Created] (ARROW-5864) [Python] simplify cython wrapping of Result

2019-07-05 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5864: Summary: [Python] simplify cython wrapping of Result Key: ARROW-5864 URL: https://issues.apache.org/jira/browse/ARROW-5864 Project: Apache Arrow

[jira] [Created] (ARROW-5861) [Java] Initial implement to convert Avro record with primitive types

2019-07-05 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5861: - Summary: [Java] Initial implement to convert Avro record with primitive types Key: ARROW-5861 URL: https://issues.apache.org/jira/browse/ARROW-5861 Project: Apache Arrow

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
OK, I wrote a quick script (I'll clean it up and send it out as a PR tomorrow) and rebased everything that could be done so cleanly. What do we generally do about PRs that don't rebase cleanly? Thanks, Micah On Fri, Jul 5, 2019 at 1:29 AM Krisztián Szűcs wrote: > I prefer to use hub [1] to

Re: linking 3rd party cython modules against pyarrow fails since 0.14.0

2019-07-05 Thread Antoine Pitrou
That's quite likely indeed. A bit worrying is that this should have been caught by our unit tests. Regards Antoine. Le 05/07/2019 à 10:02, Weston Steimel a écrit : > Hello, > > I wonder if perhaps that may be due to the work done for reducing the wheel > size in

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Sutou Kouhei
We did this by hand in the past releases. It may be better that we have a script to do this. In "Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0" on Fri, 5 Jul 2019 01:16:42 -0700, Micah Kornfield wrote: > Thanks. Is there a script to do this or is it typically just done by hand? >

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
Thanks. Is there a script to do this or is it typically just done by hand? On Fri, Jul 5, 2019 at 1:12 AM Sutou Kouhei wrote: > Hi Micah, > > Thanks for helping with this. > > Sorry for my bad description of the task. > > > e.g. run: > > > > "./dev/release/post-00-rebase.sh

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Sutou Kouhei
Hi Micah, Thanks for helping with this. Sorry for my bad description of the task. > e.g. run: > > "./dev/release/post-00-rebase.sh apache-arrow-0.14.0-rc0"? I've already done this: >>> Done: >>> >>> * Rebasing the master branch on local release branch >>> >>>

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Krisztián Szűcs
Hey Micah, Kou has already rebased the master branch of apache/arrow. So if you want to rebase PRs, then you should rebase on top of apache/arrow@master. On Fri, Jul 5, 2019 at 10:01 AM Micah Kornfield wrote: > Actually, can someone clarify is the correct approach here to clone the > @Kou's

Re: linking 3rd party cython modules against pyarrow fails since 0.14.0

2019-07-05 Thread Weston Steimel
Hello, I wonder if perhaps that may be due to the work done for reducing the wheel size in https://issues.apache.org/jira/browse/ARROW-5082? On Thu, Jul 4, 2019 at 10:06 PM Stestagg wrote: > 1) pip install pyarrow==0.14.0 > 2) All the pyarrow files including, for example libarrow.so.14, but

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
Actually, can someone clarify whether the correct approach here is to clone @Kou's repo and use his RC0 branch to do the rebase? e.g. run: "./dev/release/post-00-rebase.sh apache-arrow-0.14.0-rc0"? Thanks, Micah On Fri, Jul 5, 2019 at 12:38 AM Micah Kornfield wrote: > * All pull requests need

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
> > * All pull requests need to rebase on master by > "Rebasing the master branch on local release branch" Since it doesn't look like it's been claimed, I'll do it. On Thu, Jul 4, 2019 at 12:46 AM Sutou Kouhei wrote: > Hi, > > I need your help! > Could Rust developers see "Failed:" section?