Re: [Format] Semantics for dictionary batches in streams

2019-08-11 Thread Micah Kornfield
I'm not sure what you mean by record-in-dictionary-id, so it is possible this is a solution that I just don't understand :) The only two references to dictionary IDs that I could find are one in schema.fbs [1], which is attached to a column in a schema, and the one referenced above in

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Micah Kornfield
Hi Wes and Jacques, See responses below. With regards to the reference implementation point. It is a good point. I'm > on vacation this week. Unless you're pushing hard on this, can we pick this > up and discuss more next week? Sure thing, enjoy your vacation. I think the only practical

[jira] [Created] (ARROW-6207) [Java] Incorporate jmh benchmarks into archery

2019-08-11 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-6207:
Summary: [Java] Incorporate jmh benchmarks into archery
Key: ARROW-6207
URL: https://issues.apache.org/jira/browse/ARROW-6207
Project: Apache Arrow

[jira] [Created] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-6206:
Summary: [Java][Docs] Document environment variables/java properties
Key: ARROW-6206
URL: https://issues.apache.org/jira/browse/ARROW-6206
Project: Apache Arrow

Re: [ANNOUNCE] New Arrow PMC member: Micah Kornfield

2019-08-11 Thread Micah Kornfield
Thanks everyone for the good wishes!
On Fri, Aug 9, 2019 at 5:41 PM Fan Liya wrote:
> Big congratulations! Micah
> Thank you so much for all the help!
>
> Best,
> Liya Fan
>
> On Saturday, August 10, 2019, Brian Hulette wrote:
> > Congratulations Micah! Well deserved :)
> >
> > On Fri, Aug 9,

[jira] [Created] (ARROW-6200) [Java] Method getBufferSizeFor in BaseRepeatedValueVector/ListVector not correct

2019-08-11 Thread Ji Liu (JIRA)
Ji Liu created ARROW-6200:
Summary: [Java] Method getBufferSizeFor in BaseRepeatedValueVector/ListVector not correct
Key: ARROW-6200
URL: https://issues.apache.org/jira/browse/ARROW-6200
Project: Apache

[Discussion][Java] Redesign the dictionary encoder

2019-08-11 Thread Fan Liya
Dear all, Dictionary encoding is an important feature, so it should be implemented with good performance. The current Java dictionary encoder implementation is based on static utility methods in org.apache.arrow.vector.dictionary.DictionaryEncoder, which has heavy performance overhead, preventing
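
To make the idea of an encoder object (rather than static utility methods) concrete, here is a minimal sketch under stated assumptions: it does not use Arrow's DictionaryEncoder API, the class name HashDictionaryEncoderSketch is hypothetical, and plain Java lists stand in for Arrow vectors. It builds the value-to-index hash table once in the constructor and reuses it for every encode() call, which is one way an instance-based design can avoid repeating setup work per call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Arrow's DictionaryEncoder API: plain Java lists stand in for vectors.
final class HashDictionaryEncoderSketch {
  // Lookup table from dictionary value to its index, built once per encoder instance.
  private final Map<String, Integer> indexByValue = new HashMap<>();

  HashDictionaryEncoderSketch(List<String> dictionary) {
    for (int i = 0; i < dictionary.size(); i++) {
      // If the dictionary contains duplicates, keep the first index.
      indexByValue.putIfAbsent(dictionary.get(i), i);
    }
  }

  // Replaces each value with its dictionary index; null values stay null.
  Integer[] encode(List<String> values) {
    Integer[] indices = new Integer[values.size()];
    for (int i = 0; i < values.size(); i++) {
      String value = values.get(i);
      if (value == null) {
        continue;  // the array slot is already null
      }
      Integer index = indexByValue.get(value);
      if (index == null) {
        throw new IllegalArgumentException("Value not in dictionary: " + value);
      }
      indices[i] = index;
    }
    return indices;
  }

  public static void main(String[] args) {
    HashDictionaryEncoderSketch encoder =
        new HashDictionaryEncoderSketch(List.of("apple", "banana", "cherry"));
    // The same lookup table serves both calls.
    System.out.println(Arrays.toString(encoder.encode(List.of("cherry", "apple", "cherry"))));  // [2, 0, 2]
    System.out.println(Arrays.toString(encoder.encode(Arrays.asList("banana", null))));        // [1, null]
  }
}
```

A real redesign would also have to handle Arrow's typed vectors and validity buffers, which this sketch deliberately leaves out.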

[jira] [Created] (ARROW-6199) [Java] Avro adapter avoid potential resource leak.

2019-08-11 Thread Ji Liu (JIRA)
Ji Liu created ARROW-6199:
Summary: [Java] Avro adapter avoid potential resource leak.
Key: ARROW-6199
URL: https://issues.apache.org/jira/browse/ARROW-6199
Project: Apache Arrow
Issue Type: Bug

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

2019-08-11 Thread Jacques Nadeau
We tried to get away from this kind of back and forth with subclassing as much as possible. (call getObject on base class which then calls getIndex on child class which then calls something else on base class). I haven't looked through the code but let's try to avoid having complex call paths for

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

2019-08-11 Thread Ji Liu
Thanks Jacques. To avoid complex call paths for getObject, I should keep getObject in both classes. I'll also check the other methods.
Thanks,
Ji Liu
--
From: Jacques Nadeau
Send Time: Sunday, August 11, 2019 21:43
To: dev; Ji Liu
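
The call-path concern in this exchange can be illustrated with a hypothetical sketch; the classes below are simplified stand-ins, not Arrow's ListVector or FixedSizeListVector. Each subclass keeps its own getObject, as Ji Liu proposes, and only calls one way into a final helper on the base class, so there is no base-to-child-to-base round trip of the kind Jacques describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for list vectors; not Arrow's ListVector/FixedSizeListVector.
abstract class ListLikeSketch {
  protected final int[] data;  // flattened child values

  ListLikeSketch(int[] data) {
    this.data = data;
  }

  // Shared, non-overridable helper: copies the half-open range [start, end) of child values.
  protected final List<Integer> slice(int start, int end) {
    List<Integer> out = new ArrayList<>(end - start);
    for (int i = start; i < end; i++) {
      out.add(data[i]);
    }
    return out;
  }

  abstract List<Integer> getObject(int index);
}

// Variable-size lists: element i spans offsets[i] .. offsets[i + 1].
final class VariableListSketch extends ListLikeSketch {
  private final int[] offsets;

  VariableListSketch(int[] data, int[] offsets) {
    super(data);
    this.offsets = offsets;
  }

  @Override
  List<Integer> getObject(int index) {
    // The range is resolved locally; the base class never calls back into this class.
    return slice(offsets[index], offsets[index + 1]);
  }
}

// Fixed-size lists: element i spans i * listSize .. (i + 1) * listSize.
final class FixedSizeListSketch extends ListLikeSketch {
  private final int listSize;

  FixedSizeListSketch(int[] data, int listSize) {
    super(data);
    this.listSize = listSize;
  }

  @Override
  List<Integer> getObject(int index) {
    return slice(index * listSize, (index + 1) * listSize);
  }
}
```

For example, with data {1, 2, 3, 4, 5, 6}, offsets {0, 1, 4, 6} make VariableListSketch.getObject(1) return [2, 3, 4], while a list size of 2 makes FixedSizeListSketch.getObject(2) return [5, 6]. Reading either getObject requires looking at one subclass and one final helper, which is the simpler call path being argued for.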

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-11 Thread Wes McKinney
It looks like the git pruning is done. So we can remove the site/ directory from the main repository at some point soon.
On Thu, Aug 8, 2019 at 2:29 PM Neal Richardson wrote:
>
> I need a committer to make a master branch on arrow-site so that I can
> PR to it. I thought it could be just an

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Wes McKinney
My stance on this is that I don't know how important it is for Java to support vectors over INT32_MAX elements. The use cases enabled by having very large arrays seem to be concentrated in the native code world (e.g. C/C++/Rust) -- that could just be implementation-centrism on my part, though.
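
To make the INT32_MAX boundary concrete, here is a small plain-Java sketch; it is not Arrow code, and the class and method names are hypothetical. It shows that 32-bit index arithmetic wraps around silently, and that for 8-byte values the byte offset already exceeds Integer.MAX_VALUE at index 268,435,456, far below INT32_MAX elements.

```java
// Illustrative only: plain Java, not Arrow's ValueVector or ArrowBuf code.
final class Int32LimitSketch {
  private Int32LimitSketch() {}

  // Byte offset of element `index` in a buffer of fixed-width values, computed in
  // 32-bit int arithmetic: past Integer.MAX_VALUE it silently wraps around.
  static int byteOffsetInt(int index, int typeWidth) {
    return index * typeWidth;
  }

  // The same computation with the index widened to long before multiplying.
  static long byteOffsetLong(long index, int typeWidth) {
    return index * typeWidth;
  }

  public static void main(String[] args) {
    int index = 300_000_000;  // far fewer than Integer.MAX_VALUE elements...
    int width = 8;            // ...of an 8-byte type such as a 64-bit integer
    System.out.println(Integer.MAX_VALUE);             // 2147483647
    System.out.println(byteOffsetInt(index, width));   // -1894967296 (wrapped around)
    System.out.println(byteOffsetLong(index, width));  // 2400000000
  }
}
```

Widening to long before multiplying avoids the wraparound; Java arrays themselves, however, remain limited to int indices.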

[jira] [Created] (ARROW-6201) [Python] Add pyarrow.read_schema to API documentation, add prose documentation for schema serialization workflow

2019-08-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6201:
Summary: [Python] Add pyarrow.read_schema to API documentation, add prose documentation for schema serialization workflow
Key: ARROW-6201
URL:

[jira] [Created] (ARROW-6202) Exception in thread "main" org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 4 due to memory limit. Current allocation: 2147483646

2019-08-11 Thread Jim Northrup (JIRA)
Jim Northrup created ARROW-6202:
Summary: Exception in thread "main" org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 4 due to memory limit. Current allocation: 2147483646
Key: ARROW-6202

[jira] [Created] (ARROW-6203) [GLib] Add garrow_array_argsort()

2019-08-11 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-6203:
Summary: [GLib] Add garrow_array_argsort()
Key: ARROW-6203
URL: https://issues.apache.org/jira/browse/ARROW-6203
Project: Apache Arrow
Issue Type: New Feature

[jira] [Created] (ARROW-6204) [GLib] Add garrow_array_is_in_chunked_array()

2019-08-11 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-6204:
Summary: [GLib] Add garrow_array_is_in_chunked_array()
Key: ARROW-6204
URL: https://issues.apache.org/jira/browse/ARROW-6204
Project: Apache Arrow
Issue Type:

[jira] [Created] (ARROW-6205) ARROW_DEPRECATED warning when including io/interfaces.h from CUDA (.cu) source

2019-08-11 Thread Mark Harris (JIRA)
Mark Harris created ARROW-6205:
Summary: ARROW_DEPRECATED warning when including io/interfaces.h from CUDA (.cu) source
Key: ARROW-6205
URL: https://issues.apache.org/jira/browse/ARROW-6205
Project:

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Jacques Nadeau
Hey Micah, Appreciate the offer on the compiling. The reality is I'm more concerned about the unknowns than the compiling issue itself. Any time you've been tuning for a while, changing something like this could be totally fine or cause a couple of major issues. For example, we've done a very

Re: [Format] Semantics for dictionary batches in streams

2019-08-11 Thread Jacques Nadeau
Wow, you've shown how little I've thought about Arrow dictionaries for a while. I thought we had a dictionary id and a record-in-dictionary-id. Wouldn't that approach make more sense? Does no one do this today? (We frequently use compound values for this type of scenario...) On Sat, Aug 10, 2019