[jira] [Created] (ARROW-6196) [Ruby] Add support for building Arrow::TimeNNArray by .new

2019-08-10 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-6196: Summary: [Ruby] Add support for building Arrow::TimeNNArray by .new Key: ARROW-6196 URL: https://issues.apache.org/jira/browse/ARROW-6196 Project: Apache Arrow

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Micah Kornfield
Hi Jacques, What avenue were you thinking for supporting both paths? I didn't want to pursue a different class hierarchy, because I felt like that would effectively fork the code base, but that is potentially an option that would allow us to have a complete reference implementation in Java that

Re: [Format] Semantics for dictionary batches in streams

2019-08-10 Thread Micah Kornfield
Reading data from two different parquet files sequentially with different dictionaries for the same column. This could be handled by re-encoding data but that seems potentially sub-optimal. On Sat, Aug 10, 2019 at 12:38 PM Jacques Nadeau wrote: > What situation are you anticipating where you're

[jira] [Created] (ARROW-6194) [Java] Make DictionaryEncoder non-static making it easy to extend and reuse

2019-08-10 Thread Ji Liu (JIRA)
Ji Liu created ARROW-6194: Summary: [Java] Make DictionaryEncoder non-static making it easy to extend and reuse Key: ARROW-6194 URL: https://issues.apache.org/jira/browse/ARROW-6194 Project: Apache Arrow

Re: [Format] Semantics for dictionary batches in streams

2019-08-10 Thread Jacques Nadeau
What situation are you anticipating where you're going to be restating ids mid stream? On Sat, Aug 10, 2019 at 12:13 AM Micah Kornfield wrote: > The IPC specification [1] defines behavior when isDelta on a > DictionaryBatch [2] is "true". I might have missed it in the > specification, but I

[jira] [Created] (ARROW-6197) [GLib] Add garrow_decimal128_rescale()

2019-08-10 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-6197: Summary: [GLib] Add garrow_decimal128_rescale() Key: ARROW-6197 URL: https://issues.apache.org/jira/browse/ARROW-6197 Project: Apache Arrow Issue Type: New

Re: [Format] Semantics for dictionary batches in streams

2019-08-10 Thread Micah Kornfield
I should add that Option #1 above would be my preference, even though it adds some complications (especially for the file format). On Sat, Aug 10, 2019 at 12:12 AM Micah Kornfield wrote: > The IPC specification [1] defines behavior when isDelta on a > DictionaryBatch [2] is "true". I might

[jira] [Created] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed

2019-08-10 Thread Omer Ozarslan (JIRA)
Omer Ozarslan created ARROW-6195: Summary: [C++] CMake fails with file not found error while bundling thrift if python is not installed Key: ARROW-6195 URL: https://issues.apache.org/jira/browse/ARROW-6195

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Jacques Nadeau
This is a pretty massive change to the apis. I wonder how nasty it would be to just support both paths. Have you evaluated how complex that would be? On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield wrote: > After more investigation, it looks like Float8Benchmarks at least on my > machine are

[Format] Semantics for dictionary batches in streams

2019-08-10 Thread Micah Kornfield
The IPC specification [1] defines behavior when isDelta on a DictionaryBatch [2] is "true". I might have missed it in the specification, but I couldn't find the interpretation for what the expected behavior is when isDelta=false and two dictionary batches with the same ID are sent. It
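To make the question in this thread concrete, here is a minimal Python sketch of a stream reader's dictionary bookkeeping. It is not Arrow code: the `DictionaryStore` class and its method names are hypothetical, and treating a second isDelta=false batch with the same ID as a full replacement is only one possible interpretation, which is exactly the point the message above says the specification leaves open.

```python
from typing import Dict, List


class DictionaryStore:
    """Hypothetical reader-side state: dictionary ID -> list of values."""

    def __init__(self) -> None:
        self._dicts: Dict[int, List[str]] = {}

    def apply_batch(self, dict_id: int, values: List[str], is_delta: bool) -> None:
        if is_delta:
            # isDelta=true: append the new values to the existing dictionary.
            self._dicts.setdefault(dict_id, []).extend(values)
        else:
            # ASSUMPTION: isDelta=false with an already-seen ID replaces the
            # dictionary outright. A reader could also treat this as an error.
            self._dicts[dict_id] = list(values)

    def lookup(self, dict_id: int, index: int) -> str:
        return self._dicts[dict_id][index]


store = DictionaryStore()
store.apply_batch(0, ["a", "b"], is_delta=False)  # initial dictionary
store.apply_batch(0, ["c"], is_delta=True)        # delta: now a, b, c
store.apply_batch(0, ["x", "y"], is_delta=False)  # same ID again: replaced
```

Under the replacement interpretation, index 0 of dictionary 0 resolves to "a" before the third batch and to "x" after it, so record batches must be decoded against whichever dictionary was current when they arrived.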

Re: [Java] Arrow PR queue build up?

2019-08-10 Thread Ji Liu
Hi Jacques, thanks for your valuable feedback. Sorry for the lack of discussion. Some of these PRs are small changes/bugfixes which do not deserve a discussion. You are right, some PRs are more complex than we thought before in the review process, so having a discussion on ML/JIRA would actually help. This

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Jacques Nadeau
Hey Micah, I didn't have a particular path in mind. Was thinking more along the lines of extra methods as opposed to separate classes. Arrow hasn't historically been a place where we're writing algorithms in Java so the fact that they aren't there doesn't mean they don't exist. We have a large

[DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

2019-08-10 Thread Ji Liu
Hi all, while working on the issue to implement dictionary-encoded subfields [1] [2], I found that FixedSizeListVector does not extend ListVector (thanks to Micah for pointing this out; I am curious why FixedSizeListVector was implemented this way). Since FixedSizeListVector is a specific case of ListVector,

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

2019-08-10 Thread Micah Kornfield
Hi Ji Liu, I think having a common interface/base-class for the two makes sense (but I don't have historical context) from a reading data perspective. I think the change would need to be something above BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an offset buffer, and that

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Micah Kornfield
Hi Jacques, I definitely understand these concerns and this change is risky because it is so large. Perhaps creating a new hierarchy might be the cleanest way of dealing with this. This could have other benefits like cleaning up some cruft around dictionary encoding and "orphaned" methods. Per

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

2019-08-10 Thread Ji Liu
Hi Micah, thanks for your suggestion. You are right, the main difference between FixedSizeListVector and ListVector is the offsetBuffer, but I think this could be avoided by overriding allocateNewSafe(), which calls allocateOffsetBuffer() in BaseRepeatedValueVector, in this way,