Re: [DISCUSS][Java] Design of RLE vector

2019-08-21 Thread Fan Liya
Hi Micah, Thanks for the comments. Storing the run-length ends (partial sums of the run-lengths) provides better support for random access (O(log n)), at the expense of a larger buffer width. Generally, I think this is a better design, so the design should be changed as follows: 2. the data stru
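To make the trade-off concrete, here is a minimal Python sketch of random access over run ends. It is not Arrow code and not the proposed RleVector API; `to_run_ends` and `value_at` are hypothetical names chosen for illustration, and plain Python lists stand in for buffers.

```python
from bisect import bisect_right
from itertools import accumulate


def to_run_ends(run_lengths):
    """Turn run lengths into run ends (partial sums of the run lengths)."""
    return list(accumulate(run_lengths))


def value_at(values, run_ends, index):
    """Return the logical element at `index` using binary search, O(log n) in the number of runs."""
    total = run_ends[-1] if run_ends else 0
    if not 0 <= index < total:
        raise IndexError(index)
    # The first run whose end is strictly greater than `index` contains it.
    return values[bisect_right(run_ends, index)]


# Hypothetical data: the logical sequence is a, a, a, b, c, c.
values = ["a", "b", "c"]
run_ends = to_run_ends([3, 1, 2])
decoded = [value_at(values, run_ends, i) for i in range(run_ends[-1])]
```

With run lengths [3, 1, 2] the run ends are [3, 4, 6]. The largest stored number grows from 3 (the longest run) to 6 (the total length), which is the wider-buffer cost mentioned in the message.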

Re: [Discuss][Java] Communicating module maturity

2019-08-21 Thread Micah Kornfield
> The discussion in ARROW-6206 contains some mildly offensive language
> directed at the Arrow community, like "arrow is a team that picked up
> netty derived off-heap tools naively". Excuse me?
I'm trying my best to ignore language that isn't really productive to solving technical problems :) If

Re: [Discuss] Support read/write interleaved dictionaries and batches in IPC stream

2019-08-21 Thread Micah Kornfield
Hi Ji Liu, Thanks for getting the conversation started. I think a few things need to happen: 1. We need to clarify in the specification that not all dictionaries need to be present at the beginning. I plan on creating a PR for discussion that clarifies this point, as well as handling of non-delt

[jira] [Created] (ARROW-6319) [C++] Extract the core of NumericTensor::Value as Tensor::Value

2019-08-21 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-6319: --- Summary: [C++] Extract the core of NumericTensor::Value as Tensor::Value Key: ARROW-6319 URL: https://issues.apache.org/jira/browse/ARROW-6319 Project: Apache Arrow

Re: [DISCUSS][Java] Design of RLE vector

2019-08-21 Thread Micah Kornfield
Hi Liya Fan, Perhaps comment on the original thread? This differs from my proposal in the details of the encoding. For RLE, I proposed encoding run end indices instead of run-lengths. This allows for sublinear access to elements at the cost of potentially larger bit-widths for the lengths. Th

[jira] [Created] (ARROW-6318) [Integration] Update integration test to use generated binaries to ensure backwards compatibility

2019-08-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6318: -- Summary: [Integration] Update integration test to use generated binaries to ensure backwards compatibility Key: ARROW-6318 URL: https://issues.apache.org/jira/browse/ARROW-631

[jira] [Created] (ARROW-6317) [Javascript]

2019-08-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6317: -- Summary: [Javascript] Key: ARROW-6317 URL: https://issues.apache.org/jira/browse/ARROW-6317 Project: Apache Arrow Issue Type: Sub-task Componen

[jira] [Created] (ARROW-6316) [Go] Make change to ensure flatbuffer reads are aligned

2019-08-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6316: -- Summary: [Go] Make change to ensure flatbuffer reads are aligned Key: ARROW-6316 URL: https://issues.apache.org/jira/browse/ARROW-6316 Project: Apache Arrow

[jira] [Created] (ARROW-6315) [Java] Make change to ensure flatbuffer reads are aligned

2019-08-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6315: -- Summary: [Java] Make change to ensure flatbuffer reads are aligned Key: ARROW-6315 URL: https://issues.apache.org/jira/browse/ARROW-6315 Project: Apache Arrow

[jira] [Created] (ARROW-6314) [C++] Implement alignment to ensure flatbuffer alignment.

2019-08-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6314: -- Summary: [C++] Implement alignment to ensure flatbuffer alignment. Key: ARROW-6314 URL: https://issues.apache.org/jira/browse/ARROW-6314 Project: Apache Arrow

[jira] [Created] (ARROW-6313) Tracking

2019-08-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6313: -- Summary: Tracking Key: ARROW-6313 URL: https://issues.apache.org/jira/browse/ARROW-6313 Project: Apache Arrow Issue Type: Improvement Reporte

Re: [DISCUSS][Java] Design of RLE vector

2019-08-21 Thread Fan Liya
Hi Wes, Thanks for the good suggestion. It is intended to be sent through IPC. So it should implement FieldVector, not just ValueVector. This can be considered a sub-item of Micah's proposal about compression/decompression. I will spend more time on that discussion. Best, Liya Fan On Wed, Aug 2

[jira] [Created] (ARROW-6312) Declare required Libs.private in arrow.pc package config

2019-08-21 Thread Michael Maguire (Jira)
Michael Maguire created ARROW-6312: -- Summary: Declare required Libs.private in arrow.pc package config Key: ARROW-6312 URL: https://issues.apache.org/jira/browse/ARROW-6312 Project: Apache Arrow

Re: [DISCUSS][Format][C++] Improvement of sparse tensor format and implementation

2019-08-21 Thread Rok Mihevc
Hi, On Mon, Aug 19, 2019 at 11:30 AM Kenta Murata wrote: > (3) Adding SparseCSCIndex > I'd be interested to help with (Python) part of this SparseCSCIndex. I'd appreciate any comments or suggestions. > I missed previous discussion, so this might have already been discussed, but did we ever c

[jira] [Created] (ARROW-6311) [Java] Make ApproxEqualsVisitor accept DiffFunction to make it more flexible

2019-08-21 Thread Ji Liu (Jira)
Ji Liu created ARROW-6311: - Summary: [Java] Make ApproxEqualsVisitor accept DiffFunction to make it more flexible Key: ARROW-6311 URL: https://issues.apache.org/jira/browse/ARROW-6311 Project: Apache Arrow

[jira] [Created] (ARROW-6310) [C++] Write 64-bit integers as strings in JSON integration test files

2019-08-21 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6310: --- Summary: [C++] Write 64-bit integers as strings in JSON integration test files Key: ARROW-6310 URL: https://issues.apache.org/jira/browse/ARROW-6310 Project: Apache Arr
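The snippet above does not state the motivation, so the following is only an illustration of one well-known reason for writing 64-bit integers as strings: a JSON reader that holds every number as an IEEE 754 double cannot represent all 64-bit values exactly. The `"DATA"` key is used purely for illustration and is an assumption of this sketch.

```python
import json

big = 2**63 - 1  # largest signed 64-bit integer

# A reader that stores every JSON number as a double rounds the value.
as_double = int(float(big))

# Writing the integer as a string keeps every digit intact.
encoded = json.dumps({"DATA": [str(big)]})
decoded = [int(s) for s in json.loads(encoded)["DATA"]]

lost_precision = as_double != big   # the double-based path changed the value
round_tripped = decoded == [big]    # the string-based path did not
```

Python's own `json` module keeps large integers exact, which is why the sketch simulates a double-based reader with `float` instead of relying on the parser.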

Re: Arrow sync call tomorrow (August 21) at 12:00 US/Eastern, 16:00 UTC

2019-08-21 Thread Neal Richardson
Attendees: 刘吉 Micah Kornfield Wes McKinney Rok Mihevc Antoine Pitrou Prudhvi Porandla Neal Richardson Discussion: * alignment vote: Wes and Micah discussed implementation and testing forwards and backwards compatibility * 0.15: Alignment issues will be "blockers"; doesn't seem there are any othe

[Discuss] Support read/write interleaved dictionaries and batches in IPC stream

2019-08-21 Thread Ji Liu
Hi all, Recently when we worked on fixing an IPC-related bug on both the Java and C++ sides [1][2], @emkornfield found that the stream reader assumes that all dictionaries are at the start of the stream, which is inconsistent with the spec [3], which says as long as a record batch doesn't reference a dictionar
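As a rough illustration of what a reader for interleaved streams has to do, here is a minimal Python sketch. It assumes the rule is that a dictionary only needs to arrive before the first record batch that references it. The message tuples and the `read_stream` function are hypothetical and do not correspond to the Arrow Java or C++ reader APIs; a repeated dictionary id simply overwrites the earlier entry, which is also an assumption of this sketch.

```python
def read_stream(messages):
    """Yield decoded record batches from an interleaved message stream.

    Each message is a tuple of one of two hypothetical shapes:
      ("dictionary", dict_id, list_of_values)
      ("batch", {column_name: (dict_id, list_of_indices)})
    """
    dictionaries = {}  # dictionaries seen so far, keyed by id
    for message in messages:
        kind = message[0]
        if kind == "dictionary":
            _, dict_id, values = message
            dictionaries[dict_id] = values
        elif kind == "batch":
            _, columns = message
            decoded = {}
            for name, (dict_id, indices) in columns.items():
                if dict_id not in dictionaries:
                    raise ValueError(
                        f"batch references dictionary {dict_id} before it was sent"
                    )
                values = dictionaries[dict_id]
                decoded[name] = [values[i] for i in indices]
            yield decoded
        else:
            raise ValueError(f"unknown message kind: {kind!r}")


# Dictionary 1 arrives only after the first batch, which does not use it.
stream = [
    ("dictionary", 0, ["x", "y"]),
    ("batch", {"c0": (0, [0, 1, 1])}),
    ("dictionary", 1, ["p", "q"]),
    ("batch", {"c0": (0, [1]), "c1": (1, [1])}),
]
batches = list(read_stream(stream))
```

The reader keeps no assumption about where dictionaries sit in the stream; it only checks, per batch, that every referenced dictionary has already been seen.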

Re: [Discuss][Java] Communicating module maturity

2019-08-21 Thread Wes McKinney
hi Micah, I agree that documenting the maturity of components is a good idea. The discussion in ARROW-6206 contains some mildly offensive language directed at the Arrow community, like "arrow is a team that picked up netty derived off-heap tools naively". Excuse me? Documentation aside, I think s

[jira] [Created] (ARROW-6309) [C++] Parquet tests are linked statically

2019-08-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6309: - Summary: [C++] Parquet tests are linked statically Key: ARROW-6309 URL: https://issues.apache.org/jira/browse/ARROW-6309 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6308) [Java] Support write interleaved dictionaries and batches in IPC stream

2019-08-21 Thread Ji Liu (Jira)
Ji Liu created ARROW-6308: - Summary: [Java] Support write interleaved dictionaries and batches in IPC stream Key: ARROW-6308 URL: https://issues.apache.org/jira/browse/ARROW-6308 Project: Apache Arrow

Re: [DISCUSS][Java] Design of RLE vector

2019-08-21 Thread Wes McKinney
hi Liya, Do you intend to be able to send RLE vectors using the IPC protocol? If so, we need to spend some time on Micah's discussion about sparseness and encodings/compression. - Wes On Wed, Aug 21, 2019 at 7:33 AM Fan Liya wrote: > > Dear all, > > RLE (run length encoding) is a widely used en

[jira] [Created] (ARROW-6307) [Java] Provide RLE vector

2019-08-21 Thread Liya Fan (Jira)
Liya Fan created ARROW-6307: --- Summary: [Java] Provide RLE vector Key: ARROW-6307 URL: https://issues.apache.org/jira/browse/ARROW-6307 Project: Apache Arrow Issue Type: New Feature Compon

[DISCUSS][Java] Design of RLE vector

2019-08-21 Thread Fan Liya
Dear all, RLE (run length encoding) is a widely used encoding/decoding technique. Compared with other encoding/decoding techniques, it is easier to work with the encoded data. We want to provide an RLE vector implementation in Arrow. The design details include: 1. RleVector implements ValueVecto
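For readers unfamiliar with the technique, here is a minimal Python sketch of run-length encoding itself. It only illustrates the idea and is not the proposed RleVector design; `rle_encode` and `rle_decode` are hypothetical names.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


data = ["a", "a", "a", "b", "c", "c"]
runs = rle_encode(data)       # three runs: a x3, b x1, c x2
restored = rle_decode(runs)   # same sequence as `data`
```

Because each run still carries its value in the clear, operations such as counting or filtering by value can work on the runs directly without decoding the whole sequence.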

[jira] [Created] (ARROW-6306) [Java] Support stable sort by stable comparators

2019-08-21 Thread Liya Fan (Jira)
Liya Fan created ARROW-6306: --- Summary: [Java] Support stable sort by stable comparators Key: ARROW-6306 URL: https://issues.apache.org/jira/browse/ARROW-6306 Project: Apache Arrow Issue Type: New F

[jira] [Created] (ARROW-6305) [Python] scalar pd.NaT incorrectly parsed in conversion from Python

2019-08-21 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6305: Summary: [Python] scalar pd.NaT incorrectly parsed in conversion from Python Key: ARROW-6305 URL: https://issues.apache.org/jira/browse/ARROW-6305 Pro

[Discuss][Java] Communicating module maturity

2019-08-21 Thread Micah Kornfield
A recent issue with the JDBC adapter [1] made me realize we aren't doing enough to communicate to consumers the maturity of various modules within arrow. From the issue, it also seems like it is surprising that everything is based off of off-heap data access. To help with this I added a descripti