[jira] [Created] (ARROW-5901) [Rust] Implement PartialEq to compare array and json values

2019-07-10 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5901: Summary: [Rust] Implement PartialEq to compare array and json values Key: ARROW-5901 URL: https://issues.apache.org/jira/browse/ARROW-5901 Project: Apache Arrow

[jira] [Created] (ARROW-5902) [Java] Implement HashTable for dictionary encoding

2019-07-10 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5902: Summary: [Java] Implement HashTable for dictionary encoding Key: ARROW-5902 URL: https://issues.apache.org/jira/browse/ARROW-5902 Project: Apache Arrow Issue Type: New
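Dictionary encoding replaces each value with an integer index into a table of distinct values, and a hash table is the usual way to assign those indices in a single pass. The following is a minimal Python sketch of that idea only. The function name `dictionary_encode` is hypothetical and is not the Java API proposed in ARROW-5902. Nulls are not handled specially here (Arrow tracks them in a validity bitmap instead).

```python
def dictionary_encode(values):
    """Return (indices, dictionary) using a hash map from value to index."""
    index_of = {}      # hash table: value -> position in the dictionary
    dictionary = []    # distinct values in first-seen order
    indices = []       # one dictionary index per input value
    for v in values:
        idx = index_of.get(v)
        if idx is None:
            # first occurrence: append to the dictionary and remember its slot
            idx = len(dictionary)
            index_of[v] = idx
            dictionary.append(v)
        indices.append(idx)
    return indices, dictionary

indices, dictionary = dictionary_encode(["a", "b", "a", "c", "b"])
print(indices)     # [0, 1, 0, 2, 1]
print(dictionary)  # ['a', 'b', 'c']
```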

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-10 Thread Wes McKinney
The issue is fairly esoteric, so it will probably take more time to collect feedback. I could create a C++ implementation of this if it helps with the process. On Wed, Jul 10, 2019 at 2:25 AM Micah Kornfield wrote: > > Does anybody else have thoughts on this? Other language contributors? > >

Re: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

2019-07-10 Thread Wes McKinney
On Wed, Jul 10, 2019 at 12:43 AM Micah Kornfield wrote: > > Hi Eric, > Short answer: I think your understanding matches what I was proposing. > Longer answer below. > >> So, for example, we release library v1.0.0 in a few months and then library >> v2.0.0 a few months after that. In v2.0.0,

Arrow biweekly sync call today at 12pm US/Eastern / 16:00 UTC

2019-07-10 Thread Wes McKinney
All are welcome at https://meet.google.com/vtm-teks-phx

Re: [DISCUSS] Release cadence and release vote conventions

2019-07-10 Thread Wes McKinney
On Sun, Jul 7, 2019 at 7:40 PM Sutou Kouhei wrote: > > Hi, > > > in future releases we should > > institute a minimum 24-hour "quiet period" after any community > > feedback on a release candidate to allow issues to be examined > > further. > > I agree with this. I'll do so when I do a release

[Discuss] Are Union.typeIds worth keeping?

2019-07-10 Thread Ben Kietzman
The Union.typeIds property is confusing and its utility is unclear. I'd like to remove it (or at least document it better). Unless anyone knows a real advantage for keeping it I plan to assemble a PR to drop it from the format and the C++ implementation. ARROW-257 ( resolved by pull request

Re: Arrow biweekly sync call today at 12pm US/Eastern / 16:00 UTC

2019-07-10 Thread Neal Richardson
Attendees: Hatem Helal, Uwe Korn, Micah Kornfield, Wes McKinney, Prudhvi Porandla, Neal Richardson, Krisztián Szűcs
Topics discussed:
Issues with 0.14:
* Python manylinux2010 wheels broken, runtime dependency on lz4: fixed in master, bad wheels removed
* Python macOS wheels have runtime dependency on

Re: Spark and Arrow Flight

2019-07-10 Thread Wes McKinney
Of course, it might make just as much sense in Apache Spark. Probably worth bringing up with that community, too On Wed, Jul 10, 2019 at 12:37 PM Wes McKinney wrote: > > hi Ryan -- I was thinking that this might be built separately from the > main Java project. We don't have a model in the

Re: Spark and Arrow Flight

2019-07-10 Thread Ryan Murray
Hey Wes, Would be happy to! Jacques and I had originally thought to try and get it into Spark but perhaps Arrow might be a better home. I think the only issue is whether we want to bring Spark jars and their dependencies into Arrow. One challenge I have had so far with the connector is managing

[jira] [Created] (ARROW-5903) [Java] Set methods in DecimalVector are slow

2019-07-10 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-5903: Summary: [Java] Set methods in DecimalVector are slow Key: ARROW-5903 URL: https://issues.apache.org/jira/browse/ARROW-5903 Project: Apache Arrow

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-10 Thread Wes McKinney
I did my best to remove the class from the GLib bindings -- there are probably some conventions around API versions that I did not respect, so I will need some help from others on GLib and Ruby. MATLAB and R are also affected, but should be relatively simple to change. I'll wait to hear more

Re: Spark and Arrow Flight

2019-07-10 Thread Wes McKinney
hi Ryan -- I was thinking that this might be built separately from the main Java project. We don't have a model in the codebase yet for libraries that depend on the core libraries (this could be in an apps/ directory at the top level, so apps/spark-flight-source or something). So the development

Re: [Discuss] Are Union.typeIds worth keeping?

2019-07-10 Thread Wes McKinney
hi Ben, Some applications use static type ids for various data types. Let's consider one possibility: BOOLEAN: 0 INT32: 1 DOUBLE: 2 STRING (UTF8): 3 If you were parsing JSON and constructing unions while parsing, you might encounter some types, but not all. So if we _don't_ have the option of
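Wes's example can be made concrete. Below is a minimal Python sketch, not the Arrow C++ or format API. Plain lists stand in for the union's type buffer, offsets buffer, and child arrays, and the names `code_of`, `build_dense_union`, and `value_at` are hypothetical. Only the types actually observed while parsing get a child array, so a separate list (playing the role of `Union.typeIds`) maps each static type code to its child index.

```python
# Hypothetical application-wide static type codes, as in the example above.
BOOLEAN, INT32, DOUBLE, STRING = 0, 1, 2, 3

def code_of(value):
    # bool is checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return BOOLEAN
    if isinstance(value, int):
        return INT32
    if isinstance(value, float):
        return DOUBLE
    if isinstance(value, str):
        return STRING
    raise TypeError(f"unsupported value: {value!r}")

def build_dense_union(values):
    """Dense-union-like layout that only creates children for observed codes."""
    slot_codes = []   # per-slot type code (the union's types buffer)
    offsets = []      # per-slot offset into the selected child
    children = {}     # type code -> child values
    for v in values:
        code = code_of(v)
        child = children.setdefault(code, [])
        slot_codes.append(code)
        offsets.append(len(child))
        child.append(v)
    type_ids = sorted(children)   # plays the role of Union.typeIds
    return slot_codes, offsets, type_ids, [children[c] for c in type_ids]

def value_at(i, slot_codes, offsets, type_ids, child_arrays):
    # typeIds maps a slot's static type code to the index of its child array
    child = child_arrays[type_ids.index(slot_codes[i])]
    return child[offsets[i]]

slot_codes, offsets, type_ids, child_arrays = build_dense_union([1, "a", 2, "b"])
print(type_ids)      # [1, 3]
print(child_arrays)  # [[1, 2], ['a', 'b']]
```

Here the STRING values keep their static code 3 even though they live in child 1; without a `typeIds`-style mapping the codes would have to be renumbered to match child positions.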

Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-10 Thread Wes McKinney
hi folks, Are there any opinions / strong feelings about the two options: * Prepare patch 0.14.1 release from a maintenance branch * Release 0.15.0 out of master Aside from the Parquet forward compatibility issues we're still discussing, and Eric's C# patch PR 4836, are there any other issues

[jira] [Created] (ARROW-5905) [Python] support conversion to decimal type from floats?

2019-07-10 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5905: Summary: [Python] support conversion to decimal type from floats? Key: ARROW-5905 URL: https://issues.apache.org/jira/browse/ARROW-5905 Project:
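The difficulty behind this question is that a binary float rarely holds the exact decimal value a user expects, so the result depends on how the value is rounded to the target scale. A small illustration using Python's standard `decimal` module (not pyarrow; the target scale of 2 is an arbitrary choice for the example):

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = 0.1
exact = Decimal(x)           # the exact binary value stored in the float
via_repr = Decimal(repr(x))  # the shortest decimal string that round-trips
# Round the exact value to a target scale of 2 digits after the point
rounded = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(exact)     # 0.1000000000000000055511151231257827021181583404541015625
print(via_repr)  # 0.1
print(rounded)   # 0.10
```

Any float-to-decimal conversion therefore has to pick a rounding rule for the requested scale rather than rely on an exact conversion.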

[jira] [Created] (ARROW-5907) base64 support of bytes-like

2019-07-10 Thread Litchy (JIRA)
Litchy created ARROW-5907: Summary: base64 support of bytes-like Key: ARROW-5907 URL: https://issues.apache.org/jira/browse/ARROW-5907 Project: Apache Arrow Issue Type: New Feature

Re: [Discuss] Are Union.typeIds worth keeping?

2019-07-10 Thread Jacques Nadeau
I was also supportive of this pattern. We have definitely used it before to optimize in certain cases. On Wed, Jul 10, 2019, 2:40 PM Wes McKinney wrote: > On Wed, Jul 10, 2019 at 3:57 PM Ben Kietzman wrote: > > > > In this scenario option A (include child arrays for each child type, even > >

Re: [Discuss] Are Union.typeIds worth keeping?

2019-07-10 Thread Wes McKinney
On Wed, Jul 10, 2019 at 3:57 PM Ben Kietzman wrote: > > In this scenario option A (include child arrays for each child type, even > if that type is not observed) seems like the clearly correct choice to me. > It yields a more intuitive layout for the union array and incurs no runtime > overhead

Re: [Discuss] Are Union.typeIds worth keeping?

2019-07-10 Thread Ben Kietzman
In this scenario option A (include child arrays for each child type, even if that type is not observed) seems like the clearly correct choice to me. It yields a more intuitive layout for the union array and incurs no runtime overhead (since the absent children are empty/null arrays). > why not
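Option A can be sketched in the same spirit. This is a hypothetical Python illustration with lists in place of Arrow buffers, and the names `code_of` and `build_union_option_a` are made up for the example. Every declared type gets a child, left empty if never observed, so a slot's type code is directly its child index and no separate `typeIds` mapping is required.

```python
# Hypothetical static type codes shared by the application.
BOOLEAN, INT32, DOUBLE, STRING = 0, 1, 2, 3
ALL_CODES = [BOOLEAN, INT32, DOUBLE, STRING]

def code_of(value):
    # bool is checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return BOOLEAN
    if isinstance(value, int):
        return INT32
    if isinstance(value, float):
        return DOUBLE
    if isinstance(value, str):
        return STRING
    raise TypeError(f"unsupported value: {value!r}")

def build_union_option_a(values):
    """One child per declared type, even if no value of that type occurs."""
    children = [[] for _ in ALL_CODES]  # unobserved types stay empty
    slot_codes, offsets = [], []
    for v in values:
        code = code_of(v)
        slot_codes.append(code)
        offsets.append(len(children[code]))
        children[code].append(v)
    return slot_codes, offsets, children

slot_codes, offsets, children = build_union_option_a([1, "a", 2, "b"])
# The type code indexes the children list directly.
print(children)  # [[], [1, 2], [], ['a', 'b']]
print(children[slot_codes[1]][offsets[1]])  # a
```

The empty children cost only their (empty) buffers, which matches the point that the absent children incur no runtime overhead.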

Re: [DRAFT] Apache Arrow ASF Board Report July 2019

2019-07-10 Thread Jacques Nadeau
Looks good to me. Thanks for pulling together. On Wed, Jul 10, 2019 at 2:49 PM Wes McKinney wrote: > any comments about this? The report is due > > On Sun, Jul 7, 2019 at 6:02 PM Wes McKinney wrote: > > > > ## Description: > > > > Apache Arrow is a cross-language development platform for

[jira] [Created] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-10 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5904: Summary: [Java] [Plasma] Fix compilation of Plasma Java client Key: ARROW-5904 URL: https://issues.apache.org/jira/browse/ARROW-5904 Project: Apache Arrow

Re: [DRAFT] Apache Arrow ASF Board Report July 2019

2019-07-10 Thread Wes McKinney
any comments about this? The report is due On Sun, Jul 7, 2019 at 6:02 PM Wes McKinney wrote: > > ## Description: > > Apache Arrow is a cross-language development platform for in-memory > data. It specifies a standardized language-independent columnar memory > format for flat and hierarchical

[jira] [Created] (ARROW-5906) [CI] Set -DARROW_VERBOSE_THIRDPARTY_BUILD=OFF in builds running in Travis CI, maybe all docker-compose builds by default

2019-07-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5906: Summary: [CI] Set -DARROW_VERBOSE_THIRDPARTY_BUILD=OFF in builds running in Travis CI, maybe all docker-compose builds by default Key: ARROW-5906 URL:

Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-10 Thread Joris Van den Bossche
I personally prefer 0.14.1 over 0.15.0. I think that is clearer in communication, as we are fixing regressions of the 0.14.0 release. (But I haven't been involved much in releases, so certainly no strong opinion.) Joris On Wed, Jul 10, 2019 at 15:07 Wes McKinney wrote: > hi folks, > > Are

Support an alternative memory layout for varchar/varbinary vectors

2019-07-10 Thread Fan Liya
Hi all, We are thinking of providing varchar/varbinary vectors with a different memory layout which exists in a wide range of systems. The memory layout is different from that of VarCharVector in the following ways: 1. Instead of storing (start offset, end offset), the new layout stores
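For reference, the existing VarCharVector layout keeps one data buffer plus a single offsets buffer with one more entry than there are values, so the (start offset, end offset) pair for slot i is read from two consecutive offsets entries. A minimal Python sketch of that existing layout follows, with plain lists and bytes standing in for Arrow buffers and the validity bitmap omitted. The helper names are hypothetical, and the sketch does not attempt to reproduce the proposed alternative, whose details are cut off above.

```python
def build_varchar(strings):
    """Pack strings into one data buffer plus an offsets buffer of length n + 1."""
    data = bytearray()
    offsets = [0]
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))  # end of this value = start of the next
    return bytes(data), offsets

def get(i, data, offsets):
    # (start offset, end offset) come from two adjacent offsets entries
    start, end = offsets[i], offsets[i + 1]
    return data[start:end].decode("utf-8")

data, offsets = build_varchar(["arrow", "", "flight"])
print(offsets)                 # [0, 5, 5, 11]
print(get(2, data, offsets))   # flight
```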

[jira] [Created] (ARROW-5909) [Java] Optimize ByteFunctionHelpers equals & compare logic

2019-07-10 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5909: Summary: [Java] Optimize ByteFunctionHelpers equals & compare logic Key: ARROW-5909 URL: https://issues.apache.org/jira/browse/ARROW-5909 Project: Apache Arrow Issue

[jira] [Created] (ARROW-5899) [Python][Packaging] Bundle uriparser.dll in windows wheels

2019-07-10 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5899: Summary: [Python][Packaging] Bundle uriparser.dll in windows wheels Key: ARROW-5899 URL: https://issues.apache.org/jira/browse/ARROW-5899 Project: Apache Arrow

[jira] [Created] (ARROW-5900) [Gandiva] [Java] Decimal precision,scale bounds check

2019-07-10 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-5900: Summary: [Gandiva] [Java] Decimal precision,scale bounds check Key: ARROW-5900 URL: https://issues.apache.org/jira/browse/ARROW-5900 Project:

[jira] [Created] (ARROW-5898) [Java] Provide functionality to efficiently compute hash code for arbitrary memory segment

2019-07-10 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5898: Summary: [Java] Provide functionality to efficiently compute hash code for arbitrary memory segment Key: ARROW-5898 URL: https://issues.apache.org/jira/browse/ARROW-5898
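The idea in this issue is to hash a segment [start, start + length) of a larger buffer without first copying it out. The issue summary does not name a hash function, so the Python sketch below assumes, purely for illustration, the same 31-multiplier recurrence that Java's `Arrays.hashCode(byte[])` uses (signed bytes, 32-bit wraparound). `segment_hash` is a hypothetical name, and `memoryview` stands in for an Arrow buffer.

```python
def segment_hash(buf, start, length):
    """31-based polynomial hash over buf[start:start+length], kept to 32 bits."""
    view = memoryview(buf)[start:start + length]  # slicing a memoryview does not copy
    h = 1
    for b in view:
        signed = b - 256 if b > 127 else b  # Java bytes are signed
        h = (31 * h + signed) & 0xFFFFFFFF  # emulate 32-bit int overflow
    # Reinterpret as a signed 32-bit value, like a Java int
    return h - (1 << 32) if h >= (1 << 31) else h

# The same bytes hash identically regardless of where the segment sits.
print(segment_hash(b"xxabcxx", 2, 3))  # 126145
print(segment_hash(b"abc", 0, 3))      # 126145
```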

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-10 Thread Micah Kornfield
Does anybody else have thoughts on this? Other language contributors? It seems like we still might not have enough of a consensus for a vote? Thanks, Micah On Tue, Jul 2, 2019 at 7:32 AM Wes McKinney wrote: > Correct. The encapsulated IPC message will just be 4 bytes bigger. > > On Tue,
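Assuming the change being voted on is the framing that the Arrow IPC format later adopted, each encapsulated message starts with a 4-byte continuation marker 0xFFFFFFFF before the little-endian int32 metadata length. That marker is the extra 4 bytes, and with an 8-byte prefix the flatbuffer metadata itself starts on an 8-byte boundary. The Python sketch below frames opaque bytes (not a real flatbuffer), pads the metadata so the prefix plus metadata is a multiple of 8, and records the padded size in the length field. The function names are hypothetical.

```python
import struct

CONTINUATION = 0xFFFFFFFF

def frame_message(metadata: bytes) -> bytes:
    """Prefix metadata with the continuation marker and an int32 length."""
    prefix_len = 8  # 4-byte marker + 4-byte length
    # Pad so prefix + metadata ends on an 8-byte boundary
    padded_len = (len(metadata) + prefix_len + 7) // 8 * 8 - prefix_len
    padded = metadata + b"\x00" * (padded_len - len(metadata))
    return struct.pack("<Ii", CONTINUATION, padded_len) + padded

def read_message(buf: bytes) -> bytes:
    marker, length = struct.unpack_from("<Ii", buf, 0)
    if marker != CONTINUATION:
        raise ValueError("missing continuation marker")
    return buf[8:8 + length]  # padded metadata bytes

framed = frame_message(b"abc")
print(len(framed))  # 16
```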