[jira] [Created] (ARROW-8081) Fix memory size when using huge pages in plasma; other code cleanups

2020-03-11 Thread Siyuan Zhuang (Jira)
Siyuan Zhuang created ARROW-8081: Summary: Fix memory size when using huge pages in plasma; other code cleanups Key: ARROW-8081 URL: https://issues.apache.org/jira/browse/ARROW-8081 Project: Apache

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

2020-03-11 Thread Micah Kornfield
Another status update. I've integrated the level generation code with the parquet writing code [1]. After that PR is merged I'll add bindings in Python to control versions of the level generation algorithm and plan on moving on to the read side. Thanks, Micah [1]

Re: [DISCUSS] Leveraging cloud computing resources for Arrow test workloads

2020-03-11 Thread Micah Kornfield
> > * Who's going to pay for it? Perhaps Amazon, Google, or Microsoft can > donate cloud compute credits to the project Google has offered a donation of GCP credits based on some estimates I made last year when we were facing Travis CI issues. I'm happy to try to do some integration work to help

[jira] [Created] (ARROW-8080) [C++] Add AVX512 build option

2020-03-11 Thread Frank Du (Jira)
Frank Du created ARROW-8080: --- Summary: [C++] Add AVX512 build option Key: ARROW-8080 URL: https://issues.apache.org/jira/browse/ARROW-8080 Project: Apache Arrow Issue Type: Improvement

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-11 Thread Jacques Nadeau
Generally I've found that this isn't an important optimization in the use cases we see. Memory overhead, especially with our Java shared allocation scheme, is nominal. Optimizing null checks at the word level usually is much more impactful since non-null and null runs are much more common on a
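
As a rough illustration of the word-level null check mentioned in this message, the sketch below counts valid entries in a validity bitmap 64 bits at a time, so that runs of all-valid or all-null values skip per-element work. It is plain Python rather than the Arrow Java API; the function name `count_valid` and the LSB-first bit order are assumptions made for the example.

```python
def count_valid(bitmap: bytes, length: int) -> int:
    """Count set validity bits among the first `length` bits of `bitmap`.

    Hypothetical helper for illustration only; assumes LSB-first bit order.
    """
    word_bits = 64
    all_set = (1 << word_bits) - 1
    valid = 0
    full_words = length // word_bits

    for w in range(full_words):
        word = int.from_bytes(bitmap[w * 8:(w + 1) * 8], "little")
        if word == all_set:
            # Fast path: a run of 64 non-null values, no per-bit checks.
            valid += word_bits
        elif word != 0:
            # Mixed word: fall back to counting individual bits.
            valid += bin(word).count("1")
        # word == 0 is a run of 64 nulls and contributes nothing.

    # Remaining bits that do not fill a whole 64-bit word.
    for i in range(full_words * word_bits, length):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            valid += 1
    return valid


# 64 valid values, then 64 nulls, then three trailing bits (valid, null, valid).
example = bytes([0xFF] * 8 + [0x00] * 8 + [0b00000101])
total = count_valid(example, 131)
```

The two fast paths are where the saving comes from: when data arrives in long null or non-null runs, most words take one comparison instead of 64 bit tests.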

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-11-0

2020-03-11 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-11-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-11-0 Failed Tasks: - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-11-0-github-centos-8 - conda-win-vs2015-py36:

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Wes McKinney
On Wed, Mar 11, 2020 at 11:24 AM Evan Chan wrote: > > Sure thing. > > Computation speed needs to be thought about in context. We might find > something which takes up half the space to be a little more computationally > expensive, but in the grand scheme of things is faster to compute as

[DISCUSS] Leveraging cloud computing resources for Arrow test workloads

2020-03-11 Thread Wes McKinney
hi folks, There has periodically been a discussion about employing dedicated compute resources to serve our testing needs beyond what can be accomplished in free / public CI services like GitHub Actions, Appveyor, etc. For example: * Workloads requiring a CUDA-capable GPU * Tests requiring a lot

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-8079 about the Python question On Wed, Mar 11, 2020 at 2:53 PM Neal Richardson wrote: > > While the underlying storage may allow duplicate keys, it seems much more > likely that someone would end up with duplicate keys by accident than by >

[jira] [Created] (ARROW-8079) [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where relevant

2020-03-11 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8079: --- Summary: [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where relevant Key: ARROW-8079 URL: https://issues.apache.org/jira/browse/ARROW-8079

[jira] [Created] (ARROW-8078) [Python] Missing links in the docs regarding field and schema DataTypes

2020-03-11 Thread Otávio Vasques (Jira)
Otávio Vasques created ARROW-8078: - Summary: [Python] Missing links in the docs regarding field and schema DataTypes Key: ARROW-8078 URL: https://issues.apache.org/jira/browse/ARROW-8078 Project:

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Neal Richardson
While the underlying storage may allow duplicate keys, it seems much more likely that someone would end up with duplicate keys by accident than by design. And although it may be up to the implementations to determine or enforce uniqueness constraints, it might be a good idea to make a

[jira] [Created] (ARROW-8077) [Python] Add wheel build script and Crossbow configuration for Windows on Python 3.5

2020-03-11 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8077: --- Summary: [Python] Add wheel build script and Crossbow configuration for Windows on Python 3.5 Key: ARROW-8077 URL: https://issues.apache.org/jira/browse/ARROW-8077

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Wes McKinney
On Wed, Mar 11, 2020 at 2:22 PM Antoine Pitrou wrote: > > On Wed, 11 Mar 2020 12:44:26 -0500 > Wes McKinney wrote: > > On this note, in Python we should probably re-evaluate the data > > structure returned when accessing the "metadata" field. > > I think it's ok for the convenience API to return

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Antoine Pitrou
On Wed, 11 Mar 2020 12:44:26 -0500 Wes McKinney wrote: > On this note, in Python we should probably re-evaluate the data > structure returned when accessing the "metadata" field. I think it's ok for the convenience API to return a dict, if we also expose e.g. a "metadata_items" that returns an

[jira] [Created] (ARROW-8076) [C++] arrow::stl::TupleRangeFromTable example includes wrong signature

2020-03-11 Thread Tomasz Cheda (Jira)
Tomasz Cheda created ARROW-8076: --- Summary: [C++] arrow::stl::TupleRangeFromTable example includes wrong signature Key: ARROW-8076 URL: https://issues.apache.org/jira/browse/ARROW-8076 Project: Apache

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Wes McKinney
On this note, in Python we should probably re-evaluate the data structure returned when accessing the "metadata" field. On Wed, Mar 11, 2020 at 12:42 PM Wes McKinney wrote: > > In the C++ library at least, uniqueness is never asserted when reading > and writing the IPC metadata [1] [2]. If you

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Wes McKinney
In the C++ library at least, uniqueness is never asserted when reading and writing the IPC metadata [1] [2]. If you use KeyValueMetadata::FindKey and the keys are non-unique, it will return the first one it finds. KeyValueMetadata::Merge assumes uniqueness, and the KeyValueMetadata::ToUnorderedMap
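
To make the duplicate-key behaviour described in this message concrete, here is a minimal sketch using plain Python pairs as a stand-in for key/value metadata. It is not the Arrow API: `find_first` is a hypothetical helper that mimics a first-match lookup, and the dict conversion shows how a map-like view can silently lose a duplicate.

```python
# Hypothetical stand-in for key/value metadata stored as ordered pairs;
# this is plain Python, not the Arrow API.
pairs = [("origin", "a"), ("unit", "ms"), ("origin", "b")]


def find_first(pairs, key):
    """Return the index of the first pair whose key matches, or -1."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            return i
    return -1


first_index = find_first(pairs, "origin")
first_value = pairs[first_index][1]

# Converting to a Python dict keeps only one value per key
# (the last one seen), so the lookup result changes.
as_dict = dict(pairs)
dict_value = as_dict["origin"]
```

With duplicates present, the first-match lookup and the dict view disagree on the value for the same key, which is the kind of ambiguity the thread is weighing.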

[DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Ben Kietzman
While working on https://issues.apache.org/jira/browse/ARROW-2255 (serialize custom_metadata in the integration tests), we had the following discussion on GitHub: https://github.com/apache/arrow/pull/6556#pullrequestreview-372405940 In short, although in Schema.fbs custom_metadata is declared as

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Evan Chan
Sure thing. Computation speed needs to be thought about in context. We might find something which takes up half the space to be a little more computationally expensive, but in the grand scheme of things is faster to compute as more of it can fit in memory, and it saves I/O. I definitely

[jira] [Created] (ARROW-8075) Loading R.utils after arrow breaks some arrow functions

2020-03-11 Thread Sam Albers (Jira)
Sam Albers created ARROW-8075: - Summary: Loading R.utils after arrow breaks some arrow functions Key: ARROW-8075 URL: https://issues.apache.org/jira/browse/ARROW-8075 Project: Apache Arrow Issue

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-11 Thread Brian Hulette
> And there is a "nullable" metadata-only flag at the > Field level. Could the same kinds of optimizations be implemented in > Java without introducing a "nullable" concept? Note Liya Fan did suggest pulling the nullable flag from the Field when the vector is created in item (1) of the proposed

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-11 Thread Fan Liya
Hi Micah, Thanks a lot for your valuable comments. Please see my comments inline. > I'm a little concerned that this will change assumptions for at least some > of the clients using the library (some might always rely on the validity > buffer being present). I can understand your concern and I

[jira] [Created] (ARROW-8074) [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset?

2020-03-11 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8074: Summary: [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset? Key: ARROW-8074 URL: https://issues.apache.org/jira/browse/ARROW-8074

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Antoine Pitrou
Hi, Le 11/03/2020 à 06:31, Micah Kornfield a écrit : > > I still think we should be careful on what is added to the spec, in > particular, we should be focused on encodings that can be used to improve > computational efficiency rather than just smaller size. Also, it is > important to note

Re: [Discuss] [Java] Implement vector diff functionality

2020-03-11 Thread Ji Liu
Hi Micah, Thanks for your feedback, you have opened an issue for Google's Truth[1] and it was assigned to me, I'll try to use it. Thanks, Ji Liu [1] https://issues.apache.org/jira/browse/ARROW-6931 -- From:Micah Kornfield Send

Re: [Java] Port vector validate functionality

2020-03-11 Thread Ji Liu
Hi Wes and Micah, Thanks for your valuable suggestions. I will create sub-tasks under this issue as follow-up work when this one is finished. Thanks, Ji Liu -- From: Micah Kornfield Send Time: 2020-03-11 (Wednesday) 13:42 To: dev Cc: Ji

[jira] [Created] (ARROW-8073) [GLib] Add binding of arrow::fs::PathForest

2020-03-11 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-8073: --- Summary: [GLib] Add binding of arrow::fs::PathForest Key: ARROW-8073 URL: https://issues.apache.org/jira/browse/ARROW-8073 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-8072) Add const constraint when parsing data

2020-03-11 Thread Siyuan Zhuang (Jira)
Siyuan Zhuang created ARROW-8072: Summary: Add const constraint when parsing data Key: ARROW-8072 URL: https://issues.apache.org/jira/browse/ARROW-8072 Project: Apache Arrow Issue Type: