[jira] [Created] (ARROW-1238) [Java] Add JSON read/write support for decimals for integration tests

2017-07-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1238: --- Summary: [Java] Add JSON read/write support for decimals for integration tests Key: ARROW-1238 URL: https://issues.apache.org/jira/browse/ARROW-1238 Project: Apache Arr

Re: Use case for R Arrow Bindings

2017-07-19 Thread Dean Chen
Sounds good, will get a thread going there. On Wed, Jul 19, 2017 at 6:02 PM Wes McKinney wrote: > Especially with Arrow support landing in Spark (SPARK-13534), it would > be helpful to combine efforts between Python and R on this front. I > also have a long list of improvements to the Feather fo

Re: Use case for R Arrow Bindings

2017-07-19 Thread Wes McKinney
Especially with Arrow support landing in Spark (SPARK-13534), it would be helpful to combine efforts between Python and R on this front. I also have a long list of improvements to the Feather format that will be substantially simpler once library(feather) is depending on the main Arrow libraries.

Re: Use case for R Arrow Bindings

2017-07-19 Thread Dean Chen
I also sent a note about it to the dev list a month ago. Still have a huge internal need and interested in helping push this along where we can. Unfortunately, our team is more focused around Spark and doesn't have much experience working with the R community. On Wed, Jul 19, 2017 at 1:44 PM Clark

[VOTE] Release Apache Arrow 0.5.0 - RC1

2017-07-19 Thread Wes McKinney
Hello all, I'd like to propose the 1st release candidate (rc1) of Apache Arrow version 0.5.0. This is a major release consisting of 129 resolved JIRAs [1]. The source release rc1 is hosted at [2]. This release candidate is based on commit 1dd141bfe2523fcb402a63e2931f425aef5bd26a [3] The will b

Parquet+Arrow Java

2017-07-19 Thread Sven Wagner-Boysen
Hi, I started looking into the projects Parquet and Arrow. Looks very promising to me. I also came across PyArrow and the Parquet-Arrow integration in Python. Is there something similar available for Java? https://arrow.apache.org/docs/python/parquet.html Thanks Sven

Re: Adding a Map logical type to the Arrow metadata

2017-07-19 Thread Julian Hyde
I see. It took me a while to understand, but it all made sense when I realized that we are not looking at one Map instance but multiple rows, each with a Map instance, and the constituent parts of those Maps are stored end-to-end. Julian On Wed, Jul 19, 2017 at 11:42 AM, Wes McKinney wrote: >

Re: Adding a Map logical type to the Arrow metadata

2017-07-19 Thread Wes McKinney
The only structural difference between List<Struct<K, V>> and Struct<List<K>, List<V>> is that in the latter case, the "key" value and the "value" value have different offset vectors and thus can have different lengths. So in the first case we have buffer structure: - list null bitmap (map value is null / not null)
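
A rough illustration of the offset-vector point in the message above, in plain Python lists rather than real Arrow buffers. It assumes the two types being compared are a list of key/value structs (one shared offsets vector) and a struct of two lists (one offsets vector per child). The sample rows and the helper names row_shared and row_split are hypothetical, and null bitmaps are left out.

```python
# Three map "rows": {"a": 1, "b": 2}, {}, {"c": 3}.
# Keys and values of all rows are stored end-to-end in flat child arrays.
keys = ["a", "b", "c"]
values = [1, 2, 3]

# Layout 1 (list of key/value structs): a single offsets vector.
# Row i spans [offsets[i], offsets[i + 1]) in BOTH child arrays,
# so each row always has as many keys as values.
offsets = [0, 2, 2, 3]

def row_shared(i):
    """Return row i as (key, value) pairs using the shared offsets."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(keys[start:end], values[start:end]))

# Layout 2 (struct of two lists): each child list has its own offsets,
# so nothing in the layout forces the two lengths to agree.
key_offsets = [0, 2, 2, 3]
value_offsets = [0, 2, 2, 3]

def row_split(i, k_off, v_off):
    """Return row i as (keys, values) using independent offsets."""
    return keys[k_off[i]:k_off[i + 1]], values[v_off[i]:v_off[i + 1]]

if __name__ == "__main__":
    print([row_shared(i) for i in range(3)])
    print(row_split(0, key_offsets, value_offsets))
    # Offsets that disagree are still structurally valid in layout 2:
    print(row_split(0, key_offsets, [0, 1, 1, 3]))
```

With the mismatched value offsets in the last call, row 0 has two keys but only one value, which is the case the shared-offsets layout rules out by construction.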

Use case for R Arrow Bindings

2017-07-19 Thread Clark Fitzgerald
Hello all, I saw the notes come through from today's call: > * R Arrow Bindings? > - Find use cases within the R community, contributors needed > - R Feather bindings a useful starting point This year I've been working on parallel R on datasets in the 100+ GB range, and have found that loading

Re: Adding a Map logical type to the Arrow metadata

2017-07-19 Thread Julian Hyde
List<Struct<K, V>> isn’t the only physical representation that makes sense. Because it doesn’t take advantage of the fact that (a) keys can be re-ordered, (b) keys are unique. So, another viable physical representation would be Struct<List<K>, List<V>>, with the keys sorted. If keys are constant width and in contiguou

[jira] [Created] (ARROW-1237) Expose the ability to set lastSet

2017-07-19 Thread SIDDHARTH TEOTIA (JIRA)
SIDDHARTH TEOTIA created ARROW-1237: --- Summary: Expose the ability to set lastSet Key: ARROW-1237 URL: https://issues.apache.org/jira/browse/ARROW-1237 Project: Apache Arrow Issue Type: Bug

Re: Arrow sync tomorrow?

2017-07-19 Thread Wes McKinney
Topics from the sync call. I will work on cutting the 0.5.0 RC today, and we can discuss the other topics further on the mailing list * 0.5.0 Release - Deprecating timestamps_to_ms argument in Python for Parquet - Wes to cut release candidate * Arrow 1.0.0 release planning: format stability

Re: Arrow sync tomorrow?

2017-07-19 Thread Wes McKinney
Arrow sync starting in just a few minutes https://hangouts.google.com/hangouts/_/calendar/d2VzbWNraW5uQGdtYWlsLmNvbQ.05oukg37tomc6frk0osfdea9a9?authuser=0 On Tue, Jul 18, 2017 at 1:50 PM, Wes McKinney wrote: > OK, let's do 15:00 UTC tomorrow when. I'll send out a hangout link to > the mailing li