Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-06 Thread Jacques Nadeau
-1 (binding) I'm voting -1 on this. I posted the thinking why on the PR. The high-level is that I think it needs to better address the pipelined use case as right now it fails to support that at all and has too much weight to ignore that use case. I actually would have posted it here but totally

[jira] [Created] (ARROW-7345) [Python] Writing partitions with NaNs silently drops data

2019-12-06 Thread Karl Dunkle Werner (Jira)
Karl Dunkle Werner created ARROW-7345: Summary: [Python] Writing partitions with NaNs silently drops data Key: ARROW-7345 URL: https://issues.apache.org/jira/browse/ARROW-7345 Project: Apache

[jira] [Created] (ARROW-7344) [Packaging][Python] Build manylinux2014 wheels

2019-12-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7344: Summary: [Packaging][Python] Build manylinux2014 wheels Key: ARROW-7344 URL: https://issues.apache.org/jira/browse/ARROW-7344 Project: Apache Arrow

Re: Human-readable version of Arrow Schema?

2019-12-06 Thread Micah Kornfield
Hi Christian, As far as I know no-one is working on a canonical text representation for schemas. A JSON serializer exists for integration test purposes, but IMO it shouldn't be relied upon as canonical. It looks like Flatbuffers supports serialization to/from JSON [1

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-06 Thread Wes McKinney
Hello, Could more PMC members take a look at this work? Thank you On Tue, Dec 3, 2019 at 1:50 PM Neal Richardson wrote: > > +1 (non-binding) > > On Tue, Dec 3, 2019 at 10:56 AM Wes McKinney wrote: > > > +1 (binding) > > > > On Tue, Dec 3, 2019 at 12:54 PM Wes McKinney wrote: > > > > > >

[jira] [Created] (ARROW-7343) Memory leak in Flight ArrowMessage

2019-12-06 Thread David Li (Jira)
David Li created ARROW-7343: Summary: Memory leak in Flight ArrowMessage Key: ARROW-7343 URL: https://issues.apache.org/jira/browse/ARROW-7343 Project: Apache Arrow Issue Type: Bug

Human-readable version of Arrow Schema?

2019-12-06 Thread Christian Hudon
Hi, For the uses I would like to make of Arrow, I would need a human-readable and -writable version of an Arrow Schema, that could be converted to and from the Arrow Schema C++ object. Going through the doc for 0.15.1, I don't see anything to that effect, with the closest being the ToString()

[jira] [Created] (ARROW-7342) [Java] offset buffer for vector of variable-width type with zero value count is empty

2019-12-06 Thread Steve M. Kim (Jira)
Steve M. Kim created ARROW-7342: Summary: [Java] offset buffer for vector of variable-width type with zero value count is empty Key: ARROW-7342 URL: https://issues.apache.org/jira/browse/ARROW-7342

[jira] [Created] (ARROW-7341) [CI] Unbreak nightly Conda R job

2019-12-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7341: Summary: [CI] Unbreak nightly Conda R job Key: ARROW-7341 URL: https://issues.apache.org/jira/browse/ARROW-7341 Project: Apache Arrow Issue Type: Bug

Re: Timestamp coerced by default writing to parquet when resolution is ns (python)

2019-12-06 Thread Weston Pace
Thanks. I similarly noticed that uint32 gets converted to int64. This makes some surface sense as uint32 is a logical type with int64 as the backing physical type. However, uint8, uint16, and uint64 all keep their data types so I was a little surprised. On Fri, Dec 6, 2019 at 6:52 AM Wes
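The range argument behind the uint32 observation above can be checked with the standard library alone. This is only a sketch of the arithmetic, not of how pyarrow or Parquet choose a physical type; the helper name `fits` is hypothetical.

```python
import struct

UINT32_MAX = 2**32 - 1  # largest value a uint32 column can hold


def fits(value, fmt):
    """Return True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# "<i" is a signed 32-bit integer, "<q" is a signed 64-bit integer.
print(fits(UINT32_MAX, "<i"))  # the full uint32 range does not fit in int32
print(fits(UINT32_MAX, "<q"))  # but it does fit in int64
```

A signed 32-bit slot cannot represent every uint32 value, so a signed 64-bit slot is the next size that can hold the full range without an unsigned annotation.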

[jira] [Created] (ARROW-7340) [CI] Prune defunct appveyor build setup

2019-12-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7340: Summary: [CI] Prune defunct appveyor build setup Key: ARROW-7340 URL: https://issues.apache.org/jira/browse/ARROW-7340 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7339) [CMake] Thrift version not respected in CMake configuration version.txt

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7339: Summary: [CMake] Thrift version not respected in CMake configuration version.txt Key: ARROW-7339 URL: https://issues.apache.org/jira/browse/ARROW-7339

Re: Timestamp coerced by default writing to parquet when resolution is ns (python)

2019-12-06 Thread Wes McKinney
Some notes * 96-bit nanosecond timestamps are deprecated in the Parquet format by default, so we don't write them by default unless you use the use_deprecated_int96_timestamps flag * 64-bit timestamps are relatively new to the Parquet format, I'm not actually sure what's required to write these.

[jira] [Created] (ARROW-7338) [C++] Rename SimpleDataSource to InMemoryDataSource

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7338: Summary: [C++] Rename SimpleDataSource to InMemoryDataSource Key: ARROW-7338 URL: https://issues.apache.org/jira/browse/ARROW-7338 Project: Apache

Timestamp coerced by default writing to parquet when resolution is ns (python)

2019-12-06 Thread Weston Pace
If my table has timestamp fields with ns resolution and I save the table to parquet format without specifying any timestamp args (default coerce and legacy settings) then it automatically converts my timestamp to us resolution. As best I can tell Parquet supports ns resolution so I would prefer
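What coercing from ns to us resolution means for the stored values can be shown with plain integer arithmetic. This is a minimal sketch, assuming timestamps are integer counts since the epoch; the function `coerce_ns_to_us` and its `allow_truncation` flag are hypothetical names for illustration and are not pyarrow API.

```python
# Illustrative only: plain-Python arithmetic, not pyarrow's implementation.
NS_PER_US = 1_000


def coerce_ns_to_us(values_ns, allow_truncation=False):
    """Convert integer nanosecond timestamps to microseconds.

    Raises ValueError when sub-microsecond digits would be lost,
    unless allow_truncation is True (then the value is floored).
    """
    out = []
    for ns in values_ns:
        us, remainder = divmod(ns, NS_PER_US)
        if remainder and not allow_truncation:
            raise ValueError(f"casting {ns} ns to us would lose data")
        out.append(us)
    return out


# A timestamp that is a whole number of microseconds converts losslessly.
whole = coerce_ns_to_us([1_575_590_400_000_000_000])

# One with sub-microsecond digits only converts if truncation is allowed.
floored = coerce_ns_to_us([1_500], allow_truncation=True)
```

The conversion is lossless only when every value is a whole number of microseconds, which is why a silent default coercion can matter for ns-resolution data.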

[NIGHTLY] Arrow Build Report for Job nightly-2019-12-06-0

2019-12-06 Thread Crossbow
Arrow Build Report for Job nightly-2019-12-06-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-06-0 Failed Tasks: - test-conda-python-3.7-hdfs-2.9.2: URL:

Re: Java - Spark dataframe to Arrow format

2019-12-06 Thread GaoXiang Wang
Hi Wes and Liya, Appreciate your feedback and information. Looking forward to a more efficient integration between Arrow and Spark on the Java/Scala level. I would like to make my contribution if I can help in any way during my free time. Thank you very much. *Best Regards,WANG GAOXIANG* *

[jira] [Created] (ARROW-7337) [CI][C++] Exercise benchmarks as GitHub actions cron job

2019-12-06 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7337: Summary: [CI][C++] Exercise benchmarks as GitHub actions cron job Key: ARROW-7337 URL: https://issues.apache.org/jira/browse/ARROW-7337 Project: Apache Arrow

[jira] [Created] (ARROW-7336) implement minmax options

2019-12-06 Thread Yuan Zhou (Jira)
Yuan Zhou created ARROW-7336: Summary: implement minmax options Key: ARROW-7336 URL: https://issues.apache.org/jira/browse/ARROW-7336 Project: Apache Arrow Issue Type: Improvement

Re: Java - Spark dataframe to Arrow format

2019-12-06 Thread Fan Liya
Hi folks, Thanks for your clarification. I also think this is a universal requirement (including Java UDF in Arrow format). The Java converter provided by Spark is inefficient, due to two reasons (IMO) 1. There are frequent memory copies between on-heap and off-heap memory. 2. The Spark API is