[Format] Redundant information in Time type?

2019-02-27 Thread Micah Kornfield
In the Flatbuffers schema, what is the purpose of the bit width in "table Time" [1]? Based on the documentation, it sounds like the bit width is fully determined by TimeUnit. In other cases (e.g. Date) we don't have a similar field. Thanks, Micah [1]
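To make the question concrete, here is a minimal sketch of the redundancy being asked about. It assumes the pairing given in the Arrow format documentation (32 bits for seconds and milliseconds, 64 bits for microseconds and nanoseconds). The names TimeUnit, BIT_WIDTH_BY_UNIT, time_bit_width and is_consistent are hypothetical, made up for illustration, and are not part of any Arrow API.

```python
from enum import Enum


class TimeUnit(Enum):
    """Illustrative stand-in for the TimeUnit enum in the schema."""
    SECOND = "s"
    MILLISECOND = "ms"
    MICROSECOND = "us"
    NANOSECOND = "ns"


# Assumed pairing, taken from the format documentation:
# coarse units fit in 32 bits, fine units need 64 bits.
BIT_WIDTH_BY_UNIT = {
    TimeUnit.SECOND: 32,
    TimeUnit.MILLISECOND: 32,
    TimeUnit.MICROSECOND: 64,
    TimeUnit.NANOSECOND: 64,
}


def time_bit_width(unit: TimeUnit) -> int:
    """Derive the bit width from the unit alone."""
    return BIT_WIDTH_BY_UNIT[unit]


def is_consistent(unit: TimeUnit, bit_width: int) -> bool:
    """Check that a stored bit width agrees with the one implied by the unit."""
    return time_bit_width(unit) == bit_width
```

If the pairing really is one-to-one as assumed here, a stored bit width carries no information beyond the unit, and a reader can only use it as a consistency check.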

Re: Flaky Travis CI builds on master

2019-02-27 Thread Ravindra Pindikura
> On Feb 27, 2019, at 1:48 AM, Antoine Pitrou wrote: > > On Tue, 26 Feb 2019 13:39:08 -0600 > Wes McKinney wrote: >> hi folks, >> >> We haven't had a green build on master for about 5 days now (the last >> one was February 21). Has anyone else been paying attention to this? >> It seems we

[jira] [Created] (ARROW-4711) [Plasma] enhance plasma client interfaces to work with multiple objects

2019-02-27 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-4711: Summary: [Plasma] enhance plasma client interfaces to work with multiple objects Key: ARROW-4711 URL: https://issues.apache.org/jira/browse/ARROW-4711 Project: Apache Arrow

[jira] [Created] (ARROW-4710) [C++][R] New linting script skip files with "cpp" extension

2019-02-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4710: --- Summary: [C++][R] New linting script skip files with "cpp" extension Key: ARROW-4710 URL: https://issues.apache.org/jira/browse/ARROW-4710 Project: Apache Arrow

[jira] [Created] (ARROW-4709) [C++] Optimize for ordered JSON fields

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4709: Summary: [C++] Optimize for ordered JSON fields Key: ARROW-4709 URL: https://issues.apache.org/jira/browse/ARROW-4709 Project: Apache Arrow Issue

[jira] [Created] (ARROW-4708) [C++] Add multithreaded JSON reader

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4708: Summary: [C++] Add multithreaded JSON reader Key: ARROW-4708 URL: https://issues.apache.org/jira/browse/ARROW-4708 Project: Apache Arrow Issue

[jira] [Created] (ARROW-4707) [C++] move BitsetStack to bit-util.h

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4707: Summary: [C++] move BitsetStack to bit-util.h Key: ARROW-4707 URL: https://issues.apache.org/jira/browse/ARROW-4707 Project: Apache Arrow Issue

[jira] [Created] (ARROW-4706) [C++] shared conversion framework for JSON/CSV parsers

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4706: Summary: [C++] shared conversion framework for JSON/CSV parsers Key: ARROW-4706 URL: https://issues.apache.org/jira/browse/ARROW-4706 Project: Apache Arrow

[jira] [Created] (ARROW-4705) [Rust] CSV reader should show line number and error message when failing to parse a line

2019-02-27 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4705: - Summary: [Rust] CSV reader should show line number and error message when failing to parse a line Key: ARROW-4705 URL: https://issues.apache.org/jira/browse/ARROW-4705

Re: Flaky Travis CI builds on master

2019-02-27 Thread Kouhei Sutou
Hi, In "Re: Flaky Travis CI builds on master" on Tue, 26 Feb 2019 16:00:31 -0600, Wes McKinney wrote: > * Seemingly a GLib Plasma OOM > https://travis-ci.org/apache/arrow/jobs/498906118#L3689 I take this: https://issues.apache.org/jira/browse/ARROW-4704 It seems that plasma_store doesn't

Re: [Discuss][Java] Codebase Housekeeping?

2019-02-27 Thread Bryan Cutler
These all sound good to me Micah, thanks for taking this on! Regarding the javadoc codestyle in (2), I believe it was disabled because there were just too many issues of missing docs at the time. Any documentation additions are definitely welcome and hopefully we can eventually enable the check

[jira] [Created] (ARROW-4704) [CI][GLib] Plasma test is flaky

2019-02-27 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4704: --- Summary: [CI][GLib] Plasma test is flaky Key: ARROW-4704 URL: https://issues.apache.org/jira/browse/ARROW-4704 Project: Apache Arrow Issue Type: Test

Re: Parquet Shared Library Versioning

2019-02-27 Thread Wes McKinney
hi Hatem, Until the Parquet community begins to make C++ releases out of the new monorepo structure, I think we should continue to use the same SO version for all libraries produced by the build. Otherwise the ABI version from a libparquet.so coming from an Arrow release artifact could cause

Re: Google Summer of Code 2019 for Apache Arrow

2019-02-27 Thread Wes McKinney
I don't have the bandwidth to do more on this right now. If another member of the community wants to take a leadership role in our involvement in GSoC that would be really great. On Wed, Feb 27, 2019 at 1:06 PM Kevin Ratnasekera wrote: > > Hi Wes, > > I went through the JIRA on [1] there are

[jira] [Created] (ARROW-4703) [C++] Upgrade dependency versions

2019-02-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4703: - Summary: [C++] Upgrade dependency versions Key: ARROW-4703 URL: https://issues.apache.org/jira/browse/ARROW-4703 Project: Apache Arrow Issue Type: Task

Re: Nightly binary packages

2019-02-27 Thread Kouhei Sutou
Hi, > - How should we handle the signing procedure? Simply omit? For .deb and .rpm, we need to sign them to install them by apt/yum. We should use a GPG key only for nightly for this purpose. We should not use GPG keys in https://dist.apache.org/repos/dist/release/arrow/KEYS for this purpose.

[jira] [Created] (ARROW-4702) [C++] Upgrade dependency versions

2019-02-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4702: - Summary: [C++] Upgrade dependency versions Key: ARROW-4702 URL: https://issues.apache.org/jira/browse/ARROW-4702 Project: Apache Arrow Issue Type: Task

Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-27 Thread Wes McKinney
hi Ryan, On Wed, Feb 27, 2019 at 1:31 PM Ryan Blue wrote: > > Thanks for pointing out that document, Uwe. I really like the intent and it > would be really useful to have common components for large datasets. One of > the questions we are hitting with an Iceberg python implementation is the >

Re: Timeline for 0.13 Arrow release

2019-02-27 Thread Wes McKinney
The timeline for the 0.13 release is drawing closer. I would say we should consider a release candidate either the week of March 18 or March 25, which gives us ~3 weeks to close out backlog items. There are around 220 issues open or in-progress in

[jira] [Created] (ARROW-4701) [C++] Add JSON chunker benchmarks

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4701: Summary: [C++] Add JSON chunker benchmarks Key: ARROW-4701 URL: https://issues.apache.org/jira/browse/ARROW-4701 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-4700) [C++] Add DecimalType support to JSON parser

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4700: Summary: [C++] Add DecimalType support to JSON parser Key: ARROW-4700 URL: https://issues.apache.org/jira/browse/ARROW-4700 Project: Apache Arrow

Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-27 Thread Ryan Blue
Thanks for pointing out that document, Uwe. I really like the intent and it would be really useful to have common components for large datasets. One of the questions we are hitting with an Iceberg python implementation is the file system abstraction, so I think this is very relevant for all of us.

[jira] [Created] (ARROW-4699) [C++] json parser should not rely on null terminated buffers

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4699: Summary: [C++] json parser should not rely on null terminated buffers Key: ARROW-4699 URL: https://issues.apache.org/jira/browse/ARROW-4699 Project: Apache

[jira] [Created] (ARROW-4698) [C++] Let StringBuilder be constructible with a pre allocated buffer for character data

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4698: Summary: [C++] Let StringBuilder be constructible with a pre allocated buffer for character data Key: ARROW-4698 URL: https://issues.apache.org/jira/browse/ARROW-4698

[jira] [Created] (ARROW-4697) [C++] Add URI parsing facility

2019-02-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4697: - Summary: [C++] Add URI parsing facility Key: ARROW-4697 URL: https://issues.apache.org/jira/browse/ARROW-4697 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4696: - Summary: Verify release script is over optimist with CUDA detection Key: ARROW-4696 URL: https://issues.apache.org/jira/browse/ARROW-4696 Project:

Re: Flight / gRPC scalability issue

2019-02-27 Thread Wes McKinney
It seems like this discussion would be relevant to the gRPC community. There are probably other issues at play, like ensuring that multiple streams through the same port do not block each other too much if one stream has messages of smaller size and another larger size, then the byte slices sent

Re: Google Summer of Code 2019 for Apache Arrow

2019-02-27 Thread Wes McKinney
hi Kevin, I did post a JIRA ticket about Arrow project ideas. Perhaps I'm mistaken about where we are in the process so hopefully it is still possible to find a student to work on database driver bridges. > I saw your involvement with community and its more than enough time to mentor > a

Re: URI library for C++

2019-02-27 Thread Francois Saint-Jacques
I agree that vendoring curl is craziness, my main point is that curl now has a URL API. If we can find a way to avoid pulling another dependency. On Wed, Feb 27, 2019 at 11:30 AM Antoine Pitrou wrote: > > Vendoring curl sounds a bit crazy IMHO. We'll end up having to vendor a > TLS library and

Re: Google Summer of Code 2019 for Apache Arrow

2019-02-27 Thread Kevin Ratnasekera
Hi Wes, Why are you thinking of not participating this year? Google just announced Apache as an accepted org. And this is the usual time when students start to pop up. It's not too late to create some new ideas and post new tickets on Jira. It doesn’t even matter if you post something very basic, what

Re: URI library for C++

2019-02-27 Thread Wes McKinney
Yes I do not think vendoring curl is a good idea On Wed, Feb 27, 2019 at 10:30 AM Antoine Pitrou wrote: > > > Vendoring curl sounds a bit crazy IMHO. We'll end up having to vendor a > TLS library and who knows what else... > > Regards > > Antoine. > > > On Wed, 27 Feb 2019 11:16:49 -0500 >

Re: URI library for C++

2019-02-27 Thread Antoine Pitrou
Vendoring curl sounds a bit crazy IMHO. We'll end up having to vendor a TLS library and who knows what else... Regards Antoine. On Wed, 27 Feb 2019 11:16:49 -0500 Francois Saint-Jacques wrote: > There's a good chance we end up using curl for the dataset project. Curl > has a new url API

Re: URI library for C++

2019-02-27 Thread Wes McKinney
> There's a good chance we end up using curl for the dataset project We'll need curl in our toolchain at some point for interacting e.g. with Google Cloud Storage [1] [1]: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/cloud/gcs_file_system.cc On Wed, Feb 27,

Re: Flaky Travis CI builds on master

2019-02-27 Thread Brian Hulette
Another instance of #1 for the JS builds: https://travis-ci.org/apache/arrow/jobs/498967250#L992 I filed https://issues.apache.org/jira/browse/ARROW-4695 about it before seeing this thread. As noted there I was able to replicate the timeout on my laptop at least once. I didn't think to monitor

Re: URI library for C++

2019-02-27 Thread Antoine Pitrou
I've opened a couple of issues for cpp-netlib. Let's see how the maintainer responds. Otherwise I agree uriparser sounds better. Regards Antoine. On Wed, 27 Feb 2019 10:12:24 -0600 Wes McKinney wrote: > Seems like uriparser might be a better choice, but I haven't looked > into the C++ uri

Re: URI library for C++

2019-02-27 Thread Francois Saint-Jacques
There's a good chance we end up using curl for the dataset project. Curl has a new URL API https://github.com/curl/curl/wiki/URL-API , but it requires a recent version (7.62.0, October 2018), which means vendoring. François On Wed, Feb 27, 2019 at 11:06 AM Antoine Pitrou wrote: > > Hello, > > As

Re: URI library for C++

2019-02-27 Thread Wes McKinney
Seems like uriparser might be a better choice, but I haven't looked into the C++ uri library to see how annoying maintaining a patched version would be On Wed, Feb 27, 2019 at 10:06 AM Antoine Pitrou wrote: > > > Hello, > > As part of ARROW-4651, we would need to have a URI parsing library in >

URI library for C++

2019-02-27 Thread Antoine Pitrou
Hello, As part of ARROW-4651, we would need to have a URI parsing library in the C++ project. One such library is https://github.com/cpp-netlib/uri, it's based on a previous proposal for the standard C++ library. It has no dependencies except boost::algorithm. One problem is that the library

Re: Google Summer of Code 2019 for Apache Arrow

2019-02-27 Thread Wes McKinney
It would be interesting. I think the ship has sailed on GSoC for us unfortunately. I'll try again to get the community interested in it next year; hopefully I'll have a little more bandwidth then to help make it happen On Wed, Feb 27, 2019 at 12:13 AM Micah Kornfield wrote: > > It might be

[jira] [Created] (ARROW-4695) [JS] Tests timing out on Travis

2019-02-27 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4695: Summary: [JS] Tests timing out on Travis Key: ARROW-4695 URL: https://issues.apache.org/jira/browse/ARROW-4695 Project: Apache Arrow Issue Type: Improvement

Re: Flaky Travis CI builds on master

2019-02-27 Thread Francois Saint-Jacques
I think we're witnessing multiple issues. 1. Travis seems to be slow (is it an OOM issue?) - https://travis-ci.org/apache/arrow/jobs/499122041#L1019 - https://travis-ci.org/apache/arrow/jobs/498906118#L3694 - https://travis-ci.org/apache/arrow/jobs/499146261#L2316 2.

[jira] [Created] (ARROW-4694) [CI] detect-changes.py is inconsistent

2019-02-27 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4694: - Summary: [CI] detect-changes.py is inconsistent Key: ARROW-4694 URL: https://issues.apache.org/jira/browse/ARROW-4694 Project: Apache Arrow

[jira] [Created] (ARROW-4693) [CI] Build boost library with multi precision

2019-02-27 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4693: - Summary: [CI] Build boost library with multi precision Key: ARROW-4693 URL: https://issues.apache.org/jira/browse/ARROW-4693 Project: Apache Arrow

Re: Boost and manylinux CI builds

2019-02-27 Thread Uwe L. Korn
Hello Ravindra, the simplest thing would be for you to open a pull request; I can then pick this up and push it to my personal fork. Then a new image is built on quay.io. Otherwise, you can also activate quay.io on your fork to get the docker image to build. Uwe On Wed, Feb 27, 2019, at 8:41