Re: MIME type

2019-11-21 Thread Sutou Kouhei
I found Apache Thrift registers the following MIME types: * application/vnd.apache.thrift.binary * application/vnd.apache.thrift.compact * application/vnd.apache.thrift.json https://www.iana.org/assignments/media-types/media-types.xhtml Thrift uses "vnd.apache." prefix[1]. [1]

[jira] [Created] (ARROW-7226) [JSON] Json loader fails on example in documentation.

2019-11-21 Thread Rinke Hoekstra (Jira)
Rinke Hoekstra created ARROW-7226: Summary: [JSON] Json loader fails on example in documentation. Key: ARROW-7226 URL: https://issues.apache.org/jira/browse/ARROW-7226 Project: Apache Arrow

[NIGHTLY] Arrow Build Report for Job nightly-2019-11-21-0

2019-11-21 Thread Crossbow
Arrow Build Report for Job nightly-2019-11-21-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-21-0 Failed Tasks: - conda-osx-clang-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-21-0-azure-conda-osx-clang-py27 -

Re: [DISCUSS][C++] Pointer name aliasing

2019-11-21 Thread Antoine Pitrou
On Wed, 20 Nov 2019 20:50:12 -0800 Micah Kornfield wrote: > A recent PR for datasets [1] seems to have introduced the convention of > aliasing "std::shared_ptr" with "TypePtr" for some type. I think in > the past we had decided not to use a convention like this but I can't find > the thread.

Re: [DISCUSS][C++] Pointer name aliasing

2019-11-21 Thread Francois Saint-Jacques
This notation is already used in some parts of the codebase [1]. I think it was introduced when absorbing gandiva and then in a draft of the logical operations in the compute module. I have no strong opinion for/against. I find it convenient to reduce typing, but the style guide argues against

[jira] [Created] (ARROW-7225) [C++] `*std::move(Result<T>)` calls T copy constructor

2019-11-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7225: Summary: [C++] `*std::move(Result<T>)` calls T copy constructor Key: ARROW-7225 URL: https://issues.apache.org/jira/browse/ARROW-7225 Project: Apache Arrow

Dense unions: monotonic or strictly monotonic offsets?

2019-11-21 Thread Antoine Pitrou
Hello, I'd like some clarification on the spec and intent for dense arrays. Currently, it is specified that offsets of a dense union are "in order / increasing" (*). However, it is not obvious whether repeated values are allowed or not. I suspect the intent is to avoid having people exploit

Unions: storing type_ids or type_codes?

2019-11-21 Thread Antoine Pitrou
Hello, There's some ambiguity whether a union array's "types" buffer stores physical child ids, or logical type codes. Some of our C++ tests assume the former: https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_union_test.cc#L107-L123 Some of our C++ tests assume the latter:

Re: [DISCUSS][C++] Pointer name aliasing

2019-11-21 Thread Antoine Pitrou
On Thu, 21 Nov 2019 08:40:10 -0500 Francois Saint-Jacques wrote: > This notation is already used in some parts of the codebase [1]. I > think it was introduced when absorbing gandiva and then in a draft of > the logical operations in the compute module. I have no strong opinion > for/against. I

[jira] [Created] (ARROW-7227) [Python] Provide wrappers for ConcatenateWithPromotion()

2019-11-21 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-7227: Summary: [Python] Provide wrappers for ConcatenateWithPromotion() Key: ARROW-7227 URL: https://issues.apache.org/jira/browse/ARROW-7227 Project: Apache Arrow Issue

Adding stronger warnings about pre-production Arrow IPC implementations (C#, Rust)

2019-11-21 Thread Wes McKinney
hi folks, We're accruing some bug reports relating to the C# library when it comes to interop with other languages. Nowhere in https://github.com/apache/arrow/blob/master/csharp/README.md is it clearly stated that such problems are to be anticipated. Until C# participates in the integration

Re: question about Columnar “Streaming Protocol” Change since 0.14.0

2019-11-21 Thread Wes McKinney
hi Andong, Yes. Here is the commit implementing these changes https://github.com/apache/arrow/commit/3eaceec8561d6b783d56f7b82e091c19e7fb043c#diff-32981a13284db7a021131df49e6cd203 - Wes On Thu, Nov 21, 2019 at 12:44 AM Andong Zhan wrote: > > Hi Arrow developers, > > We noticed that since

Re: Unions: storing type_ids or type_codes?

2019-11-21 Thread Wes McKinney
hi Antoine, The latter is correct, or at least what is intended in the specification. For example, if the type metadata indicates codes [0, 5, 10], then the "types" buffer should contain values selected from these values rather than physical child indexes (which would be [0, 1, 2] in this case)
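
A minimal sketch of the distinction drawn in this message, written in plain Python rather than against any Arrow API. Only the codes [0, 5, 10] and the child indexes [0, 1, 2] come from the message; the names `type_codes`, `code_to_child`, and `types_buffer`, and the buffer contents, are hypothetical and chosen for illustration.

```python
# Hypothetical illustration in plain Python; this is not the Arrow API.

# Logical type codes declared in the union's type metadata (from the message).
type_codes = [0, 5, 10]

# Map each logical code to its physical child index: {0: 0, 5: 1, 10: 2}.
code_to_child = {code: idx for idx, code in enumerate(type_codes)}

# Made-up "types" buffer: one logical code per union slot, not a child index.
types_buffer = [5, 0, 10, 5]

# A reader resolves each slot's code to the child that stores its value.
child_ids = [code_to_child[code] for code in types_buffer]
print(child_ids)
```

Under this reading, a reader that treated the buffer values as child indexes would fail on the code 5 or 10, since the union has only three children.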

Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-21 Thread Wes McKinney
hi Antoine, It's a good question. The intent when we wrote the specification was to be strictly monotonic, but there seems to be nothing especially harmful about relaxing the constraint to allow for repeated values or even non-monotonicity (strict or otherwise). For example, if we had the union ['a',
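
A rough sketch of why repeated offsets do not affect random access, in plain Python rather than any Arrow API. It assumes, for simplicity, that the values in the types buffer are physical child indexes; the data and the helper names `dense_union_get`, `child_offsets`, and `is_monotonic` are hypothetical. Whether repeated offsets should be permitted is the open question in this thread, and the sketch takes no position on it.

```python
# Hypothetical illustration in plain Python; this is not the Arrow API.

def dense_union_get(types, offsets, children, i):
    """Return the value in union slot i: pick the child, then index into it."""
    return children[types[i]][offsets[i]]

def child_offsets(types, offsets, child):
    """Collect the offsets that refer to one child, in slot order."""
    return [o for t, o in zip(types, offsets) if t == child]

def is_monotonic(seq, strict):
    """Check that seq is increasing, strictly or allowing repeats."""
    pairs = list(zip(seq, seq[1:]))
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)

# Made-up data: two children, five union slots.
children = [[1.5, 2.5], ["a", "b"]]
types = [0, 1, 0, 1, 1]
offsets = [0, 0, 1, 1, 1]  # slots 3 and 4 both point at children[1][1]

values = [dense_union_get(types, offsets, children, i) for i in range(len(types))]
print(values)
print(is_monotonic(child_offsets(types, offsets, 1), strict=True))
print(is_monotonic(child_offsets(types, offsets, 1), strict=False))
```

The lookup function is the same whichever constraint the offsets satisfy; only the validation check changes.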

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-21 Thread Wes McKinney
+1 (binding). Thanks Micah On Wed, Nov 20, 2019 at 10:42 PM Micah Kornfield wrote: > > Hello, > As discussed on [1], I've proposed clarifications in a PR [2] that > clarifies: > > 1. It is not required that all dictionary batches occur at the beginning > of the IPC stream format (if a the first

Re: [DISCUSS][C++] Pointer name aliasing

2019-11-21 Thread Wes McKinney
I think we should mostly be careful about public APIs. With public APIs we should write out the types and avoid aliases. With implementation details and private/protected class members, I think it is fine to use aliases. On Thu, Nov 21, 2019 at 11:06 AM Antoine Pitrou wrote: > > On Thu, 21 Nov

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-21 Thread Micah Kornfield
Forgot to say, My vote is +1 (binding). On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney wrote: > +1 (binding). Thanks Micah > > On Wed, Nov 20, 2019 at 10:42 PM Micah Kornfield > wrote: > > > > Hello, > > As discussed on [1], I've proposed clarifications in a PR [2] that > > clarifies: > > > >

[jira] [Created] (ARROW-7229) [C++] Unify ConcatenateTables APIs

2019-11-21 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-7229: Summary: [C++] Unify ConcatenateTables APIs Key: ARROW-7229 URL: https://issues.apache.org/jira/browse/ARROW-7229 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-7228) [Python] Expose RecordBatch.FromStructArray in Python.

2019-11-21 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-7228: Summary: [Python] Expose RecordBatch.FromStructArray in Python. Key: ARROW-7228 URL: https://issues.apache.org/jira/browse/ARROW-7228 Project: Apache Arrow Issue

Re: Creating arrays from existing arrays in Cython

2019-11-21 Thread Suhail Razzak
Hi Micah, I was trying to create an Int64Builder class but kept getting a type identifier error. So, I did a bit of digging and realized I was looking at the latest commit of libarrow.pxd on GitHub which wasn't actually released as part of 0.15.1. Thanks for your help anyways! Suhail On Sat,

[jira] [Created] (ARROW-7230) [C++] Use vendored std::optional instead of boost::optional in Gandiva

2019-11-21 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7230: Summary: [C++] Use vendored std::optional instead of boost::optional in Gandiva Key: ARROW-7230 URL: https://issues.apache.org/jira/browse/ARROW-7230 Project: Apache

Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-21 Thread Micah Kornfield
Hmm, I also thought the intention was monotonically increasing. I can't think of a strong reason one way or another. If the argument about code to do random access is the same in all cases, is there any benefit to forcing any order at all? Memory prefetching? On Thu, Nov 21, 2019 at 11:48 AM Wes

[jira] [Created] (ARROW-7234) [C++] Add Result to APIs to Gandiva

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7234: Summary: [C++] Add Result to APIs to Gandiva Key: ARROW-7234 URL: https://issues.apache.org/jira/browse/ARROW-7234 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7239) [C++] Add Result to APIs to plasma

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7239: Summary: [C++] Add Result to APIs to plasma Key: ARROW-7239 URL: https://issues.apache.org/jira/browse/ARROW-7239 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7240) [C++] Add Result to APIs to arrow/util

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7240: Summary: [C++] Add Result to APIs to arrow/util Key: ARROW-7240 URL: https://issues.apache.org/jira/browse/ARROW-7240 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7231) [C++] Parent bug for tracking migration to Result

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7231: Summary: [C++] Parent bug for tracking migration to Result Key: ARROW-7231 URL: https://issues.apache.org/jira/browse/ARROW-7231 Project: Apache Arrow

[jira] [Created] (ARROW-7235) [C++] Add Result to APIs to arrow/io

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7235: Summary: [C++] Add Result to APIs to arrow/io Key: ARROW-7235 URL: https://issues.apache.org/jira/browse/ARROW-7235 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7236) [C++] Add Result to APIs to arrow/csv

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7236: Summary: [C++] Add Result to APIs to arrow/csv Key: ARROW-7236 URL: https://issues.apache.org/jira/browse/ARROW-7236 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7232) [C++] Add Result to APIs to core vector structures

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7232: Summary: [C++] Add Result to APIs to core vector structures Key: ARROW-7232 URL: https://issues.apache.org/jira/browse/ARROW-7232 Project: Apache Arrow

[jira] [Created] (ARROW-7233) [C++] Add Result APIs to IPC module

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7233: Summary: [C++] Add Result APIs to IPC module Key: ARROW-7233 URL: https://issues.apache.org/jira/browse/ARROW-7233 Project: Apache Arrow Issue Type:

Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-21 Thread Fan Liya
This is an interesting question. IMO, to support repeated values, we also need to design a "coherency protocol", to avoid the scenario where once a value is written, the change is propagated to another slot unexpectedly. Best, Liya Fan On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield wrote: >

Re: [DISCUSS][C++] Pointer name aliasing

2019-11-21 Thread Micah Kornfield
> > I think we should mostly be careful about public APIs. With public > APIs we should write out the types and avoid aliases. With > implementation details and private/protected class members, I think it > is fine to use aliases. My concern with this is that in general if the types are in the

[jira] [Created] (ARROW-7237) [C++] Add Result to APIs to arrow/json

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7237: Summary: [C++] Add Result to APIs to arrow/json Key: ARROW-7237 URL: https://issues.apache.org/jira/browse/ARROW-7237 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7238) [C++] Add Result to APIs to arrow/adapters

2019-11-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7238: Summary: [C++] Add Result to APIs to arrow/adapters Key: ARROW-7238 URL: https://issues.apache.org/jira/browse/ARROW-7238 Project: Apache Arrow Issue