[VOTE] Proposed addition to Arrow Flight Protocol

2019-08-15 Thread Micah Kornfield
Hello,
Ryan Murray has proposed adding a GetFlightSchema RPC [1] to the Arrow
Flight Protocol [2].  The purpose of this RPC is to decouple schema
retrieval from endpoint retrieval, which GetFlightInfo currently combines.
The new definition is:

message SchemaResult {
  // Serialized Flatbuffer Schema message.
  bytes schema = 1;
}
rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}

Ryan has also provided a PR [3] implementing the new RPC in Java, C++, and
Python, which can be reviewed and merged after this addition is approved.

Please vote whether to accept the addition. The vote will be open for at
least 72 hours.

[ ] +1 Accept this addition to the Flight protocol
[ ] +0
[ ] -1 Do not accept the changes because...


Thanks,
Micah

[1]
https://docs.google.com/document/d/1zLdFYikk3owbKpHvJrARLMlmYpi-Ef6OJy7H90MqViA/edit
[2] https://github.com/apache/arrow/blob/master/format/Flight.proto
[3] https://github.com/apache/arrow/pull/4980


[jira] [Created] (ARROW-6265) [Java] Avro adapter implement Array/Map/Fixed type

2019-08-15 Thread Ji Liu (JIRA)
Ji Liu created ARROW-6265:
-

 Summary: [Java] Avro adapter implement Array/Map/Fixed type
 Key: ARROW-6265
 URL: https://issues.apache.org/jira/browse/ARROW-6265
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Support Array/Map/Fixed type in avro adapter.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] Add GetFlightSchema to Flight RPC

2019-08-15 Thread Micah Kornfield
I'll start one shortly.

On Thu, Aug 15, 2019 at 4:31 PM Wes McKinney  wrote:

> Yes, I think having a vote as a procedural matter would be a good thing.
>
> I have run dozens of public and private votes in my role as a PMC
> member. I would appreciate if another PMC would assist with this vote.
>
> Thanks
>
> On Wed, Aug 14, 2019 at 5:37 PM Ryan Murray  wrote:
> >
> > Hi All,
> >
> > Does this require a vote? If yes, what is the process for initiating one?
> > If no, I hope this has been enough time for feedback, and I would like to
> > remove the draft designation from the PR.
> >
> > Best,
> > Ryan
> >
> > On Wed, Aug 7, 2019 at 9:31 AM Ryan Murray  wrote:
> >
> > > As per everyone's feedback I have renamed GetFlightSchema -> GetSchema
> and
> > > have removed the descriptor on the rpc result message. The doc has been
> > > updated as has the draft PR
> > >
> > > On Thu, Aug 1, 2019 at 6:32 PM Bryan Cutler  wrote:
> > >
> > >> Sounds good to me, I would just echo what others have said.
> > >>
> > >> On Thu, Aug 1, 2019 at 8:17 AM Ryan Murray  wrote:
> > >>
> > >> > Thanks Wes,
> > >> >
> > >> > The descriptor is only there to maintain a bit of symmetry with
> > >> > GetFlightInfo. Happy to remove it, I don't think it's necessary and
> > >> already
> > >> > a few people agree. Similarly, with the method name, I am neutral on
> the
> > >> > naming and can call it whatever the community is happy with.
> > >> >
> > >> > Best,
> > >> > Ryan
> > >> >
> > >> > On Thu, Aug 1, 2019 at 3:56 PM Wes McKinney 
> > >> wrote:
> > >> >
> > >> > > I'm generally supportive of adding the new RPC endpoint.
> > >> > >
> > >> > > To make a couple points from the document
> > >> > >
> > >> > > * I'm not sure what the purpose of returning the FlightDescriptor
> is,
> > >> > > but I haven't thought too hard about it
> > >> > > * The Schema consists of a single IPC message -- dictionaries will
> > >> > > appear in the actual DoGet stream. To motivate why this is --
> > >> > > different endpoints might have different dictionaries
> corresponding to
> > >> > > fields in the schema, to have static/constant dictionaries in a
> > >> > > distributed Flight setting is likely to be impractical. I
> summarize
> > >> > > the issue as "dictionaries are data, not metadata".
> > >> > > * I would be OK calling this GetSchema instead of GetFlightSchema
> but
> > >> > > either is okay
> > >> > >
> > >> > > - Wes
> > >> > >
> > >> > > On Thu, Aug 1, 2019 at 8:08 AM David Li 
> > >> wrote:
> > >> > > >
> > >> > > > Hi Ryan,
> > >> > > >
> > >> > > > Thanks for writing this up! I made a couple of minor comments
> in the
> > >> > > > doc/implementation, but overall I'm in favor of having this RPC
> > >> > > > method.
> > >> > > >
> > >> > > > Best,
> > >> > > > David
> > >> > > >
> > >> > > > On 8/1/19, Ryan Murray  wrote:
> > >> > > > > Hi All,
> > >> > > > >
> > >> > > > > Please see the attached document for a proposed addition to
> the
> > >> > Flight
> > >> > > > > RPC[1]. This is the result of a previous mailing list
> > >> discussion[2].
> > >> > > > >
> > >> > > > > I have created the Pull Request[3] to make the proposal a
> little
> > >> more
> > >> > > > > concrete.
> > >> > > > > 
> > >> > > > > Please let me know if you have any questions or concerns.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Ryan
> > >> > > > >
> > >> > > > > [1]:
> > >> > > > >
> > >> > >
> > >> >
> > >>
> https://docs.google.com/document/d/1zLdFYikk3owbKpHvJrARLMlmYpi-Ef6OJy7H90MqViA/edit?usp=sharing
> > >> > > > > [2]:
> > >> > > > >
> > >> > >
> > >> >
> > >>
> https://lists.apache.org/thread.html/3539984493cf3d4d439bef25c150fa9e09e0b43ce0afb6be378d41df@%3Cdev.arrow.apache.org%3E
> > >> > > > > [3]: https://github.com/apache/arrow/pull/4980
> > >> > > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Ryan Murray  | Principal Consulting Engineer
> > >> >
> > >> > +447540852009 | rym...@dremio.com
> > >> >
> > >> > 
> > >> > Check out our GitHub , join our
> > >> community
> > >> > site  & Download Dremio
> > >> > 
> > >> >
> > >>


[jira] [Created] (ARROW-6264) [Java] There is no need to consider byte order in ArrowBufHasher

2019-08-15 Thread Liya Fan (JIRA)
Liya Fan created ARROW-6264:
---

 Summary: [Java] There is no need to consider byte order in 
ArrowBufHasher
 Key: ARROW-6264
 URL: https://issues.apache.org/jira/browse/ARROW-6264
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


According to the discussion in
https://github.com/apache/arrow/pull/5063#issuecomment-521276547, Arrow has a
mechanism to make sure the data is stored in little-endian, so there is no
need to check byte order.
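A stdlib sketch of the underlying point, that a fixed little-endian layout
makes host byte order irrelevant when hashing buffer bytes (illustrative
only, not the ArrowBufHasher code):

```python
import struct
import sys

# Arrow specifies a little-endian buffer layout, so the raw bytes a
# hasher consumes are identical on any host CPU.
value = 123456789
buf = struct.pack("<i", value)              # value as laid out in a buffer
host = sys.byteorder                        # "little" or "big" -- does not affect buf
assert buf == value.to_bytes(4, "little")   # same bytes on any platform
assert struct.unpack("<i", buf)[0] == value
```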



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-6263) [Python] RecordBatch.from_arrays does not check array types against a passed schema

2019-08-15 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6263:
---

 Summary: [Python] RecordBatch.from_arrays does not check array 
types against a passed schema
 Key: ARROW-6263
 URL: https://issues.apache.org/jira/browse/ARROW-6263
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.15.0


Example came from ARROW-6038

{code}
In [4]: pa.RecordBatch.from_arrays([pa.array([])], schema)
Out[4]:

In [5]: rb = pa.RecordBatch.from_arrays([pa.array([])], schema)

In [6]: rb
Out[6]:

In [7]: rb.schema
Out[7]: col: string

In [8]: rb[0]
Out[8]:

0 nulls
{code}





[jira] [Created] (ARROW-6262) [Developer] Show JIRA issue before merging

2019-08-15 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-6262:
---

 Summary: [Developer] Show JIRA issue before merging
 Key: ARROW-6262
 URL: https://issues.apache.org/jira/browse/ARROW-6262
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Sutou Kouhei
Assignee: Sutou Kouhei


It's useful to confirm whether the associated JIRA issue is right or not.

We failed to notice a wrongly associated JIRA issue until after we merged the
pull request https://github.com/apache/arrow/pull/5050 .






[jira] [Created] (ARROW-6261) [C++] Install any bundled components and add installed CMake or pkgconfig configuration to enable downstream linkers to utilize bundled libraries when statically linking

2019-08-15 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6261:
---

 Summary: [C++] Install any bundled components and add installed 
CMake or pkgconfig configuration to enable downstream linkers to utilize 
bundled libraries when statically linking
 Key: ARROW-6261
 URL: https://issues.apache.org/jira/browse/ARROW-6261
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


The objective of this change would be to make it easier for toolchain builders 
to ship bundled thirdparty libraries together with the Arrow libraries in case 
there is a particular library version that is only used when linking with 
{{libarrow.a}}. In theory, configuration could be added to arrowTargets.cmake
(or pkgconfig) to simplify static linking.
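If such configuration were installed, a downstream static link via
pkg-config might look like the following (a hypothetical sketch; the flags
and .pc contents are illustrative, not a currently guaranteed workflow):

```shell
# --static asks pkg-config to also emit the private link dependencies
# (the bundled thirdparty libraries) recorded in arrow.pc.
g++ my_app.cc $(pkg-config --cflags --libs --static arrow) -o my_app
```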





Re: [DISCUSS] Add GetFlightSchema to Flight RPC

2019-08-15 Thread Wes McKinney
Yes, I think having a vote as a procedural matter would be a good thing.

I have run dozens of public and private votes in my role as a PMC
member. I would appreciate if another PMC would assist with this vote.

Thanks

On Wed, Aug 14, 2019 at 5:37 PM Ryan Murray  wrote:
>
> Hi All,
>
> Does this require a vote? If yes, what is the process for initiating one?
> If no, I hope this has been enough time for feedback, and I would like to
> remove the draft designation from the PR.
>
> Best,
> Ryan
>
> On Wed, Aug 7, 2019 at 9:31 AM Ryan Murray  wrote:
>
> > As per everyone's feedback I have renamed GetFlightSchema -> GetSchema and
> > have removed the descriptor on the rpc result message. The doc has been
> > updated as has the draft PR
> >
> > On Thu, Aug 1, 2019 at 6:32 PM Bryan Cutler  wrote:
> >
> >> Sounds good to me, I would just echo what others have said.
> >>
> >> On Thu, Aug 1, 2019 at 8:17 AM Ryan Murray  wrote:
> >>
> >> > Thanks Wes,
> >> >
> >> > The descriptor is only there to maintain a bit of symmetry with
> >> > GetFlightInfo. Happy to remove it, I don't think it's necessary and
> >> already
> >> > a few people agree. Similarly, with the method name, I am neutral on the
> >> > naming and can call it whatever the community is happy with.
> >> >
> >> > Best,
> >> > Ryan
> >> >
> >> > On Thu, Aug 1, 2019 at 3:56 PM Wes McKinney 
> >> wrote:
> >> >
> >> > > I'm generally supportive of adding the new RPC endpoint.
> >> > >
> >> > > To make a couple points from the document
> >> > >
> >> > > * I'm not sure what the purpose of returning the FlightDescriptor is,
> >> > > but I haven't thought too hard about it
> >> > > * The Schema consists of a single IPC message -- dictionaries will
> >> > > appear in the actual DoGet stream. To motivate why this is --
> >> > > different endpoints might have different dictionaries corresponding to
> >> > > fields in the schema, to have static/constant dictionaries in a
> >> > > distributed Flight setting is likely to be impractical. I summarize
> >> > > the issue as "dictionaries are data, not metadata".
> >> > > * I would be OK calling this GetSchema instead of GetFlightSchema but
> >> > > either is okay
> >> > >
> >> > > - Wes
> >> > >
> >> > > On Thu, Aug 1, 2019 at 8:08 AM David Li 
> >> wrote:
> >> > > >
> >> > > > Hi Ryan,
> >> > > >
> >> > > > Thanks for writing this up! I made a couple of minor comments in the
> >> > > > doc/implementation, but overall I'm in favor of having this RPC
> >> > > > method.
> >> > > >
> >> > > > Best,
> >> > > > David
> >> > > >
> >> > > > On 8/1/19, Ryan Murray  wrote:
> >> > > > > Hi All,
> >> > > > >
> >> > > > > Please see the attached document for a proposed addition to the
> >> > Flight
> >> > > > > RPC[1]. This is the result of a previous mailing list
> >> discussion[2].
> >> > > > >
> >> > > > > I have created the Pull Request[3] to make the proposal a little
> >> more
> >> > > > > concrete.
> >> > > > > 
> >> > > > > Please let me know if you have any questions or concerns.
> >> > > > >
> >> > > > > Best,
> >> > > > > Ryan
> >> > > > >
> >> > > > > [1]:
> >> > > > >
> >> > >
> >> >
> >> https://docs.google.com/document/d/1zLdFYikk3owbKpHvJrARLMlmYpi-Ef6OJy7H90MqViA/edit?usp=sharing
> >> > > > > [2]:
> >> > > > >
> >> > >
> >> >
> >> https://lists.apache.org/thread.html/3539984493cf3d4d439bef25c150fa9e09e0b43ce0afb6be378d41df@%3Cdev.arrow.apache.org%3E
> >> > > > > [3]: https://github.com/apache/arrow/pull/4980
> >> > > > >
> >> > >
> >> >
> >> >


Re: Timeline for 0.15.0 release

2019-08-15 Thread Wes McKinney
The Windows wheel issue in 0.14.1 seems to be

https://issues.apache.org/jira/browse/ARROW-6015

I think the root cause could be the Windows changes in

https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a

I would be appreciative if a volunteer would look into what was wrong
with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels
will be broken, too

The bad wheels can be found at

https://bintray.com/apache/arrow/python#files/python%2F0.14.1

On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou  wrote:
>
> On Thu, 15 Aug 2019 11:17:07 -0700
> Micah Kornfield  wrote:
> > >
> > > In C++ they are
> > > independent, we could have 32-bit array lengths and variable-length
> > > types with 64-bit offsets if we wanted (we just wouldn't be able to
> > > have a List child with more than INT32_MAX elements).
> >
> > I think the point is we could do this in C++ but we don't.  I'm not sure we
> > would have introduced the "Large" types if we did.
>
> 64-bit offsets take twice as much space as 32-bit offsets, so if you're
> storing lots of small-ish lists or strings, 32-bit offsets are
> preferable.  So even with 64-bit array lengths from the start it would
> still be beneficial to have types with 32-bit offsets.
>
> > Going with the limited address space in Java and calling it a reference
> > implementation seems suboptimal. If a consumer uses a "Large" type
> > presumably it is because they need the ability to store more than INT32_MAX
> > child elements in a column, otherwise it is just wasting space [1].
>
> Probably. Though if the individual elements (lists or strings) are
> large, not much space is wasted in proportion, so it may be simpler in
> such a case to always create a "Large" type array.
>
> > [1] I suppose theoretically there might be some performance benefits on
> > 64-bit architectures to using the native word sizes.
>
> Concretely, common 64-bit architectures don't do that, as 32-bit is an
> extremely common integer size even in high-performance code.
>
> Regards
>
> Antoine.
>
>


[jira] [Created] (ARROW-6260) [Website] Use deploy key on Travis to build and push to asf-site

2019-08-15 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-6260:
--

 Summary: [Website] Use deploy key on Travis to build and push to 
asf-site
 Key: ARROW-6260
 URL: https://issues.apache.org/jira/browse/ARROW-6260
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Neal Richardson
Assignee: Neal Richardson


ARROW-4473 added CI/CD for the website, but there was some discomfort about 
having a committer provide a GitHub personal access token to do the pushing of 
the built site to the asf-site branch. Investigate using GitHub Deploy Keys 
instead, which are scoped to a single repository, not all public repositories 
that a user has access to.





[jira] [Created] (ARROW-6259) [C++][CI] Flatbuffers-related failures in CI on macOS

2019-08-15 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6259:
---

 Summary: [C++][CI] Flatbuffers-related failures in CI on macOS
 Key: ARROW-6259
 URL: https://issues.apache.org/jira/browse/ARROW-6259
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0


This seemingly has just started happening randomly today

https://travis-ci.org/apache/arrow/jobs/572381802#L2864





[jira] [Created] (ARROW-6258) [R] Add macOS build scripts

2019-08-15 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-6258:
--

 Summary: [R] Add macOS build scripts
 Key: ARROW-6258
 URL: https://issues.apache.org/jira/browse/ARROW-6258
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson


CRAN builds binary packages for Windows and macOS. It generally does this by 
building on its servers and bundling all dependencies in the R package. This 
has been accomplished by having separate processes for building and hosting 
system dependencies, and then downloading and bundling those with scripts that 
get executed at install time (and then create the binary package as a side 
effect).

ARROW-3758 added the Windows PKGBUILD and related packaging scripts and ran 
them on our Appveyor. This ticket is to do the same for the macOS scripts.

The purpose of these tickets is to bring the whole build pipeline under our 
version control and CI so that we can address any C++ build and dependency 
changes as they arise and not be surprised when it comes time to cut a release. 
A side benefit is that they also enable us to offer a nightly binary package 
repository with minimal additional effort.





[jira] [Created] (ARROW-6257) [C++] Add fnmatch compatible globbing function

2019-08-15 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-6257:


 Summary: [C++] Add fnmatch compatible globbing function
 Key: ARROW-6257
 URL: https://issues.apache.org/jira/browse/ARROW-6257
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


This will be useful for the filesystems module and for datasource discovery,
which uses it.
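For reference, the fnmatch semantics being targeted, shown here via
Python's stdlib fnmatch, which follows the same rules (the examples are
illustrative, not the C++ API):

```python
import fnmatch

# Basic glob match against a filename.
assert fnmatch.fnmatch("part-0001.parquet", "part-*.parquet")
# Without FNM_PATHNAME-style behavior, "*" also matches "/" --
# relevant when globbing full paths during datasource discovery.
assert fnmatch.fnmatch("data/part-0001.parquet", "*.parquet")
assert not fnmatch.fnmatch("data/part-0001.parquet", "part-*.parquet")
```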





[jira] [Created] (ARROW-6256) [Rust] parquet-format should be released by Apache process

2019-08-15 Thread Andy Grove (JIRA)
Andy Grove created ARROW-6256:
-

 Summary: [Rust] parquet-format should be released by Apache process
 Key: ARROW-6256
 URL: https://issues.apache.org/jira/browse/ARROW-6256
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.14.1
Reporter: Andy Grove
 Fix For: 0.15.0


The Arrow parquet crate depends on the parquet-format crate. Parquet-format 
2.5.0 was recently released and has breaking changes compared to 2.4.0.

This means that previously published Arrow Parquet/DataFusion crates are now 
unusable out of the box (see https://issues.apache.org/jira/browse/ARROW-6255).

We should bring parquet-format into an Apache release process to avoid this 
type of issue in the future.





[jira] [Created] (ARROW-6255) [Rust] [Parquet] Cannot use any published parquet crate due to parquet-format breaking change

2019-08-15 Thread Andy Grove (JIRA)
Andy Grove created ARROW-6255:
-

 Summary: [Rust] [Parquet] Cannot use any published parquet crate 
due to parquet-format breaking change
 Key: ARROW-6255
 URL: https://issues.apache.org/jira/browse/ARROW-6255
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.14.1, 0.14.0, 0.13.0, 0.12.1, 0.12.0
Reporter: Andy Grove
 Fix For: 0.15.0


As a user who wants to use the Rust version of Arrow, I am unable to use any of 
the previously published versions due to the recent breaking change in 
parquet-format 2.5.0.

To reproduce, simply create an empty Rust project using "cargo init example 
--bin", add a dependency on "parquet-0.14.1" and attempt to build the project.
{code:java}
   Compiling parquet v0.13.0
error[E0599]: no variant or associated item named `BOOLEAN` found for type `parquet_format::parquet_format::Type` in the current scope
   --> /Users/agrove/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-0.13.0/src/basic.rs:408:28
    |
408 |             parquet::Type::BOOLEAN => Type::BOOLEAN,
    |                            ^^^^^^^ variant or associated item not found in `parquet_format::parquet_format::Type`
{code}
This bug has already been fixed in master, but there is no usable published 
crate. We could consider publishing a 0.14.2 to resolve this or just wait until 
the 0.15.0 release. We could also consider using this Jira to at least document 
a workaround, if one exists (maybe Cargo provides a mechanism for overriding 
transitive dependencies?).
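On the question of overriding transitive dependencies: Cargo can pin a
transitive crate back to an earlier semver-compatible version with
`cargo update --precise`. A possible workaround sketch (untested here, and
it only works if 2.4.0 satisfies the version requirement declared by the
published parquet crate):

```shell
# In a project depending on the published parquet crate, pin the
# transitive parquet-format crate back to 2.4.0, which the crate
# was built against.
cargo update -p parquet-format --precise 2.4.0
cargo build
```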





[jira] [Created] (ARROW-6254) [Rust][Parquet] Parquet dependency fails to compile

2019-08-15 Thread Dongha Lee (JIRA)
Dongha Lee created ARROW-6254:
-

 Summary: [Rust][Parquet] Parquet dependency fails to compile
 Key: ARROW-6254
 URL: https://issues.apache.org/jira/browse/ARROW-6254
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.14.1
Reporter: Dongha Lee


Hi,

I set up a blank rust project, added dependency `parquet = "0.14.1"` and ran 
`cargo build`. But unfortunately, it failed with a large error message.

I used Rust nightly: `cargo 1.38.0-nightly` and `rustc 1.38.0-nightly`. It 
failed on both Arch and Ubuntu.

I tried to build directly in 
`.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-0.14.1` but it failed.

I cloned the arrow repository and tried to build in the directory 
`rust/parquet`, and it succeeded. But as soon as I moved rust/parquet to some 
other location, the build failed. So my guess is that the failure has 
something to do with the dependent module `rust/arrow`.

Is this a known issue? I couldn't find any ticket for that.





Re: Timeline for 0.15.0 release

2019-08-15 Thread Antoine Pitrou
On Thu, 15 Aug 2019 11:17:07 -0700
Micah Kornfield  wrote:
> >
> > In C++ they are
> > independent, we could have 32-bit array lengths and variable-length
> > types with 64-bit offsets if we wanted (we just wouldn't be able to
> > have a List child with more than INT32_MAX elements).  
> 
> I think the point is we could do this in C++ but we don't.  I'm not sure we
> would have introduced the "Large" types if we did.

64-bit offsets take twice as much space as 32-bit offsets, so if you're
storing lots of small-ish lists or strings, 32-bit offsets are
preferable.  So even with 64-bit array lengths from the start it would
still be beneficial to have types with 32-bit offsets.

> Going with the limited address space in Java and calling it a reference
> implementation seems suboptimal. If a consumer uses a "Large" type
> presumably it is because they need the ability to store more than INT32_MAX
> child elements in a column, otherwise it is just wasting space [1].

Probably. Though if the individual elements (lists or strings) are
large, not much space is wasted in proportion, so it may be simpler in
such a case to always create a "Large" type array.

> [1] I suppose theoretically there might be some performance benefits on
> 64-bit architectures to using the native word sizes.

Concretely, common 64-bit architectures don't do that, as 32-bit is an
extremely common integer size even in high-performance code.

Regards

Antoine.




Re: Timeline for 0.15.0 release

2019-08-15 Thread Micah Kornfield
>
> In C++ they are
> independent, we could have 32-bit array lengths and variable-length
> types with 64-bit offsets if we wanted (we just wouldn't be able to
> have a List child with more than INT32_MAX elements).

I think the point is we could do this in C++ but we don't.  I'm not sure we
would have introduced the "Large" types if we did.
We will have to do this in Java, if we don't want to convert to 64-bit
addressing.

Going with the limited address space in Java and calling it a reference
implementation seems suboptimal. If a consumer uses a "Large" type
presumably it is because they need the ability to store more than INT32_MAX
child elements in a column, otherwise it is just wasting space [1].

Let's pause until next week when Jacques is back online (and continue on
the other thread).  Like I said I think there is enough time either way to
get something in along the timeline we expect for the next release.

[1] I suppose theoretically there might be some performance benefits on
64-bit architectures to using the native word sizes.

On Thu, Aug 15, 2019 at 10:59 AM Wes McKinney  wrote:

> On Thu, Aug 15, 2019 at 12:00 AM Micah Kornfield 
> wrote:
> >
> > Hi Wes,
> > >
> > > Do these need to be dependent on the 64-bit array length discussion?
> >
> > We could hack something that can read the lower 32-bit range, so I guess
> > not, but this leaves a bad taste in my mouth.  I think there is likely
> > still enough time to have the discussion and get these implemented, one
> way
> > or another.
> >
>
> I guess I still don't understand how the array lengths and the
> List/Varchar offsets are related to each other. I probably just
> haven't looked at the Java library enough. In C++ they are
> independent, we could have 32-bit array lengths and variable-length
> types with 64-bit offsets if we wanted (we just wouldn't be able to
> have a List child with more than INT32_MAX elements). We would have to
> do a limited amount of boundschecking at IPC boundary points (like
> Java is checking presumably now for vectors exceeding INT32_MAX).
>
> > For the record, I don't think we should hold a major release hostage
> > > if we aren't able to complete various feature milestones in time.
> > > Since it's been about 5-6 weeks since 0.14.0 we're coming close to the
> > > desired 8-10 week timeline for major releases, so if we need to have
> > > 0.16.0 prior to 1.0.0, I think that is OK also.
> >
> > I agree with the time-based milestones in practice, but we are
> backpedaling
> > on the intent to keep type parity between the two reference
> > implementations.  At least the way I read the previous threads on the
> > topic, I thought there was lazy consensus that in lieu of requiring
> working
> > implementations in Java and C++ be checked in at the same time, we would
> > rely on the release as a forcing function for parity.
> >
>
> I agree with the intent and spirit of the idea, but it seems we have a
> can of worms on our hands now and so I don't think we should keep from
> releasing the work that has been completed if consensus about Java
> changes is not reached in time.
>
> > Thanks,
> > Micah
> >
> > On Wed, Aug 14, 2019 at 11:32 AM Antoine Pitrou 
> wrote:
> >
> > >
> > > Agreed with Wes.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 14/08/2019 à 20:30, Wes McKinney a écrit :
> > > > For the record, I don't think we should hold a major release hostage
> > > > if we aren't able to complete various feature milestones in time.
> > > > Since it's been about 5-6 weeks since 0.14.0 we're coming close to
> the
> > > > desired 8-10 week timeline for major releases, so if we need to have
> > > > 0.16.0 prior to 1.0.0, I think that is OK also.
> > > >
> > > > On Wed, Aug 14, 2019 at 11:45 AM Wes McKinney 
> > > wrote:
> > > >>
> > > >> On Wed, Aug 14, 2019 at 11:43 AM Micah Kornfield <
> emkornfi...@gmail.com>
> > > wrote:
> > > >>>
> > > 
> > >   is there anything else that has come up that
> > >  definitely needs to happen before we can release again?
> > > >>>
> > > >>> We need to decide on a way forward for LargeList, LargeBinary, etc,
> > > types...
> > > >>>
> > > >>
> > > >> Do these need to be dependent on the 64-bit array length discussion?
> > > >> They seem somewhat orthogonal to me. If we have to release 0.15.0
> > > >> without the Java side of these, that's OK with me, since reaching
> > > >> format implementation completeness is more of a 1.0.0 concern
> > > >>
> > > >>> On Tue, Aug 13, 2019 at 8:27 PM Wes McKinney 
> > > wrote:
> > > >>>
> > >  hi folks,
> > > 
> > >  Since there have been a number of fairly serious issues (e.g.
> > >  ARROW-6060) since 0.14.1 that have been fixed I think we should
> start
> > >  planning of the next major release. Note that we still have some
> > >  format-related work (the Flatbuffers alignment issue) that ought
> to be
> > >  resolved (not a small task since it affects 4 or 5
> implementations),
> > > 

Re: Timeline for 0.15.0 release

2019-08-15 Thread Wes McKinney
On Thu, Aug 15, 2019 at 12:00 AM Micah Kornfield  wrote:
>
> Hi Wes,
> >
> > Do these need to be dependent on the 64-bit array length discussion?
>
> We could hack something that can read the lower 32-bit range, so I guess
> not, but this leaves a bad taste in my mouth.  I think there is likely
> still enough time to have the discussion and get these implemented, one way
> or another.
>

I guess I still don't understand how the array lengths and the
List/Varchar offsets are related to each other. I probably just
haven't looked at the Java library enough. In C++ they are
independent, we could have 32-bit array lengths and variable-length
types with 64-bit offsets if we wanted (we just wouldn't be able to
have a List child with more than INT32_MAX elements). We would have to
do a limited amount of boundschecking at IPC boundary points (like
Java is checking presumably now for vectors exceeding INT32_MAX).

> For the record, I don't think we should hold a major release hostage
> > if we aren't able to complete various feature milestones in time.
> > Since it's been about 5-6 weeks since 0.14.0 we're coming close to the
> > desired 8-10 week timeline for major releases, so if we need to have
> > 0.16.0 prior to 1.0.0, I think that is OK also.
>
> I agree with the time-based milestones in practice, but we are backpedaling
> on the intent to keep type parity between the two reference
> implementations.  At least the way I read the previous threads on the
> topic, I thought there was lazy consensus that in lieu of requiring working
> implementations in Java and C++ be checked in at the same time, we would
> rely on the release as a forcing function for parity.
>

I agree with the intent and spirit of the idea, but it seems we have a
can of worms on our hands now and so I don't think we should keep from
releasing the work that has been completed if consensus about Java
changes is not reached in time.

> Thanks,
> Micah
>
> On Wed, Aug 14, 2019 at 11:32 AM Antoine Pitrou  wrote:
>
> >
> > Agreed with Wes.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 14/08/2019 à 20:30, Wes McKinney a écrit :
> > > For the record, I don't think we should hold a major release hostage
> > > if we aren't able to complete various feature milestones in time.
> > > Since it's been about 5-6 weeks since 0.14.0 we're coming close to the
> > > desired 8-10 week timeline for major releases, so if we need to have
> > > 0.16.0 prior to 1.0.0, I think that is OK also.
> > >
> > > On Wed, Aug 14, 2019 at 11:45 AM Wes McKinney 
> > wrote:
> > >>
> > >> On Wed, Aug 14, 2019 at 11:43 AM Micah Kornfield 
> > wrote:
> > >>>
> > 
> >   is there anything else that has come up that
> >  definitely needs to happen before we can release again?
> > >>>
> > >>> We need to decide on a way forward for LargeList, LargeBinary, etc,
> > types...
> > >>>
> > >>
> > >> Do these need to be dependent on the 64-bit array length discussion?
> > >> They seem somewhat orthogonal to me. If we have to release 0.15.0
> > >> without the Java side of these, that's OK with me, since reaching
> > >> format implementation completeness is more of a 1.0.0 concern
> > >>
> > >>> On Tue, Aug 13, 2019 at 8:27 PM Wes McKinney 
> > wrote:
> > >>>
> >  hi folks,
> > 
> >  Since there have been a number of fairly serious issues (e.g.
> >  ARROW-6060) since 0.14.1 that have been fixed I think we should start
> >  planning of the next major release. Note that we still have some
> >  format-related work (the Flatbuffers alignment issue) that ought to be
> >  resolved (not a small task since it affects 4 or 5 implementations),
> >  but aside from that, is there anything else that has come up that
> >  definitely needs to happen before we can release again?
> > 
> >  I would say cutting a release somewhere around the US Labor Day
> >  holiday (~the week after or so) would be called for.
> > 
> >  Thanks,
> >  Wes
> > 
> >


[jira] [Created] (ARROW-6253) [Python] Expose "enable_buffered_stream" option from parquet::ReaderProperties in pyarrow.parquet.read_table

2019-08-15 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6253:
---

 Summary: [Python] Expose "enable_buffered_stream" option from 
parquet::ReaderProperties in pyarrow.parquet.read_table
 Key: ARROW-6253
 URL: https://issues.apache.org/jira/browse/ARROW-6253
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.15.0


See also PARQUET-1370



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-6252) [Python] Add pyarrow.Array.diff_contents method

2019-08-15 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6252:
---

 Summary: [Python] Add pyarrow.Array.diff_contents method
 Key: ARROW-6252
 URL: https://issues.apache.org/jira/browse/ARROW-6252
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.15.0


This would expose the Array diffing functionality in Python to make it easier 
to see why arrays are unequal





[jira] [Created] (ARROW-6251) [Developer] Add PR merge tool to apache/arrow-site

2019-08-15 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-6251:
---

 Summary: [Developer] Add PR merge tool to apache/arrow-site
 Key: ARROW-6251
 URL: https://issues.apache.org/jira/browse/ARROW-6251
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 0.15.0


This will help with creating clean patches and also keeping JIRA clean





[jira] [Created] (ARROW-6250) [Java] Implement ApproxEqualsVisitor comparing approx for floating point

2019-08-15 Thread Ji Liu (JIRA)
Ji Liu created ARROW-6250:
-

 Summary: [Java] Implement ApproxEqualsVisitor comparing approx for 
floating point
 Key: ARROW-6250
 URL: https://issues.apache.org/jira/browse/ARROW-6250
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


We have already implemented {{RangeEqualsVisitor}}/{{VectorEqualsVisitor}} for
range/vector comparison.

And ARROW-6211 was created to make {{ValueVector}} work with a generic visitor.

We should also implement {{ApproxEqualsVisitor}} to compare floating-point
values approximately, as the C++ implementation does:

[https://github.com/apache/arrow/blob/master/cpp/src/arrow/compare.cc]
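As an illustration only (plain Python, not the proposed Java API), an approximate-equality check over nullable floating-point slots might look like this:

```python
def approx_equals_vector(left, right, epsilon=1e-6):
    """Compare two float vectors approximately; None models a null slot.
    A sketch of the semantics, not the actual visitor implementation."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if (a is None) != (b is None):
            return False  # null in one vector but not the other
        if a is not None and abs(a - b) > epsilon:
            return False
    return True
```

The real visitor would additionally need to handle NaN and range-based comparison, matching whatever semantics compare.cc settles on.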





[jira] [Created] (ARROW-6249) [Java] Remove useless class ByteArrayWrapper

2019-08-15 Thread Ji Liu (JIRA)
Ji Liu created ARROW-6249:
-

 Summary: [Java] Remove useless class ByteArrayWrapper
 Key: ARROW-6249
 URL: https://issues.apache.org/jira/browse/ARROW-6249
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


This class was introduced in the encoding code to compare byte[] values for
equality.

Since ARROW-6022 we compare value/vector equality with the newly added visitor
API instead of comparing {{getObject}}, so this class is no longer needed.





[jira] [Created] (ARROW-6248) [Python] Use FileNotFoundError in HadoopFileSystem.open() in Python 3

2019-08-15 Thread Alexander Schepanovski (JIRA)
Alexander Schepanovski created ARROW-6248:
-

 Summary: [Python] Use FileNotFoundError in HadoopFileSystem.open() 
in Python 3 
 Key: ARROW-6248
 URL: https://issues.apache.org/jira/browse/ARROW-6248
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.14.1
Reporter: Alexander Schepanovski


When a file is absent, pyarrow throws
{code:python}
ArrowIOError('HDFS file does not exist: ...')
{code}
which inherits from {{IOError}} and {{pyarrow.lib.ArrowException}}. It would be
better if this were {{FileNotFoundError}}, the {{IOError}} subclass intended for
exactly this purpose. Also, the {{.errno}} property is empty (it should be 2),
so one needs to match on the error message to check for this particular error.

*P.S.* There is no {{FileNotFoundError}} in Python 2, but the {{.errno}}
property is available there.
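A sketch of the requested behavior (the wrapper and the fake filesystem below are hypothetical, for illustration only — not pyarrow code):

```python
import errno

class FakeHdfs:
    """Stand-in for HadoopFileSystem; the real one raises ArrowIOError,
    which is an IOError subclass."""
    def open(self, path, mode="rb"):
        raise IOError("HDFS file does not exist: %s" % path)

def open_checked(fs, path, mode="rb"):
    # Translate the generic IOError into FileNotFoundError with
    # errno set to 2 (ENOENT), as the issue requests.
    try:
        return fs.open(path, mode)
    except IOError as exc:
        if "does not exist" in str(exc):
            raise FileNotFoundError(errno.ENOENT, str(exc), path) from exc
        raise
```

With this, callers can use the idiomatic `except FileNotFoundError:` instead of matching on the error message.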





Re: [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-15 Thread Micah Kornfield
>
>
> Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
> at all.  I will try on linux by EOD.

Actually, the issue was that I had alignment checks off.  I verified this works,
and it appears there is a UBSan issue with flatbuffers::Verify (I'll try to
see if I can find the issue and make a PR upstream).

On Thu, Aug 15, 2019 at 1:03 AM Micah Kornfield 
wrote:

> I verified with these changes [1], without backwards compatibility
>> support, UBSAN runs cleanly for IPC tests in C++
>
> Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
> at all.  I will try on linux by EOD.
>
> On Thu, Aug 15, 2019 at 12:52 AM Micah Kornfield 
> wrote:
>
>> +1
>>
>> I verified with these changes [1], without backwards compatibility
>> support, UBSAN runs cleanly for IPC tests in C++
>>
>> Just wanted to clarify:
>>
>>> Additionally with this vote, we want to formally approve the change to
>>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
>>> marker, which enables code that processes Arrow streams to safely read
>>> the file's internal messages as though they were a normal stream.
>>
>> This only allows for reading messages safely, we still aren't
>> guaranteeing dictionary batches occur in the file before they are used,
>> correct?
>>
>> Thanks,
>> Micah
>>
>> [1]
>> https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c
>>
>>
>> On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney  wrote:
>>
>>> hi all,
>>>
>>> As we've been discussing [1], there is a need to introduce 4 bytes of
>>> padding into the preamble of the "encapsulated IPC message" format to
>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
>>> aligned memory offset. The alternative to this would be for Arrow
>>> implementations where alignment is important (e.g. C or C++) to copy
>>> the metadata (which is not always small) into memory when it is
>>> unaligned.
>>>
>>> Micah has proposed to address this by adding a
>>> 4-byte "continuation" value at the beginning of the payload
>>> having the value 0xFFFFFFFF. The reason to do it this way is that
>>> old clients will see an invalid length (what is currently the
>>> first 4 bytes of the message -- a 32-bit little endian signed
>>> integer indicating the metadata length) rather than potentially
>>> crashing on a valid length. We also propose to expand the "end of
>>> stream" marker used in the stream and file format from 4 to 8
>>> bytes. This has the additional effect of aligning the file footer
>>> defined in File.fbs.
>>>
>>> This would be a backwards incompatible protocol change, so older Arrow
>>> libraries would not be able to read these new messages. Maintaining
>>> forward compatibility (reading data produced by older libraries) would
>>> be possible as we can reason that a value other than the continuation
>>> value was produced by an older library (and then validate the
>>> Flatbuffer message of course). Arrow implementations could offer a
>>> backward compatibility mode for the sake of old readers if they desire
>>> (this may also assist with testing).
>>>
>>> Additionally with this vote, we want to formally approve the change to
>>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
>>> marker, which enables code that processes Arrow streams to safely read
>>> the file's internal messages as though they were a normal stream.
>>>
>>> The PR making these changes to the IPC documentation is here
>>>
>>> https://github.com/apache/arrow/pull/4951
>>>
>>> Please vote to accept these changes. This vote will be open for at
>>> least 72 hours
>>>
>>> [ ] +1 Adopt these Arrow protocol changes
>>> [ ] +0
>>> [ ] -1 I disagree because...
>>>
>>> Here is my vote: +1
>>>
>>> Thanks,
>>> Wes
>>>
>>> [1]:
>>> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
>>>
>>


Re: [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-15 Thread Micah Kornfield
>
> I verified with these changes [1], without backwards compatibility
> support, UBSAN runs cleanly for IPC tests in C++

Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
at all.  I will try on linux by EOD.

On Thu, Aug 15, 2019 at 12:52 AM Micah Kornfield 
wrote:

> +1
>
> I verified with these changes [1], without backwards compatibility
> support, UBSAN runs cleanly for IPC tests in C++
>
> Just wanted to clarify:
>
>> Additionally with this vote, we want to formally approve the change to
>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
>> marker, which enables code that processes Arrow streams to safely read
>> the file's internal messages as though they were a normal stream.
>
> This only allows for reading messages safely, we still aren't guaranteeing
> dictionary batches occur in the file before they are used, correct?
>
> Thanks,
> Micah
>
> [1]
> https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c
>
>
> On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney  wrote:
>
>> hi all,
>>
>> As we've been discussing [1], there is a need to introduce 4 bytes of
>> padding into the preamble of the "encapsulated IPC message" format to
>> ensure that the Flatbuffers metadata payload begins on an 8-byte
>> aligned memory offset. The alternative to this would be for Arrow
>> implementations where alignment is important (e.g. C or C++) to copy
>> the metadata (which is not always small) into memory when it is
>> unaligned.
>>
>> Micah has proposed to address this by adding a
>> 4-byte "continuation" value at the beginning of the payload
>> having the value 0xFFFFFFFF. The reason to do it this way is that
>> old clients will see an invalid length (what is currently the
>> first 4 bytes of the message -- a 32-bit little endian signed
>> integer indicating the metadata length) rather than potentially
>> crashing on a valid length. We also propose to expand the "end of
>> stream" marker used in the stream and file format from 4 to 8
>> bytes. This has the additional effect of aligning the file footer
>> defined in File.fbs.
>>
>> This would be a backwards incompatible protocol change, so older Arrow
>> libraries would not be able to read these new messages. Maintaining
>> forward compatibility (reading data produced by older libraries) would
>> be possible as we can reason that a value other than the continuation
>> value was produced by an older library (and then validate the
>> Flatbuffer message of course). Arrow implementations could offer a
>> backward compatibility mode for the sake of old readers if they desire
>> (this may also assist with testing).
>>
>> Additionally with this vote, we want to formally approve the change to
>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
>> marker, which enables code that processes Arrow streams to safely read
>> the file's internal messages as though they were a normal stream.
>>
>> The PR making these changes to the IPC documentation is here
>>
>> https://github.com/apache/arrow/pull/4951
>>
>> Please vote to accept these changes. This vote will be open for at
>> least 72 hours
>>
>> [ ] +1 Adopt these Arrow protocol changes
>> [ ] +0
>> [ ] -1 I disagree because...
>>
>> Here is my vote: +1
>>
>> Thanks,
>> Wes
>>
>> [1]:
>> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
>>
>


Re: [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-15 Thread Micah Kornfield
+1

I verified with these changes [1], without backwards compatibility support,
UBSAN runs cleanly for IPC tests in C++

Just wanted to clarify:

> Additionally with this vote, we want to formally approve the change to
> the Arrow "file" format to always write the (new 8-byte) end-of-stream
> marker, which enables code that processes Arrow streams to safely read
> the file's internal messages as though they were a normal stream.

This only allows for reading messages safely, we still aren't guaranteeing
dictionary batches occur in the file before they are used, correct?

Thanks,
Micah

[1]
https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c


On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney  wrote:

> hi all,
>
> As we've been discussing [1], there is a need to introduce 4 bytes of
> padding into the preamble of the "encapsulated IPC message" format to
> ensure that the Flatbuffers metadata payload begins on an 8-byte
> aligned memory offset. The alternative to this would be for Arrow
> implementations where alignment is important (e.g. C or C++) to copy
> the metadata (which is not always small) into memory when it is
> unaligned.
>
> Micah has proposed to address this by adding a
> 4-byte "continuation" value at the beginning of the payload
> having the value 0xFFFFFFFF. The reason to do it this way is that
> old clients will see an invalid length (what is currently the
> first 4 bytes of the message -- a 32-bit little endian signed
> integer indicating the metadata length) rather than potentially
> crashing on a valid length. We also propose to expand the "end of
> stream" marker used in the stream and file format from 4 to 8
> bytes. This has the additional effect of aligning the file footer
> defined in File.fbs.
>
> This would be a backwards incompatible protocol change, so older Arrow
> libraries would not be able to read these new messages. Maintaining
> forward compatibility (reading data produced by older libraries) would
> be possible as we can reason that a value other than the continuation
> value was produced by an older library (and then validate the
> Flatbuffer message of course). Arrow implementations could offer a
> backward compatibility mode for the sake of old readers if they desire
> (this may also assist with testing).
>
> Additionally with this vote, we want to formally approve the change to
> the Arrow "file" format to always write the (new 8-byte) end-of-stream
> marker, which enables code that processes Arrow streams to safely read
> the file's internal messages as though they were a normal stream.
>
> The PR making these changes to the IPC documentation is here
>
> https://github.com/apache/arrow/pull/4951
>
> Please vote to accept these changes. This vote will be open for at
> least 72 hours
>
> [ ] +1 Adopt these Arrow protocol changes
> [ ] +0
> [ ] -1 I disagree because...
>
> Here is my vote: +1
>
> Thanks,
> Wes
>
> [1]:
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
>
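For concreteness, here is a minimal sketch (plain Python, illustrative only — not the actual Arrow implementation) of the framing the proposal describes, including the forward-compatible read path and the widened end-of-stream marker. The 0xFFFFFFFF continuation value is taken from the proposal document.

```python
import io
import struct

CONTINUATION = 0xFFFFFFFF  # proposed 4-byte continuation marker

def write_message(sink, metadata):
    # New framing: continuation marker, then 32-bit little-endian length
    # of the (8-byte padded) metadata, then the padded metadata itself.
    padded = (len(metadata) + 7) & ~7
    sink.write(struct.pack("<I", CONTINUATION))
    sink.write(struct.pack("<i", padded))
    sink.write(metadata + b"\x00" * (padded - len(metadata)))

def write_eos(sink):
    # The end-of-stream marker grows from 4 to 8 bytes:
    # continuation marker followed by a zero length.
    sink.write(struct.pack("<I", CONTINUATION))
    sink.write(struct.pack("<i", 0))

def read_message(source):
    # A forward-compatible reader: if the first word is not the
    # continuation marker, treat it as an old-style length prefix
    # written by a pre-change library.
    first = struct.unpack("<I", source.read(4))[0]
    if first == CONTINUATION:
        length = struct.unpack("<i", source.read(4))[0]
    else:
        length = first
    if length == 0:
        return None  # end of stream
    return source.read(length)
```

Because the 8-byte preamble replaces the old 4-byte one, the Flatbuffer metadata that follows it starts at an 8-byte aligned position, which is the point of the change.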