Re: Travis CI delays

2019-09-26 Thread Micah Kornfield
My understanding is that the Travis CI queue is shared among all Apache
projects, and there are a few, including Arrow, that make heavy use of the
resources. Hence the long waits for jobs to start. I think there are some
open JIRAs to finish dockerization of builds; I don't know the current
status of finding alternative CI sources, though.

On Thu, Sep 26, 2019 at 10:24 PM Andy Grove  wrote:

> I know this has been discussed in the past, and I apologize for not paying
> attention at the time (and searching for arrow + travis in email isn't very
> effective), but why do our Travis CI builds take so long, and are there
> open JIRA issues related to this?
>
> Thanks,
>
> Andy.
>


[jira] [Created] (ARROW-6720) [JAVA][C++]Support Parquet Read and Write in Java

2019-09-26 Thread Chendi.Xue (Jira)
Chendi.Xue created ARROW-6720:
-

 Summary: [JAVA][C++]Support Parquet Read and Write in Java
 Key: ARROW-6720
 URL: https://issues.apache.org/jira/browse/ARROW-6720
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Java
Affects Versions: 0.15.0
Reporter: Chendi.Xue
 Fix For: 0.15.0


We added a new Java interface that supports Parquet read and write from
HDFS or a local file.

The motivation: when loading and dumping Parquet data in Java, we could
only use row-based put and get methods. Since Arrow already has a C++
implementation to load and dump Parquet, we wrapped that code as Java APIs.

In our tests, workload performance improved by more than 2x compared with
the row-based load and dump, so we want to contribute the code to Arrow.

Since this is a totally independent change, there are no changes to the
current Arrow code. We added two folders: java/adapter/parquet and
cpp/src/jni/parquet



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-26 Thread V Luong (Jira)
V Luong created ARROW-6719:
--

 Summary: Parquet read_table error in Python3.7: 
pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is 
inconsistent with schema list<...>
 Key: ARROW-6719
 URL: https://issues.apache.org/jira/browse/ARROW-6719
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.14.1
 Environment: Python 3.7
Reporter: V Luong


I have Parquet files with certain complex columns of list type and am using
the latest PyArrow (0.14.1) to process them.

In Python 2.7, pyarrow.parquet.read_table(...) processes these files
correctly, without any problem.

But in Python 3.7, the same pyarrow.parquet.read_table(...) function calls
return errors of the following kind:

"pyarrow.lib.ArrowInvalid: Column data for field 0 with type list<...>
is inconsistent with schema list<...>"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6718) [Rust] packed_simd requires nightly

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6718:
-

 Summary: [Rust] packed_simd requires nightly 
 Key: ARROW-6718
 URL: https://issues.apache.org/jira/browse/ARROW-6718
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove


{code:java}
error[E0554]: `#![feature]` may not be used on the stable release channel
   --> 
/home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/lib.rs:202:1
|
202 | / #![feature(
203 | | repr_simd,
204 | | const_fn,
205 | | platform_intrinsics,
...   |
215 | | custom_inner_attributes
216 | | )]
| |__^
 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6717) Support stable Rust

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6717:
-

 Summary: Support stable Rust
 Key: ARROW-6717
 URL: https://issues.apache.org/jira/browse/ARROW-6717
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove


I'm creating this issue to track all the stories we need to implement to be 
able to use stable Rust.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6716) [CI] [Rust] New 1.40.0 nightly causing builds to fail

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6716:
-

 Summary: [CI] [Rust] New 1.40.0 nightly causing builds to fail
 Key: ARROW-6716
 URL: https://issues.apache.org/jira/browse/ARROW-6716
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI, Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


So much for pinning the nightly version ... apparently that doesn't work
when a new major version of the nightly is released.

Travis is now using:
rustc 1.40.0-nightly (37538aa13 2019-09-25)
Despite rust-toolchain containing:
{code:java}
nightly-2019-07-30 {code}
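A cheap guard against this kind of silent drift is to fail the build early when the toolchain rustup actually activated doesn't match the pin in rust-toolchain. Below is a sketch only (in Python for brevity, not anything Arrow's CI uses); it assumes `rustup show active-toolchain` prints the full toolchain name first, e.g. "nightly-2019-07-30-x86_64-unknown-linux-gnu (overridden by ...)":

```python
import subprocess

def read_pin(path="rust-toolchain"):
    # rust-toolchain holds a single channel name, e.g. "nightly-2019-07-30".
    with open(path) as f:
        return f.read().strip()

def assert_toolchain(pin, active):
    """Fail fast when the active toolchain does not start with the pin.

    `active` is the first line of `rustup show active-toolchain` output,
    e.g. "nightly-2019-07-30-x86_64-unknown-linux-gnu (default)".
    """
    if not active.startswith(pin):
        raise SystemExit(
            "toolchain drift: pinned %s but active toolchain is %s"
            % (pin, active))

def main():
    # Query rustup for the toolchain actually in effect for this directory.
    active = subprocess.check_output(
        ["rustup", "show", "active-toolchain"], text=True).strip()
    assert_toolchain(read_pin(), active)
```

Running a check like this as a first CI step would turn a surprise nightly upgrade into an immediate, explicit error instead of a pile of E0554 failures.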
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6715) [Website] Describe "non-free" section is needed for Plasma packages in install page

2019-09-26 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-6715:
---

 Summary: [Website] Describe "non-free" section is needed for 
Plasma packages in install page
 Key: ARROW-6715
 URL: https://issues.apache.org/jira/browse/ARROW-6715
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


Plasma packages depend on the nvidia-cuda-toolkit package, which is in the
non-free section.

Note that Plasma packages are available only for amd64, because the
nvidia-cuda-toolkit package isn't available for arm64.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6714) [R] Fix untested RecordBatchWriter case

2019-09-26 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6714:
--

 Summary: [R] Fix untested RecordBatchWriter case
 Key: ARROW-6714
 URL: https://issues.apache.org/jira/browse/ARROW-6714
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson


Passing a data.frame to RecordBatchWriter$write() would trigger a segfault.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Build issues on macOS [newbie]

2019-09-26 Thread Wes McKinney
It looks like the development toolchain dependencies in
conda_env_cpp.yml aren't installed in your "main" conda environment,
e.g.

https://github.com/apache/arrow/blob/master/ci/conda_env_cpp.yml#L42

You can see what's installed by running "conda list"

Note that most of these dependencies are optional, but we provide the
env files to simplify general development of the project so
contributors aren't struggling to produce comprehensive builds.

On Wed, Sep 25, 2019 at 11:33 AM Tarek Allam Jr.  wrote:
>
> Thanks for the advice Uwe and Neal. I tried your suggestion (as well as 
> turning many of the flags to off) but then ran into other errors afterwards 
> such as:
>
> -- Using ZSTD_ROOT: /usr/local/anaconda3/envs/main
> CMake Error at 
> /usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137
>  (message):
>   Could NOT find ZSTD (missing: ZSTD_LIB ZSTD_INCLUDE_DIR)
>   
> /usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378
>  (_FPHSA_FAILURE_MESSAGE)
>   cmake_modules/FindZSTD.cmake:61 (find_package_handle_standard_args)
>   cmake_modules/ThirdpartyToolchain.cmake:181 (find_package)
>   cmake_modules/ThirdpartyToolchain.cmake:2033 (resolve_dependency)
>   CMakeLists.txt:412 (include)
>
> I think I will spend some more time understanding CMake better and
> familiarising myself with the codebase before having another go. Hopefully
> by then conda-forge will have removed the SDK requirement as well, which,
> as you say, should make things much simpler.
>
> Thanks again,
>
> Regards,
> Tarek
>
> On 2019/09/19 16:00:09, "Uwe L. Korn"  wrote:
> > Hello Tarek,
> >
> > this error message is normally the one you get when CONDA_BUILD_SYSROOT 
> > doesn't point to your 10.9 SDK. Please delete your build folder again and 
> > do `export CONDA_BUILD_SYSROOT=..` immediately before running cmake. 
> > Running e.g. a conda install will sadly reset this variable to something 
> > different and break the build.
> >
> > As a sidenote: It looks like in 1-2 months that conda-forge will get rid of 
> > the SDK requirement, then this will be a bit simpler.
> >
> > Cheers
> > Uwe
> >
> > On Thu, Sep 19, 2019, at 5:24 PM, Tarek Allam Jr. wrote:
> > >
> > > Hi all,
> > >
> > > Firstly I must apologise if what I put here is extremely trivial, but I
> > > am a complete newcomer to the Apache Arrow project and to contributing
> > > to Apache in general; I am very keen to get involved.
> > >
> > > I'm hoping to help where I can, so I recently attempted to complete a
> > > build following the instructions laid out in the 'Python Development'
> > > section of the documentation here:
> > >
> > > After completing the steps that specifically use Conda I was able to
> > > create an environment, but when it comes to building I am unable to do so.
> > >
> > > I am on macOS 10.14.6 and, as outlined in the docs and here
> > > (https://stackoverflow.com/a/55798942/4521950), I used the 10.9 SDK
> > > instead of the latest. I have both added this manually using ccmake and
> > > also defined it like so:
> > >
> > > cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
> > >   -DCMAKE_INSTALL_LIBDIR=lib \
> > >   -DARROW_FLIGHT=ON \
> > >   -DARROW_GANDIVA=ON \
> > >   -DARROW_ORC=ON \
> > >   -DARROW_PARQUET=ON \
> > >   -DARROW_PYTHON=ON \
> > >   -DARROW_PLASMA=ON \
> > >   -DARROW_BUILD_TESTS=ON \
> > >   -DCONDA_BUILD_SYSROOT=/opt/MacOSX10.9.sdk \
> > >   -DARROW_DEPENDENCY_SOURCE=AUTO \
> > >   ..
> > >
> > > But it seems that whatever I try I get errors; the main one tripping
> > > me up at the moment is:
> > >
> > > -- Building using CMake version: 3.15.3
> > > -- The C compiler identification is Clang 4.0.1
> > > -- The CXX compiler identification is Clang 4.0.1
> > > -- Check for working C compiler:
> > > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang
> > > -- Check for working C compiler:
> > > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang -- broken
> > > CMake Error at
> > > /usr/local/anaconda3/envs/pyarrow-dev/share/cmake-3.15/Modules/CMakeTestCCompiler.cmake:60
> > >  (message):
> > >   The C compiler
> > >
> > > "/usr/local/anaconda3/envs/pyarrow-dev/bin/clang"
> > >
> > >   is not able to compile a simple test program.
> > >
> > >   It fails with the following output:
> > >
> > > Change Dir: /Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp
> > >
> > > Run Build Command(s):/usr/local/bin/gmake cmTC_b252c/fast &&
> > > /usr/local/bin/gmake -f CMakeFiles/cmTC_b252c.dir/build.make
> > > CMakeFiles/cmTC_b252c.dir/build
> > > gmake[1]: Entering directory
> > > '/Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp'
> > > Building C object CMakeFiles/cmTC_b252c.dir/testCCompiler.c.o
> > > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang   -march=core2
> > > -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE
> > > 

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-26-0

2019-09-26 Thread Wes McKinney
Should we disable the fuzzit job? This is for a third party CI-type
service, so the failure here seems like it's adding unneeded noise
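These reports are plain text with "Failed Tasks:" / "Succeeded Tasks:" sections, so triaging them can be scripted in a few lines. A sketch of a parser for the layout shown in the quoted report (task names only, URLs dropped; the exact report format is assumed from the sample in this thread):

```python
def parse_report(text):
    """Split a crossbow nightly report into failed/succeeded task names."""
    tasks = {"failed": [], "succeeded": []}
    current = None
    for raw in text.splitlines():
        # Strip email quoting ("> ") and surrounding whitespace.
        line = raw.strip().lstrip("> ").strip()
        if line == "Failed Tasks:":
            current = "failed"
        elif line == "Succeeded Tasks:":
            current = "succeeded"
        elif line.startswith("- ") and current is not None:
            # "- docker-cpp-fuzzit:" -> "docker-cpp-fuzzit"
            tasks[current].append(line[2:].rstrip(":"))
    return tasks
```

A filter on `parse_report(body)["failed"]` would make it easy to, say, ignore known-noisy jobs like the fuzzit one when deciding whether a nightly needs attention.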

On Thu, Sep 26, 2019 at 12:31 PM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2019-09-26-0
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0
>
> Failed Tasks:
> - docker-spark-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-spark-integration
> - docker-dask-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-dask-integration
> - docker-cpp-fuzzit:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-fuzzit
>
> Succeeded Tasks: [snip — identical to the full report below]

[jira] [Created] (ARROW-6713) [Python] Getting "ArrowIOError: Corrupted file, smaller than file footer" when reading large number of parquet files through ParquetDataset()

2019-09-26 Thread Harini Kannan (Jira)
Harini Kannan created ARROW-6713:


 Summary: [Python] Getting "ArrowIOError: Corrupted file, smaller 
than file footer" when reading large number of parquet files through 
ParquetDataset()
 Key: ARROW-6713
 URL: https://issues.apache.org/jira/browse/ARROW-6713
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Harini Kannan
 Attachments: Screen Shot 2019-09-26 at 2.30.49 PM.png

When trying to read a large number of Parquet files (> 600) with
ParquetDataset(), I get the error:

ArrowIOError: Corrupted file, smaller than file footer.

Note: this works fine for a small number (10-20) of Parquet files.
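With that many files, finding which ones are actually truncated is half the battle. A stdlib-only pre-scan of the Parquet footer can pinpoint them before ParquetDataset() aborts on the first bad file. This is a sketch of the format-level check only (leading "PAR1" magic, then at the end a 4-byte little-endian footer length followed by trailing "PAR1"), not PyArrow's exact validation logic:

```python
import os
import struct

MAGIC = b"PAR1"
# Minimum possible parquet file: leading magic + footer length + trailing magic.
MIN_SIZE = len(MAGIC) + 4 + len(MAGIC)

def check_parquet_footer(path):
    """Return None if the footer looks sane, else a short problem description."""
    size = os.path.getsize(path)
    if size < MIN_SIZE:
        return "smaller than file footer (%d bytes)" % size
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            return "missing leading PAR1 magic"
        f.seek(-8, os.SEEK_END)
        footer_len = struct.unpack("<I", f.read(4))[0]
        if f.read(4) != MAGIC:
            return "missing trailing PAR1 magic"
        if footer_len + 8 > size - 4:
            return "footer length %d exceeds file size" % footer_len
    return None

def find_corrupted(paths):
    """Pre-filter a large file list before building a ParquetDataset."""
    bad = []
    for path in paths:
        problem = check_parquet_footer(path)
        if problem:
            bad.append((path, problem))
    return bad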

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6712) [Rust] [Parquet] Reading parquet file into an ndarray

2019-09-26 Thread Adam Lippai (Jira)
Adam Lippai created ARROW-6712:
--

 Summary: [Rust] [Parquet] Reading parquet file into an ndarray 
 Key: ARROW-6712
 URL: https://issues.apache.org/jira/browse/ARROW-6712
 Project: Apache Arrow
  Issue Type: Wish
  Components: Rust
Reporter: Adam Lippai


What's the best way to read a .parquet file into a Rust ndarray structure?

Can it be efficient with the current API? I assume row iteration is not the
best idea :)

I can imagine that even parallel column loading would be possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[NIGHTLY] Arrow Build Report for Job nightly-2019-09-26-0

2019-09-26 Thread Crossbow


Arrow Build Report for Job nightly-2019-09-26-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0

Failed Tasks:
- docker-spark-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-spark-integration
- docker-dask-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-dask-integration
- docker-cpp-fuzzit:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-fuzzit

Succeeded Tasks:
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-appveyor-wheel-win-cp36m
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp37m
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-homebrew-cpp-autobrew
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-win-vs2015-py37
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp35m
- docker-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-pandas-master
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-centos-6
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-osx-clang-py36
- docker-turbodbc-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-turbodbc-integration
- docker-cpp-release:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-release
- docker-docs:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-docs
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-debian-buster-arm64
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-debian-stretch
- wheel-manylinux1-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp27mu
- docker-cpp-static-only:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-static-only
- docker-r:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-r
- wheel-manylinux2010-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux2010-cp27m
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-debian-buster
- wheel-manylinux1-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp27m
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-appveyor-wheel-win-cp37m
- ubuntu-bionic-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-ubuntu-bionic-arm64
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-gandiva-jar-trusty
- docker-js:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-js
- docker-python-2.7-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-python-2.7-nopandas
- wheel-manylinux2010-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux2010-cp37m
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-appveyor-wheel-win-cp35m
- docker-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-osx-clang-py37
- ubuntu-disco-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-ubuntu-disco-arm64
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-osx-cp35m
- docker-c_glib:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-c_glib
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-win-vs2015-py36
- wheel-manylinux2010-cp36m:
  URL: 

Re: Timeline for 0.15.0 release

2019-09-26 Thread Krisztián Szűcs
There are still missing Linux artifacts [1]:
- for amd64: debug symbol packages
- for arm64: the optional CUDA, Plasma and Gandiva modules

I think we can safely ignore them for the release; crossbow will report
them as missing but the artifact downloading step will finish.

Let me know, Micah, if you have any issues.

[1]: https://github.com/apache/arrow/pull/5506#issuecomment-535495351

On Thu, Sep 26, 2019 at 3:38 PM Micah Kornfield 
wrote:

> Yes, I merged it and it will be included.  I needed to start over due to a
> crossbow issue...
>
> On Thu, Sep 26, 2019 at 7:18 AM Ji Liu  wrote:
>
>> Hi Micah,
>> Hmm, unfortunately, I just found a bug in the JDBC adapter and opened a
>> PR; could this change make it into 0.15?
>> See https://github.com/apache/arrow/pull/5511
>>
>>
>> Thanks,
>> Ji Liu
>>
>>
>> --
>> From:Micah Kornfield 
>> Send Time: 2019-09-26 (Thursday) 14:23
>> To:Neal Richardson 
>> Cc:"Krisztián Szűcs" ; Wes McKinney <
>> wesmck...@gmail.com>; dev 
>> Subject:Re: Timeline for 0.15.0 release
>>
>> Just an update: I've started the RC generation process; the last commit
>> from master is [1]
>>
>> I am currently waiting on the crossbow builds (build-690 on
>> ursa-labs/crossbow).  I think this will take a little while so I will pick
>> it up tomorrow (Thursday).
>>
>> Thanks,
>> Micah
>>
>> [1]
>>
>> https://github.com/apache/arrow/commit/07ab5083d5a2925ced6f8168b60b8fa336f4eccc
>>
>> On Wed, Sep 25, 2019 at 2:07 PM Neal Richardson <
>> neal.p.richard...@gmail.com>
>> wrote:
>>
>> > IMO it's too risky to add something that adds a dependency
>> > (aws-sdk-cpp) on the day of cutting a release.
>> >
>> > Neal
>> >
>> > On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs
>> >  wrote:
>> > >
>> > > We don't have a comprehensive documentation yet, so let's postpone it.
>> > >
>> > >
>> > > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs <
>> > szucs.kriszt...@gmail.com> wrote:
>> > >>
>> > >> The S3 python bindings would be a nice addition to the release.
>> > >> I don't think we should block on this but the PR is ready. Opinions?
>> > >> https://github.com/apache/arrow/pull/5423
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield <
>> emkornfi...@gmail.com>
>> > wrote:
>> > >>>
>> > >>> OK, I'll start the process today.  I'll send up e-mail updates as I
>> > make progress.
>> > >>>
>> > >>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney 
>> > wrote:
>> > 
>> >  Yes, all systems go as far as I'm concerned.
>> > 
>> >  On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson
>> >   wrote:
>> >  >
>>
>> >  > Andy's DataFusion issue and Wes's Parquet one have both been merged,
>>
>> >  > and it looks like the LICENSE issue is being resolved as I type. So
>> >  > are we good to go now?
>> >  >
>> >  > Neal
>> >  >
>> >  >
>> >  > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove <
>> andygrov...@gmail.com>
>> > wrote:
>> >  > >
>> >  > > I found a last minute issue with DataFusion (Rust) and would
>> > appreciate it
>> >  > > if we could merge ARROW-6086 (PR is
>> >  > > https://github.com/apache/arrow/pull/5494
>> ) before cutting the RC.
>> >  > >
>> >  > > Thanks,
>> >  > >
>> >  > > Andy.
>> >  > >
>> >  > >
>> >  > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield <
>> > emkornfi...@gmail.com>
>> >  > > wrote:
>> >  > >
>> >  > > > OK, I'm going to postpone cutting a release until tomorrow
>> >  > > > (hoping we can get the issues resolved by then).  I'll also try
>> >  > > > to review the third-party additions since 14.x.
>> >  > > >
>> >  > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney <
>> > wesmck...@gmail.com> wrote:
>> >  > > >
>> >  > > > > I found a licensing issue
>> >  > > > >
>> >  > > > > https://issues.apache.org/jira/browse/ARROW-6679
>> >  > > > >
>> >  > > > > It might be worth examining third party code added to the
>> > project
>> >  > > > > since 0.14.x to make sure there are no other such issues.
>> >  > > > >
>> >  > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney <
>> > wesmck...@gmail.com>
>> >  > > > wrote:
>> >  > > > > >
>>
>> >  > > > > > I have diagnosed the problem (Thrift "string" data must be
>> > UTF-8,
>>
>> >  > > > > > cannot be arbitrary binary) and am working on a patch right
>> > now
>> >  > > > > >
>> >  > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney <
>> > wesmck...@gmail.com>
>> >  > > > > wrote:
>> >  > > > > > >
>> >  > > > > > > I just opened
>> >  > > > > > >
>> >  > > > > > > https://issues.apache.org/jira/browse/ARROW-6678
>> >  > > > > > >
>> >  > > > > > > Please don't cut an RC until I have an opportunity to
>> > diagnose this,
>> >  > > > > > > will report back.
>> >  > > > > > >
>> >  > > > > > >
>> >  > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes 

[jira] [Created] (ARROW-6710) [Java] Add JDBC adapter test to cover cases which contains some null values

2019-09-26 Thread Ji Liu (Jira)
Ji Liu created ARROW-6710:
-

 Summary: [Java] Add JDBC adapter test to cover cases which 
contains some null values
 Key: ARROW-6710
 URL: https://issues.apache.org/jira/browse/ARROW-6710
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


The current JDBC adapter tests only cover the cases where values are all
non-null or all null.

However, cases where the ResultSet has some null values are not covered
(ARROW-6709).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Timeline for 0.15.0 release

2019-09-26 Thread Ji Liu
Hi Micah,
Hmm, unfortunately, I just found a bug in the JDBC adapter and opened a PR;
could this change make it into 0.15?
See https://github.com/apache/arrow/pull/5511


Thanks,
Ji Liu



--
From:Micah Kornfield 
Send Time: 2019-09-26 (Thursday) 14:23
To:Neal Richardson 
Cc:"Krisztián Szűcs" ; Wes McKinney 
; dev 
Subject:Re: Timeline for 0.15.0 release

Just an update: I've started the RC generation process; the last commit from
master is [1]

I am currently waiting on the crossbow builds (build-690 on
ursa-labs/crossbow).  I think this will take a little while so I will pick
it up tomorrow (Thursday).

Thanks,
Micah

[1]
https://github.com/apache/arrow/commit/07ab5083d5a2925ced6f8168b60b8fa336f4eccc

On Wed, Sep 25, 2019 at 2:07 PM Neal Richardson 
wrote:

> IMO it's too risky to add something that adds a dependency
> (aws-sdk-cpp) on the day of cutting a release.
>
> Neal
>
> On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs
>  wrote:
> >
> > We don't have a comprehensive documentation yet, so let's postpone it.
> >
> >
> > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com> wrote:
> >>
> >> The S3 python bindings would be a nice addition to the release.
> >> I don't think we should block on this but the PR is ready. Opinions?
> >> https://github.com/apache/arrow/pull/5423
> >>
> >>
> >>
> >>
> >> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield 
> wrote:
> >>>
> >>> OK, I'll start the process today.  I'll send up e-mail updates as I
> make progress.
> >>>
> >>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney 
> wrote:
> 
>  Yes, all systems go as far as I'm concerned.
> 
>  On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson
>   wrote:
>  >
>  > Andy's DataFusion issue and Wes's Parquet one have both been merged,
>  > and it looks like the LICENSE issue is being resolved as I type. So
>  > are we good to go now?
>  >
>  > Neal
>  >
>  >
>  > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove 
> wrote:
>  > >
>  > > I found a last minute issue with DataFusion (Rust) and would
> appreciate it
>  > > if we could merge ARROW-6086 (PR is
>  > > https://github.com/apache/arrow/pull/5494) before cutting the RC.
>  > >
>  > > Thanks,
>  > >
>  > > Andy.
>  > >
>  > >
>  > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield <
> emkornfi...@gmail.com>
>  > > wrote:
>  > >
>  > > > OK, I'm going to postpone cutting a release until tomorrow (hoping
>  > > > we can get the issues resolved by then).  I'll also try to review
>  > > > the third-party additions since 14.x.
>  > > >
>  > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney <
> wesmck...@gmail.com> wrote:
>  > > >
>  > > > > I found a licensing issue
>  > > > >
>  > > > > https://issues.apache.org/jira/browse/ARROW-6679
>  > > > >
>  > > > > It might be worth examining third party code added to the
> project
>  > > > > since 0.14.x to make sure there are no other such issues.
>  > > > >
>  > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney <
> wesmck...@gmail.com>
>  > > > wrote:
>  > > > > >
>  > > > > > I have diagnosed the problem (Thrift "string" data must be
> UTF-8,
>  > > > > > cannot be arbitrary binary) and am working on a patch right
> now
>  > > > > >
>  > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney <
> wesmck...@gmail.com>
>  > > > > wrote:
>  > > > > > >
>  > > > > > > I just opened
>  > > > > > >
>  > > > > > > https://issues.apache.org/jira/browse/ARROW-6678
>  > > > > > >
>  > > > > > > Please don't cut an RC until I have an opportunity to
> diagnose this,
>  > > > > > > will report back.
>  > > > > > >
>  > > > > > >
>  > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes McKinney <
> wesmck...@gmail.com>
>  > > > > wrote:
>  > > > > > > >
>  > > > > > > > I'm investigating a possible Parquet-related
> compatibility bug
>  > > > that I
>  > > > > > > > encountered through some routine testing /
> benchmarking. I'll
>  > > > report
>  > > > > > > > back once I figure out what is going on (if anything)
>  > > > > > > >
>  > > > > > > > On Sun, Sep 22, 2019 at 11:51 PM Micah Kornfield <
>  > > > > emkornfi...@gmail.com> wrote:
>  > > > > > > > >>
>  > > > > > > > >> It's ideal if your GPG key is in the web of trust
> (i.e. you can
>  > > > > get it
>  > > > > > > > >> signed by another PMC member), but is not 100%
> essential.
>  > > > > > > > >
>  > > > > > > > > That won't be an option for me this week (it seems
> like I would
>  > > > > need to meet one face-to-face).  I'll try to get the GPG
> checked in and
>  > > > the
>  > > > > rest of the pre-requisites done tomorrow (Monday) to
> hopefully start the
>  > > > > release on Tuesday (hopefully we can solve the last
> 

[jira] [Created] (ARROW-6708) [C++] "cannot find -lboost_filesystem_static"

2019-09-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6708:
-

 Summary: [C++] "cannot find -lboost_filesystem_static"
 Key: ARROW-6708
 URL: https://issues.apache.org/jira/browse/ARROW-6708
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


I'm trying a fresh build on another machine and get this error when using the 
{{boost-cpp}} conda package:

{code}
/usr/bin/ld.gold: error: cannot find -lboost_filesystem_static
/usr/bin/ld.gold: error: cannot find -lboost_system_static
{code}

Note that Boost static libraries are installed, but they are named 
{{libboost_filesystem.a}} and {{libboost_system.a}} (no "_static" suffix).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6706) [Developer Tools] Cannot merge PRs from authors with "Á" (U+00C1) in their name

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6706:
-

 Summary: [Developer Tools] Cannot merge PRs from authors with "Á" 
(U+00C1) in their name
 Key: ARROW-6706
 URL: https://issues.apache.org/jira/browse/ARROW-6706
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Andy Grove


I tried merging a PR from Ádám Lippai ([https://github.com/alippai]) and the 
merge script failed with:


 
{code:java}
./dev/merge_arrow_pr.py 
ARROW_HOME = /home/andy/git/andygrove/arrow/dev
PROJECT_NAME = arrow
Which pull request would you like to merge? (e.g. 34): 5499
Env APACHE_JIRA_USERNAME not set, please enter your JIRA username:andygrove
Env APACHE_JIRA_PASSWORD not set, please enter your JIRA password:
=== Pull Request #5499 ===
title   ARROW-6705: [Rust] [DataFusion] README has invalid github URL
source  alippai/patch-1
target  master
url https://api.github.com/repos/apache/arrow/pulls/5499
=== JIRA ARROW-6705 ===
Summary [Rust] [DataFusion] README has invalid github URL
Assignee    NOT ASSIGNED!!!
Components  Rust
Status  Open
URL https://issues.apache.org/jira/browse/ARROW-6705
Proceed with merging pull request #5499? (y/n): y
Switched to branch 'PR_TOOL_MERGE_PR_5499_MASTER'
Automatic merge went well; stopped before committing as requested
Traceback (most recent call last):
  File "./dev/merge_arrow_pr.py", line 571, in <module>
cli()
  File "./dev/merge_arrow_pr.py", line 556, in cli
pr.merge()
  File "./dev/merge_arrow_pr.py", line 354, in merge
print("Author {}: {}".format(i + 1, author))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 0: 
ordinal not in range(128)
 {code}
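For context, the failure is reproducible without the merge script: under Python 2, printing a unicode string implicitly encodes it with the 'ascii' codec, which cannot represent "Á" (U+00C1). A minimal sketch of the failing step and one conventional fix (the name literal and the UTF-8 write are illustrative, not the actual patch):

```python
import sys

# Reproducing the failing step: encoding a non-ASCII author name with the
# 'ascii' codec raises the same UnicodeEncodeError seen in the traceback.
author = "\xc1d\xe1m Lippai"  # "Ádám Lippai"

try:
    author.encode("ascii")
except UnicodeEncodeError as exc:
    # 'ascii' codec can't encode character '\xc1' in position 0
    print(exc)

# One conventional fix is to write explicit UTF-8 bytes (or simply run the
# script under Python 3 with a UTF-8 locale):
sys.stdout.buffer.write("Author 1: {}\n".format(author).encode("utf-8"))
```

Running the script under Python 3, or forcing UTF-8 output (e.g. `PYTHONIOENCODING=utf-8` on Python 2), avoids the implicit ASCII encode entirely.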



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Unnesting ListArrays

2019-09-26 Thread Suhail Razzak
Thanks Wes, makes sense. I appreciate that there are use cases where both
could be applicable.

In my example, the most applicable I can think of is unnesting a ListArray
column for a DataFrame (in the future C++ DataFrames API?) similar to the
tidyr unnest function. I don't believe the current implementation would
be able to align the flattened ListArray with the rest of the columns. I'll
see if there's something I can do on this end.

On Wed, Sep 25, 2019 at 6:27 PM Wes McKinney  wrote:

> hi Suhail,
>
> This follows the columnar format closely. The List layout is composed
> from a child array providing the "inner" values, which are given the
> List interpretation by adding an offsets buffer, and a validity
> buffer to distinguish null from 0-length list values. So flatten()
> here just returns the child array, which has only 3 values in the
> example you gave.
>
> A function could be written to insert "null" for List values that are
> null, but someone would have to write it and give it a name =)
>
> - Wes
>
> On Wed, Sep 25, 2019 at 5:15 PM Suhail Razzak 
> wrote:
> >
> > Hi,
> >
> > I'm working through a certain use case where I'm unnesting ListArrays,
> but
> > I noticed something peculiar - null ListValues are not retained in the
> > unnested array.
> >
> > E.g.
> > In [0]: arr = pa.array([[0, 1], [0], None, None])
> > In [1]: arr.flatten()
> > Out [1]: [0, 1, 0]
> >
> > While I would have expected [0, 1, 0, null, null].
> >
> > I should note that this works if the None is encapsulated in a list. So
> I'm
> > guessing this is expected logic and if so, what's the reasoning for that?
> >
> > Thanks,
> > Suhail
>
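To make the layout discussion above concrete, here is a pure-Python sketch of the List<Int64> representation (child values array, offsets buffer, validity buffer) and of the hypothetical null-preserving function Wes describes. The buffers and function names are illustrative, not pyarrow API:

```python
def flatten(values, offsets, validity):
    # pyarrow's flatten() effectively just returns the child values array,
    # which is why null list slots disappear from the output.
    return list(values)

def flatten_with_nulls(values, offsets, validity):
    # Walk the offsets: emit the child values for valid slots, and one
    # None per null list slot (the behavior Suhail expected).
    out = []
    for i in range(len(offsets) - 1):
        if not validity[i]:
            out.append(None)
        else:
            out.extend(values[offsets[i]:offsets[i + 1]])
    return out

# arr = [[0, 1], [0], None, None] in the columnar layout:
values = [0, 1, 0]
offsets = [0, 2, 3, 3, 3]   # null slots contribute zero-length ranges
validity = [1, 1, 0, 0]

print(flatten(values, offsets, validity))             # [0, 1, 0]
print(flatten_with_nulls(values, offsets, validity))  # [0, 1, 0, None, None]
```

Note that a null slot and an empty list both have equal consecutive offsets; only the validity buffer distinguishes them, which is why the null-preserving variant must consult it.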


Re: Thread-safety guarantees of pyarrow Table (and other) objects

2019-09-26 Thread Antoine Pitrou


Hi Yevgeni,

The main Arrow classes (such as Array, ChunkedArray, RecordBatch, Table)
are immutable, so they support multi-threaded usage out of the box.

We have mutable classes as well (e.g. IO classes, ArrayBuilders, mutable
Buffers...) and those are not thread-safe.

Regards

Antoine.


Le 26/09/2019 à 06:03, Yevgeni Litvin a écrit :
> Where in the documentation can I find information about thread-safety
> guarantee of arrow classes? In particular, is the following usage of
> pyarrow.Table showed by the pseudo-code thread-safe?
> 
> 
> arrow_table = pa.Table.from_pandas(df)
> 
> 
> def other_thread_worker_impl(arrow_table):
> 
> arrow_table.column('some_column')[row].as_py()
> 
> 
> run_in_parallel(other_thread_worker_impl, arrow_table)
> 
> 
> I tried using pandas.DataFrame in the same multi-threaded setup and it
> turned out to be unsafe (https://github.com/pandas-dev/pandas/issues/28439).
> 
> Thank you.
> 
> - Yevgeni
> 
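The read-only pattern in the pseudo-code above can be sketched with the standard library alone. A nested tuple stands in for the immutable Table here, since this sketch does not assume pyarrow is installed; the point is the pattern, not the API:

```python
import concurrent.futures

# An immutable, shared structure: 1000 "rows" of 4 values each.
# Like an Arrow Table, it is never mutated after construction, so no
# locking is needed for concurrent reads.
table = tuple(tuple(range(i, i + 4)) for i in range(1000))

def worker(_):
    # Read-only access from many threads at once, mirroring
    # other_thread_worker_impl in the pseudo-code.
    return sum(table[row][0] for row in range(len(table)))

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(worker, range(32)))

# Every thread observes the same consistent view.
assert len(set(results)) == 1
print(results[0])
```

The same reasoning carries over to pyarrow's immutable classes; only the mutable ones (IO classes, builders, mutable buffers) need external synchronization, as Antoine notes.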


[jira] [Created] (ARROW-6704) [C++] Cast from timestamp to higher resolution does not check out of bounds timestamps

2019-09-26 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6704:


 Summary: [C++] Cast from timestamp to higher resolution does not 
check out of bounds timestamps
 Key: ARROW-6704
 URL: https://issues.apache.org/jira/browse/ARROW-6704
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Joris Van den Bossche


When casting eg {{timestamp('s')}} to {{timestamp('ns')}}, we do not check for 
out of bounds timestamps, giving "garbage" timestamps in the result:

{code}
In [74]: a_np = np.array(["2012-01-01", "2412-01-01"], dtype="datetime64[s]")

In [75]: arr = pa.array(a_np)

In [76]: arr
Out[76]:
[
  2012-01-01 00:00:00,
  2412-01-01 00:00:00
]

In [77]: arr.cast(pa.timestamp('ns'))
Out[77]:
[
  2012-01-01 00:00:00.0,
  1827-06-13 00:25:26.290448384
]
{code}

Now, this is the same behaviour as numpy, so I'm not sure we should do this. 
However, since we have a {{safe=True/False}} option, I would expect that for 
{{safe=True}} we check this and for {{safe=False}} we do not.  (numpy has a 
similar {{casting='safe'}} but also does not raise an error in that case).
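For what it's worth, the second "garbage" value is consistent with a plain int64 overflow on the multiply: 2412-01-01 expressed in seconds, times 10^9, exceeds INT64_MAX and wraps around. A sketch emulating the unchecked cast (the wraparound helper is illustrative, not the actual kernel code):

```python
from datetime import datetime, timedelta

INT64_MIN = -2**63
EPOCH = datetime(1970, 1, 1)

def unchecked_s_to_ns(seconds):
    # Emulate C++ two's-complement int64 wraparound on the multiply.
    ns = seconds * 10**9
    return (ns - INT64_MIN) % 2**64 + INT64_MIN

ts = int((datetime(2412, 1, 1) - EPOCH).total_seconds())  # 13_948_156_800 s
wrapped = unchecked_s_to_ns(ts)                           # negative after wrap
garbage = EPOCH + timedelta(microseconds=wrapped // 1000)
print(garbage)  # lands on 1827-06-13, matching the second value above
```

A safe cast would instead check that the seconds value fits in `[INT64_MIN // 10**9, INT64_MAX // 10**9]` before multiplying, and raise otherwise.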




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6703) [Packaging][Linux] Restore ARROW_VERSION environment variable

2019-09-26 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-6703:
---

 Summary: [Packaging][Linux] Restore ARROW_VERSION environment 
variable
 Key: ARROW-6703
 URL: https://issues.apache.org/jira/browse/ARROW-6703
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.15.0


{{ARROW_VERSION}} is needed to use correct download URL for RC.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6702) [Rust] [DataFusion] Incorrect partition read

2019-09-26 Thread Adam Lippai (Jira)
Adam Lippai created ARROW-6702:
--

 Summary: [Rust] [DataFusion] Incorrect partition read
 Key: ARROW-6702
 URL: https://issues.apache.org/jira/browse/ARROW-6702
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 0.15.0
Reporter: Adam Lippai


Reading a directory structure of duplicated alltypes_plain.parquet files 
returns 8 rows instead of 16 (as read, e.g., by the pandas parquet reader).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6701) [C++][R] Lint failing on R cpp code

2019-09-26 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6701:
--

 Summary: [C++][R] Lint failing on R cpp code
 Key: ARROW-6701
 URL: https://issues.apache.org/jira/browse/ARROW-6701
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield
 Fix For: 1.0.0


See as an example:
https://travis-ci.org/apache/arrow/jobs/589772132#L695



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Timeline for 0.15.0 release

2019-09-26 Thread Micah Kornfield
Just an FYI: I've started the RC generation process; the last commit from
master is [1].

I am currently waiting on the crossbow builds (build-690 on
ursa-labs/crossbow).  I think this will take a little while, so I will pick
it up tomorrow (Thursday).

Thanks,
Micah

[1]
https://github.com/apache/arrow/commit/07ab5083d5a2925ced6f8168b60b8fa336f4eccc

On Wed, Sep 25, 2019 at 2:07 PM Neal Richardson 
wrote:

> IMO it's too risky to add something that adds a dependency
> (aws-sdk-cpp) on the day of cutting a release.
>
> Neal
>
> On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs
>  wrote:
> >
> > We don't have comprehensive documentation yet, so let's postpone it.
> >
> >
> > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com> wrote:
> >>
> >> The S3 python bindings would be a nice addition to the release.
> >> I don't think we should block on this but the PR is ready. Opinions?
> >> https://github.com/apache/arrow/pull/5423
> >>
> >>
> >>
> >>
> >> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield 
> wrote:
> >>>
> >>> OK, I'll start the process today.  I'll send up e-mail updates as I
> make progress.
> >>>
> >>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney 
> wrote:
> 
>  Yes, all systems go as far as I'm concerned.
> 
>  On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson
>   wrote:
>  >
>  > Andy's DataFusion issue and Wes's Parquet one have both been merged,
>  > and it looks like the LICENSE issue is being resolved as I type. So
>  > are we good to go now?
>  >
>  > Neal
>  >
>  >
>  > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove 
> wrote:
>  > >
>  > > I found a last minute issue with DataFusion (Rust) and would
> appreciate it
>  > > if we could merge ARROW-6086 (PR is
>  > > https://github.com/apache/arrow/pull/5494) before cutting the RC.
>  > >
>  > > Thanks,
>  > >
>  > > Andy.
>  > >
>  > >
>  > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield <
> emkornfi...@gmail.com>
>  > > wrote:
>  > >
>  > > > OK, I'm going to postpone cutting a release until tomorrow
> (hoping we can get the
>  > > > issues resolved by then).  I'll also try to review the
> third-party
>  > > > additions since 0.14.x.
>  > > >
>  > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney <
> wesmck...@gmail.com> wrote:
>  > > >
>  > > > > I found a licensing issue
>  > > > >
>  > > > > https://issues.apache.org/jira/browse/ARROW-6679
>  > > > >
>  > > > > It might be worth examining third party code added to the
> project
>  > > > > since 0.14.x to make sure there are no other such issues.
>  > > > >
>  > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney <
> wesmck...@gmail.com>
>  > > > wrote:
>  > > > > >
>  > > > > > I have diagnosed the problem (Thrift "string" data must be
> UTF-8,
>  > > > > > cannot be arbitrary binary) and am working on a patch right
> now
>  > > > > >
>  > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney <
> wesmck...@gmail.com>
>  > > > > wrote:
>  > > > > > >
>  > > > > > > I just opened
>  > > > > > >
>  > > > > > > https://issues.apache.org/jira/browse/ARROW-6678
>  > > > > > >
>  > > > > > > Please don't cut an RC until I have an opportunity to
> diagnose this,
>  > > > > > > will report back.
>  > > > > > >
>  > > > > > >
>  > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes McKinney <
> wesmck...@gmail.com>
>  > > > > wrote:
>  > > > > > > >
>  > > > > > > > I'm investigating a possible Parquet-related
> compatibility bug
>  > > > that I
>  > > > > > > > encountered through some routine testing /
> benchmarking. I'll
>  > > > report
>  > > > > > > > back once I figure out what is going on (if anything)
>  > > > > > > >
>  > > > > > > > On Sun, Sep 22, 2019 at 11:51 PM Micah Kornfield <
>  > > > > emkornfi...@gmail.com> wrote:
>  > > > > > > > >>
>  > > > > > > > >> It's ideal if your GPG key is in the web of trust
> (i.e. you can
>  > > > > get it
>  > > > > > > > >> signed by another PMC member), but is not 100%
> essential.
>  > > > > > > > >
>  > > > > > > > > That won't be an option for me this week (it seems
> like I would
>  > > > > need to meet one face-to-face).  I'll try to get the GPG
> checked in and
>  > > > the
>  > > > > rest of the pre-requisites done tomorrow (Monday) to
> hopefully start the
>  > > > > release on Tuesday (hopefully we can solve the last
> blocker/integration
>  > > > > tests by then).
>  > > > > > > > >
>  > > > > > > > > On Sat, Sep 21, 2019 at 7:12 PM Wes McKinney <
>  > > > wesmck...@gmail.com>
>  > > > > wrote:
>  > > > > > > > >>
>  > > > > > > > >> It's ideal if your GPG key is in the web of trust
> (i.e. you can
>  > > > > get it
>  > > > > > > > >> signed by another PMC member), but is not 100%
>