Re: Travis CI delays
My understanding is that the Travis CI queue is shared among all Apache projects, and a few of them, including Arrow, make heavy use of the resources. Hence there is a lot of time spent waiting for jobs to start. I think there are some open JIRAs to finish dockerization of the builds; I don't know the current status of finding alternative CI sources, though. On Thu, Sep 26, 2019 at 10:24 PM Andy Grove wrote: > I know this has been discussed in the past, and I apologize for not paying > attention at the time (and searching for arrow + travis in email isn't very > effective) but why does it take so long for our Travis CI builds and are > there open JIRA issues related to this? > > Thanks, > > Andy. >
[jira] [Created] (ARROW-6720) [JAVA][C++]Support Parquet Read and Write in Java
Chendi.Xue created ARROW-6720: - Summary: [JAVA][C++]Support Parquet Read and Write in Java Key: ARROW-6720 URL: https://issues.apache.org/jira/browse/ARROW-6720 Project: Apache Arrow Issue Type: New Feature Components: C++, Java Affects Versions: 0.15.0 Reporter: Chendi.Xue Fix For: 0.15.0 We added a new Java interface to support Parquet read and write from HDFS or a local file. The motivation is that, when loading and dumping Parquet data in Java, we could previously only use row-based put and get methods. Since Arrow already has a C++ implementation for loading and dumping Parquet, we wrapped that code as Java APIs. In our testing we saw performance improve by more than 2x in our workload compared with row-based load and dump, so we want to contribute the code to Arrow. Since this is a completely independent change, no existing Arrow code is modified. We added two folders: java/adapter/parquet and cpp/src/jni/parquet -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>
V Luong created ARROW-6719: -- Summary: Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...> Key: ARROW-6719 URL: https://issues.apache.org/jira/browse/ARROW-6719 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.14.1 Environment: Python 3.7 Reporter: V Luong I have Parquet files with certain complex columns of type list<...> and am using the latest PyArrow (0.14.1) to process them. In Python 2.7, pyarrow.parquet.read_table(...) processes these files correctly, without any problem. But in Python 3.7, the same pyarrow.parquet.read_table(...) function calls return errors of the following kind: "pyarrow.lib.ArrowInvalid: Column data for field 0 with type list<...> is inconsistent with schema list<...>"
[jira] [Created] (ARROW-6718) [Rust] packed_simd requires nightly
Andy Grove created ARROW-6718: - Summary: [Rust] packed_simd requires nightly Key: ARROW-6718 URL: https://issues.apache.org/jira/browse/ARROW-6718 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Andy Grove {code:java} error[E0554]: `#![feature]` may not be used on the stable release channel --> /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/lib.rs:202:1 | 202 | / #![feature( 203 | | repr_simd, 204 | | const_fn, 205 | | platform_intrinsics, ... | 215 | | custom_inner_attributes 216 | | )] | |__^ {code}
[jira] [Created] (ARROW-6717) Support stable Rust
Andy Grove created ARROW-6717: - Summary: Support stable Rust Key: ARROW-6717 URL: https://issues.apache.org/jira/browse/ARROW-6717 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Andy Grove I'm creating this issue to track all the stories we need to implement to be able to use stable Rust.
[jira] [Created] (ARROW-6716) [CI] [Rust] New 1.40.0 nightly causing builds to fail
Andy Grove created ARROW-6716: - Summary: [CI] [Rust] New 1.40.0 nightly causing builds to fail Key: ARROW-6716 URL: https://issues.apache.org/jira/browse/ARROW-6716 Project: Apache Arrow Issue Type: Bug Components: CI, Rust Reporter: Andy Grove Assignee: Andy Grove Fix For: 1.0.0 So much for pinning the nightly version: apparently that doesn't work when there is a new major nightly version. Travis is now using: rustc 1.40.0-nightly (37538aa13 2019-09-25) despite rust-toolchain containing: {code:java} nightly-2019-07-30 {code}
[jira] [Created] (ARROW-6715) [Website] Describe "non-free" section is needed for Plasma packages in install page
Kouhei Sutou created ARROW-6715: --- Summary: [Website] Describe "non-free" section is needed for Plasma packages in install page Key: ARROW-6715 URL: https://issues.apache.org/jira/browse/ARROW-6715 Project: Apache Arrow Issue Type: Improvement Components: Website Reporter: Kouhei Sutou Assignee: Kouhei Sutou Plasma packages depend on the nvidia-cuda-toolkit package, which is in the non-free section. Note that Plasma packages are available only for amd64, because the nvidia-cuda-toolkit package isn't available for arm64.
[jira] [Created] (ARROW-6714) [R] Fix untested RecordBatchWriter case
Neal Richardson created ARROW-6714: -- Summary: [R] Fix untested RecordBatchWriter case Key: ARROW-6714 URL: https://issues.apache.org/jira/browse/ARROW-6714 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Neal Richardson Assignee: Neal Richardson Passing a data.frame to RecordBatchWriter$write() would trigger a segfault.
Re: Build issues on macOS [newbie]
It looks like the development toolchain dependencies in conda_env_cpp.yml aren't installed in your "main" conda environment, e.g. https://github.com/apache/arrow/blob/master/ci/conda_env_cpp.yml#L42 You can see what's installed by running "conda list". Note that most of these dependencies are optional, but we provide the env files to simplify general development of the project so contributors aren't struggling to produce comprehensive builds. On Wed, Sep 25, 2019 at 11:33 AM Tarek Allam Jr. wrote: > > Thanks for the advice Uwe and Neal. I tried your suggestion (as well as > turning many of the flags to off) but then ran into other errors afterwards > such as: > > -- Using ZSTD_ROOT: /usr/local/anaconda3/envs/main > CMake Error at > /usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 > (message): > Could NOT find ZSTD (missing: ZSTD_LIB ZSTD_INCLUDE_DIR) > > /usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 > (_FPHSA_FAILURE_MESSAGE) > cmake_modules/FindZSTD.cmake:61 (find_package_handle_standard_args) > cmake_modules/ThirdpartyToolchain.cmake:181 (find_package) > cmake_modules/ThirdpartyToolchain.cmake:2033 (resolve_dependency) > CMakeLists.txt:412 (include) > > I think I will spend some more time understanding CMake better and > familiarising myself with the codebase before having another go. Hopefully > by then conda-forge will have removed the SDK requirement as well, which, > like you say, should make things much simpler. > > Thanks again, > > Regards, > Tarek > > On 2019/09/19 16:00:09, "Uwe L. Korn" wrote: > > Hello Tarek, > > > > this error message is normally the one you get when CONDA_BUILD_SYSROOT > > doesn't point to your 10.9 SDK. Please delete your build folder again and > > do `export CONDA_BUILD_SYSROOT=..` immediately before running cmake. > > Running e.g. a conda install will sadly reset this variable to something > > different and break the build. 
> > > > As a sidenote: It looks like in 1-2 months conda-forge will get rid of > > the SDK requirement; then this will be a bit simpler. > > > > Cheers > > Uwe > > > > On Thu, Sep 19, 2019, at 5:24 PM, Tarek Allam Jr. wrote: > > > > > > Hi all, > > > > > > Firstly I must apologise if what I put here is extremely trivial, but I > > > am a > > > complete newcomer to the Apache Arrow project and to contributing to Apache > > > in > > > general, but I am very keen to get involved. > > > > > > I'm hoping to help where I can, so I recently attempted to complete a build > > > following the instructions laid out in the 'Python Development' section > > > of the > > > documentation here: > > > > > > After completing the steps that specifically use Conda I was able to > > > create an > > > environment, but when it comes to building I am unable to do so. > > > > > > I am on macOS 10.14.6 and, as outlined in the docs and here > > > (https://stackoverflow.com/a/55798942/4521950), I used the 10.9 SDK > > > instead > > > of the latest. I have both added this manually using ccmake and also > > > defined it > > > like so: > > > > > > cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ > > > -DCMAKE_INSTALL_LIBDIR=lib \ > > > -DARROW_FLIGHT=ON \ > > > -DARROW_GANDIVA=ON \ > > > -DARROW_ORC=ON \ > > > -DARROW_PARQUET=ON \ > > > -DARROW_PYTHON=ON \ > > > -DARROW_PLASMA=ON \ > > > -DARROW_BUILD_TESTS=ON \ > > > -DCONDA_BUILD_SYSROOT=/opt/MacOSX10.9.sdk \ > > > -DARROW_DEPENDENCY_SOURCE=AUTO \ > > > .. 
> > > > > > But it seems that whatever I try, I get errors; the main one > > > tripping > > > me up at the moment is: > > > > > > -- Building using CMake version: 3.15.3 > > > -- The C compiler identification is Clang 4.0.1 > > > -- The CXX compiler identification is Clang 4.0.1 > > > -- Check for working C compiler: > > > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang > > > -- Check for working C compiler: > > > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang -- broken > > > CMake Error at > > > /usr/local/anaconda3/envs/pyarrow-dev/share/cmake-3.15/Modules/CMakeTestCCompiler.cmake:60 > > > (message): > > > The C compiler > > > > > > "/usr/local/anaconda3/envs/pyarrow-dev/bin/clang" > > > > > > is not able to compile a simple test program. > > > > > > It fails with the following output: > > > > > > Change Dir: /Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp > > > > > > Run Build Command(s):/usr/local/bin/gmake cmTC_b252c/fast && > > > /usr/local/bin/gmake -f CMakeFiles/cmTC_b252c.dir/build.make > > > CMakeFiles/cmTC_b252c.dir/build > > > gmake[1]: Entering directory > > > '/Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp' > > > Building C object CMakeFiles/cmTC_b252c.dir/testCCompiler.c.o > > > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang -march=core2 > > > -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE > > >
Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-26-0
Should we disable the fuzzit job? This is for a third-party CI-type service, so the failure here seems like it's adding unneeded noise. On Thu, Sep 26, 2019 at 12:31 PM Crossbow wrote: > > > Arrow Build Report for Job nightly-2019-09-26-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0 > > Failed Tasks: > - docker-spark-integration: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-spark-integration > - docker-dask-integration: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-dask-integration > - docker-cpp-fuzzit: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-fuzzit > > Succeeded Tasks: > ...
[jira] [Created] (ARROW-6713) [Python] Getting "ArrowIOError: Corrupted file, smaller than file footer" when reading large number of parquet files through ParquetDataset()
Harini Kannan created ARROW-6713: Summary: [Python] Getting "ArrowIOError: Corrupted file, smaller than file footer" when reading large number of parquet files through ParquetDataset() Key: ARROW-6713 URL: https://issues.apache.org/jira/browse/ARROW-6713 Project: Apache Arrow Issue Type: Bug Reporter: Harini Kannan Attachments: Screen Shot 2019-09-26 at 2.30.49 PM.png When trying to read a large number of parquet files (> 600) into ParquetDataset(), I get the error: ArrowIOError: Corrupted file, smaller than file footer. Note: this works fine for a small number (10-20) of parquet files.
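[Editor's note: "smaller than file footer" usually means at least one file in the list is truncated or empty. A minimal stdlib-only sketch for screening a large file list before handing it to ParquetDataset(); looks_like_parquet is a hypothetical helper written for illustration, not part of pyarrow, and the 12-byte minimum is an assumption based on the Parquet layout (4-byte leading magic, 4-byte footer length, 4-byte trailing magic).]

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"
MIN_SIZE = 12  # leading magic + 4-byte footer length + trailing magic

def looks_like_parquet(path):
    # A Parquet file begins and ends with the "PAR1" magic bytes; anything
    # smaller than MIN_SIZE cannot hold a valid footer and will trip
    # "Corrupted file, smaller than file footer" style errors.
    size = os.path.getsize(path)
    if size < MIN_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC

# Tiny self-check with synthetic files (not real Parquet data):
tmpdir = tempfile.mkdtemp()
ok_path = os.path.join(tmpdir, "ok.parquet")
with open(ok_path, "wb") as f:
    f.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
truncated_path = os.path.join(tmpdir, "truncated.parquet")
with open(truncated_path, "wb") as f:
    f.write(b"PAR")
```

Running this filter over the 600+ input paths would at least identify which file(s) are too small before the dataset is constructed.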
[jira] [Created] (ARROW-6712) [Rust] [Parquet] Reading parquet file into an ndarray
Adam Lippai created ARROW-6712: -- Summary: [Rust] [Parquet] Reading parquet file into an ndarray Key: ARROW-6712 URL: https://issues.apache.org/jira/browse/ARROW-6712 Project: Apache Arrow Issue Type: Wish Components: Rust Reporter: Adam Lippai What's the best way to read a .parquet file into a Rust ndarray structure? Can it be efficient with the current API? I assume row iteration is not the best idea :) I can imagine that even parallel column loading would be possible.
[NIGHTLY] Arrow Build Report for Job nightly-2019-09-26-0
Arrow Build Report for Job nightly-2019-09-26-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0 Failed Tasks: - docker-spark-integration: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-spark-integration - docker-dask-integration: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-dask-integration - docker-cpp-fuzzit: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-fuzzit Succeeded Tasks: - wheel-win-cp36m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-appveyor-wheel-win-cp36m - wheel-manylinux1-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp37m - homebrew-cpp-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-homebrew-cpp-autobrew - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-win-vs2015-py37 - wheel-manylinux1-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp35m - docker-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-pandas-master - centos-6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-centos-6 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-osx-clang-py36 - docker-turbodbc-integration: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-turbodbc-integration - docker-cpp-release: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-release - docker-docs: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-docs - debian-buster-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-debian-buster-arm64 - debian-stretch: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-debian-stretch - wheel-manylinux1-cp27mu: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp27mu - docker-cpp-static-only: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp-static-only - docker-r: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-r - wheel-manylinux2010-cp27m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux2010-cp27m - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-debian-buster - wheel-manylinux1-cp27m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux1-cp27m - wheel-win-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-appveyor-wheel-win-cp37m - ubuntu-bionic-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-ubuntu-bionic-arm64 - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-gandiva-jar-trusty - docker-js: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-js - docker-python-2.7-nopandas: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-python-2.7-nopandas - wheel-manylinux2010-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-manylinux2010-cp37m - wheel-win-cp35m: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-appveyor-wheel-win-cp35m - docker-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-cpp - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-osx-clang-py37 - ubuntu-disco-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-ubuntu-disco-arm64 - wheel-osx-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-travis-wheel-osx-cp35m - docker-c_glib: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-circle-docker-c_glib - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-26-0-azure-conda-win-vs2015-py36 - wheel-manylinux2010-cp36m: URL:
Re: Timeline for 0.15.0 release
There are still missing Linux artifacts [1]: - for amd64, debug symbol packages - for arm64, the optional CUDA, Plasma and Gandiva modules I think we can safely ignore them for the release; crossbow will report them as missing but the artifact downloading step will still finish. Let me know, Micah, if you have any issues. [1]: https://github.com/apache/arrow/pull/5506#issuecomment-535495351 On Thu, Sep 26, 2019 at 3:38 PM Micah Kornfield wrote: > Yes, I merged it and it will be included. I needed to start over due to a > crossbow issue... > > On Thu, Sep 26, 2019 at 7:18 AM Ji Liu wrote: > >> Hi Micah, >> Hmm, unfortunately, I just found a bug in the JDBC adapter and opened a >> PR; could this change catch up with 0.15? >> See https://github.com/apache/arrow/pull/5511 >> >> >> Thanks, >> Ji Liu >> >> >> -- >> From:Micah Kornfield >> Send Time:Thursday, September 26, 2019, 14:23 >> To:Neal Richardson >> Cc:"Krisztián Szűcs" ; Wes McKinney < >> wesmck...@gmail.com>; dev >> Subject:Re: Timeline for 0.15.0 release >> >> Just an update: I've started the RC generation process; the last commit from >> master is [1] >> >> I am currently waiting on the crossbow builds (build-690 on >> ursa-labs/crossbow). I think this will take a little while so I will pick >> it up tomorrow (Thursday). >> >> Thanks, >> Micah >> >> [1] >> >> https://github.com/apache/arrow/commit/07ab5083d5a2925ced6f8168b60b8fa336f4eccc >> >> On Wed, Sep 25, 2019 at 2:07 PM Neal Richardson < >> neal.p.richard...@gmail.com> >> wrote: >> >> > IMO it's too risky to add something that adds a dependency >> > (aws-sdk-cpp) on the day of cutting a release. >> > >> > Neal >> > >> > On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs >> > wrote: >> > > >> > > We don't have comprehensive documentation yet, so let's postpone it. >> > > >> > > >> > > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs < >> > szucs.kriszt...@gmail.com> wrote: >> > >> >> > >> The S3 python bindings would be a nice addition to the release. 
>> > >> I don't think we should block on this but the PR is ready. Opinions? >> > >> https://github.com/apache/arrow/pull/5423 >> > >> >> > >> >> > >> >> > >> >> > >> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield < >> emkornfi...@gmail.com> >> > wrote: >> > >>> >> > >>> OK, I'll start the process today. I'll send up e-mail updates as I >> > make progress. >> > >>> >> > >>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney >> > wrote: >> > >> > Yes, all systems go as far as I'm concerned. >> > >> > On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson >> > wrote: >> > > >> >> > > Andy's DataFusion issue and Wes's Parquet one have both been merged, >> >> > > and it looks like the LICENSE issue is being resolved as I type. So >> > > are we good to go now? >> > > >> > > Neal >> > > >> > > >> > > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove < >> andygrov...@gmail.com> >> > wrote: >> > > > >> > > > I found a last minute issue with DataFusion (Rust) and would >> > appreciate it >> > > > if we could merge ARROW-6086 (PR is >> > > > https://github.com/apache/arrow/pull/5494 >> ) before cutting the RC. >> > > > >> > > > Thanks, >> > > > >> > > > Andy. >> > > > >> > > > >> > > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield < >> > emkornfi...@gmail.com> >> > > > wrote: >> > > > >> > > > > OK, I'm going to postpone cutting a release until tomorrow >> > (hoping we can >> > > > > issues resolved by then).. I'll also try to review the >> > third-party >> > > > > additions since 14.x. >> > > > > >> > > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney < >> > wesmck...@gmail.com> wrote: >> > > > > >> > > > > > I found a licensing issue >> > > > > > >> > > > > > https://issues.apache.org/jira/browse/ARROW-6679 >> > > > > > >> > > > > > It might be worth examining third party code added to the >> > project >> > > > > > since 0.14.x to make sure there are no other such issues. 
>> > > > > > >> > > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney < >> > wesmck...@gmail.com> >> > > > > wrote: >> > > > > > > >> >> > > > > > > I have diagnosed the problem (Thrift "string" data must be >> > UTF-8, >> >> > > > > > > cannot be arbitrary binary) and am working on a patch right >> > now >> > > > > > > >> > > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney < >> > wesmck...@gmail.com> >> > > > > > wrote: >> > > > > > > > >> > > > > > > > I just opened >> > > > > > > > >> > > > > > > > https://issues.apache.org/jira/browse/ARROW-6678 >> > > > > > > > >> > > > > > > > Please don't cut an RC until I have an opportunity to >> > diagnose this, >> > > > > > > > will report back. >> > > > > > > > >> > > > > > > > >> > > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes
[jira] [Created] (ARROW-6710) [Java] Add JDBC adapter test to cover cases which contains some null values
Ji Liu created ARROW-6710: - Summary: [Java] Add JDBC adapter test to cover cases which contains some null values Key: ARROW-6710 URL: https://issues.apache.org/jira/browse/ARROW-6710 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu The current JDBC adapter tests only cover cases in which the values are all non-null or all null. The cases in which the ResultSet has some null values are not covered (ARROW-6709).
Re: Timeline for 0.15.0 release
Hi Micah, Hmm, unfortunately, I just found a bug in the JDBC adapter and opened a PR; could this change catch up with 0.15? See https://github.com/apache/arrow/pull/5511 Thanks, Ji Liu -- From:Micah Kornfield Send Time:Thursday, September 26, 2019, 14:23 To:Neal Richardson Cc:"Krisztián Szűcs" ; Wes McKinney ; dev Subject:Re: Timeline for 0.15.0 release Just an update: I've started the RC generation process; the last commit from master is [1] I am currently waiting on the crossbow builds (build-690 on ursa-labs/crossbow). I think this will take a little while so I will pick it up tomorrow (Thursday). Thanks, Micah [1] https://github.com/apache/arrow/commit/07ab5083d5a2925ced6f8168b60b8fa336f4eccc On Wed, Sep 25, 2019 at 2:07 PM Neal Richardson wrote: > IMO it's too risky to add something that adds a dependency > (aws-sdk-cpp) on the day of cutting a release. > > Neal > > On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs > wrote: > > > > We don't have comprehensive documentation yet, so let's postpone it. > > > > > > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs < > szucs.kriszt...@gmail.com> wrote: > >> > >> The S3 python bindings would be a nice addition to the release. > >> I don't think we should block on this but the PR is ready. Opinions? > >> https://github.com/apache/arrow/pull/5423 > >> > >> > >> > >> > >> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield > wrote: > >>> > >>> OK, I'll start the process today. I'll send e-mail updates as I > make progress. > >>> > >>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney > wrote: > > Yes, all systems go as far as I'm concerned. > > On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson > wrote: > > > > Andy's DataFusion issue and Wes's Parquet one have both been merged, > > and it looks like the LICENSE issue is being resolved as I type. So > > are we good to go now? 
> > > > Neal > > > > > > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove > wrote: > > > > > > I found a last minute issue with DataFusion (Rust) and would > appreciate it > > > if we could merge ARROW-6086 (PR is > > > https://github.com/apache/arrow/pull/5494) before cutting the RC. > > > > > > Thanks, > > > > > > Andy. > > > > > > > > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield < > emkornfi...@gmail.com> > > > wrote: > > > > > > > OK, I'm going to postpone cutting a release until tomorrow > (hoping we can > > > > issues resolved by then).. I'll also try to review the > third-party > > > > additions since 14.x. > > > > > > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney < > wesmck...@gmail.com> wrote: > > > > > > > > > I found a licensing issue > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6679 > > > > > > > > > > It might be worth examining third party code added to the > project > > > > > since 0.14.x to make sure there are no other such issues. > > > > > > > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney < > wesmck...@gmail.com> > > > > wrote: > > > > > > > > > > > > I have diagnosed the problem (Thrift "string" data must be > UTF-8, > > > > > > cannot be arbitrary binary) and am working on a patch right > now > > > > > > > > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney < > wesmck...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > I just opened > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6678 > > > > > > > > > > > > > > Please don't cut an RC until I have an opportunity to > diagnose this, > > > > > > > will report back. > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes McKinney < > wesmck...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > I'm investigating a possible Parquet-related > compatibility bug > > > > that I > > > > > > > > encountered through some routine testing / > benchmarking. 
I'll > > > > report > > > > > > > > back once I figure out what is going on (if anything) > > > > > > > > > > > > > > > > On Sun, Sep 22, 2019 at 11:51 PM Micah Kornfield < > > > > > emkornfi...@gmail.com> wrote: > > > > > > > > >> > > > > > > > > >> It's ideal if your GPG key is in the web of trust > (i.e. you can > > > > > get it > > > > > > > > >> signed by another PMC member), but is not 100% > essential. > > > > > > > > > > > > > > > > > > That won't be an option for me this week (it seems > like I would > > > > > need to meet one face-to-face). I'll try to get the GPG > checked in and > > > > the > > > > > rest of the pre-requisites done tomorrow (Monday) to > hopefully start the > > > > > release on Tuesday (hopefully we can solve the last >
[jira] [Created] (ARROW-6708) [C++] "cannot find -lboost_filesystem_static"
Antoine Pitrou created ARROW-6708: - Summary: [C++] "cannot find -lboost_filesystem_static" Key: ARROW-6708 URL: https://issues.apache.org/jira/browse/ARROW-6708 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou I'm trying a fresh build on another machine and get this error when using the {{boost-cpp}} conda package: {code} /usr/bin/ld.gold: error: cannot find -lboost_filesystem_static /usr/bin/ld.gold: error: cannot find -lboost_system_static {code} Note that Boost static libraries are installed, but they are named {{libboost_filesystem.a}} and {{libboost_system.a}} (no "_static" suffix).
[jira] [Created] (ARROW-6706) [Developer Tools] Cannot merge PRs from authors with "Á" (U+00C1) in their name
Andy Grove created ARROW-6706: - Summary: [Developer Tools] Cannot merge PRs from authors with "Á" (U+00C1) in their name Key: ARROW-6706 URL: https://issues.apache.org/jira/browse/ARROW-6706 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Andy Grove I tried merging a PR from Ádám Lippai ([https://github.com/alippai]) and the merge script failed with: {code:java} ./dev/merge_arrow_pr.py ARROW_HOME = /home/andy/git/andygrove/arrow/dev PROJECT_NAME = arrow Which pull request would you like to merge? (e.g. 34): 5499 Env APACHE_JIRA_USERNAME not set, please enter your JIRA username:andygrove Env APACHE_JIRA_PASSWORD not set, please enter your JIRA password: === Pull Request #5499 === title ARROW-6705: [Rust] [DataFusion] README has invalid github URL source alippai/patch-1 target master url https://api.github.com/repos/apache/arrow/pulls/5499 === JIRA ARROW-6705 === Summary [Rust] [DataFusion] README has invalid github URL Assignee NOT ASSIGNED!!! Components Rust Status Open URL https://issues.apache.org/jira/browse/ARROW-6705 Proceed with merging pull request #5499? (y/n): y Switched to branch 'PR_TOOL_MERGE_PR_5499_MASTER' Automatic merge went well; stopped before committing as requested Traceback (most recent call last): File "./dev/merge_arrow_pr.py", line 571, in cli() File "./dev/merge_arrow_pr.py", line 556, in cli pr.merge() File "./dev/merge_arrow_pr.py", line 354, in merge print("Author {}: {}".format(i + 1, author)) UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 0: ordinal not in range(128) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
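The traceback is Python 2's implicit ASCII encoding of unicode output failing on "Á" (U+00C1). A minimal sketch of the failure mode and a possible fix (a hypothetical helper, not the actual merge_arrow_pr.py code) would be to encode explicitly as UTF-8, or simply run the script under Python 3:

```python
# Reproduce the failure mode: encoding "Á" with the 'ascii' codec, which
# is what Python 2's print effectively did with unicode author names.
author = u"\u00c1d\u00e1m Lippai"

try:
    author.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii codec fails:", exc)

# Possible fix (sketch): encode explicitly as UTF-8 before emitting, so
# the output no longer depends on the implicit codec.
encoded = author.encode("utf-8")
print(encoded.decode("utf-8"))
```

Under Python 3 the original print call usually works as-is, since stdout defaults to a UTF-8-capable encoding on most systems.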
Re: Unnesting ListArrays
Thanks Wes, makes sense. I appreciate that there are use cases where both could be applicable. In my example, the most applicable use case I can think of is unnesting a ListArray column for a DataFrame (in the future C++ DataFrames API?) similar to the tidyr unnest function. I don't believe the current implementation would be able to align the flattened ListArray with the rest of the columns. I'll see if there's something I can do on this end. On Wed, Sep 25, 2019 at 6:27 PM Wes McKinney wrote: > hi Suhail, > > This follows the columnar format closely. The List layout is composed > from a child array providing the "inner" values, which are given the > List interpretation by adding an offsets buffer, and a validity > buffer to distinguish null from 0-length list values. So flatten() > here just returns the child array, which has only 3 values in the > example you gave. > > A function could be written to insert "null" for List values that are > null, but someone would have to write it and give it a name =) > > - Wes > > On Wed, Sep 25, 2019 at 5:15 PM Suhail Razzak > wrote: > > > > Hi, > > > > I'm working through a certain use case where I'm unnesting ListArrays, > but > > I noticed something peculiar - null ListValues are not retained in the > > unnested array. > > > > E.g. > > In [0]: arr = pa.array([[0, 1], [0], None, None]) > > In [1]: arr.flatten() > > Out [1]: [0, 1, 0] > > > > While I would have expected [0, 1, 0, null, null]. > > > > I should note that this works if the None is encapsulated in a list. So > I'm > > guessing this is expected logic and if so, what's the reasoning for that? > > > > Thanks, > > Suhail >
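The List layout Wes describes can be sketched in plain Python (a toy model of the offsets and validity buffers, not pyarrow's actual internals), which also shows what the yet-unwritten null-preserving variant would do:

```python
# Toy model of Arrow's List<Int64> layout for pa.array([[0, 1], [0], None, None]):
values = [0, 1, 0]                   # child array holding the "inner" values
offsets = [0, 2, 3, 3, 3]            # list i spans values[offsets[i]:offsets[i+1]]
valid = [True, True, False, False]   # validity buffer: null vs. 0-length lists

def flatten(values, offsets, valid):
    # What flatten() does today: just return the child array.
    return list(values)

def flatten_keep_nulls(values, offsets, valid):
    # The hypothetical variant from the thread: emit one null per null list.
    out = []
    for i, is_valid in enumerate(valid):
        if is_valid:
            out.extend(values[offsets[i]:offsets[i + 1]])
        else:
            out.append(None)
    return out

print(flatten(values, offsets, valid))             # [0, 1, 0]
print(flatten_keep_nulls(values, offsets, valid))  # [0, 1, 0, None, None]
```

Note the validity buffer is what distinguishes a null list from an empty one: both contribute zero child values, which is why the child array alone cannot reconstruct the nulls.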
Re: Thread-safety guarantees of pyarrow Table (and other) objects
Hi Yevgeni, The main Arrow classes (such as Array, ChunkedArray, RecordBatch, Table) are immutable, so they support multi-threaded usage out of the box. We have mutable classes as well (e.g. IO classes, ArrayBuilders, mutable Buffers...) and those are not thread-safe. Regards Antoine. On 26/09/2019 at 06:03, Yevgeni Litvin wrote: > Where in the documentation can I find information about thread-safety > guarantee of arrow classes? In particular, is the following usage of > pyarrow.Table showed by the pseudo-code thread-safe? > > > arrow_table = pa.Table.from_pandas(df) > > > def other_thread_worker_impl(arrow_table): > > arrow_table.column('some_column')[row].as_py() > > > run_in_parallel(other_thread_worker_impl, arrow_table) > > > I tried using pandas.DataFrame in the same multi-threaded setup and it > turned out to be unsafe (https://github.com/pandas-dev/pandas/issues/28439). > > Thank you. > > - Yevgeni >
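The safe pattern Antoine describes — many threads reading one shared, immutable object with no locking — can be sketched with the standard library alone (a tuple-backed stand-in plays the role of a pyarrow Table here, so the sketch runs without pyarrow installed):

```python
# Sketch of lock-free concurrent reads of a shared immutable structure.
# The dict-of-tuples stands in for a pyarrow Table; the same reasoning
# (no mutation => no data races on reads) applies to the real class.
from concurrent.futures import ThreadPoolExecutor

table = {"some_column": tuple(range(1000))}  # immutable "column"

def worker(row):
    # Read-only access; no locks needed because nothing mutates the table.
    return table["some_column"][row]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(worker, range(1000)))  # map preserves order

print(len(results))
```

The pandas issue Yevgeni links is the contrast case: DataFrame operations can mutate internal state under the hood, so the same access pattern is not safe there.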
[jira] [Created] (ARROW-6704) [C++] Cast from timestamp to higher resolution does not check out of bounds timestamps
Joris Van den Bossche created ARROW-6704: Summary: [C++] Cast from timestamp to higher resolution does not check out of bounds timestamps Key: ARROW-6704 URL: https://issues.apache.org/jira/browse/ARROW-6704 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Joris Van den Bossche When casting e.g. {{timestamp('s')}} to {{timestamp('ns')}}, we do not check for out of bounds timestamps, giving "garbage" timestamps in the result: {code} In [74]: a_np = np.array(["2012-01-01", "2412-01-01"], dtype="datetime64[s]") In [75]: arr = pa.array(a_np) In [76]: arr Out[76]: [ 2012-01-01 00:00:00, 2412-01-01 00:00:00 ] In [77]: arr.cast(pa.timestamp('ns')) Out[77]: [ 2012-01-01 00:00:00.0, 1827-06-13 00:25:26.290448384 ] {code} Now, this is the same behaviour as numpy, so I'm not sure we should do this. However, since we have a {{safe=True/False}}, I would expect that for {{safe=True}} we check this and for {{safe=False}} we do not check this. (numpy has a similar {{casting='safe'}} but also does not raise an error in that case). -- This message was sent by Atlassian Jira (v8.3.4#803005)
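The mechanics of the bug: Arrow stores timestamps as signed 64-bit integers, and the seconds-to-nanoseconds cast multiplies by 10**9, which silently wraps for dates far enough from the epoch. A sketch of the bounds check a {{safe=True}} cast could perform (a hypothetical helper, not Arrow's actual implementation):

```python
# Hypothetical bounds-checked cast sketch: seconds -> nanoseconds must
# fit in int64, the storage type for Arrow timestamps.
from datetime import datetime

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
S_TO_NS = 10**9

def cast_s_to_ns(seconds, safe=True):
    out = []
    for s in seconds:
        ns = s * S_TO_NS
        if not (INT64_MIN <= ns <= INT64_MAX):
            if safe:
                raise OverflowError(f"{s}s does not fit in int64 nanoseconds")
            # unsafe: wrap like int64 arithmetic does ("garbage" timestamps)
            ns = (ns - INT64_MIN) % 2**64 + INT64_MIN
        out.append(ns)
    return out

epoch = datetime(1970, 1, 1)
near = int((datetime(2012, 1, 1) - epoch).total_seconds())  # fits
far = int((datetime(2412, 1, 1) - epoch).total_seconds())   # overflows

print(cast_s_to_ns([near]))
try:
    cast_s_to_ns([far])
except OverflowError as exc:
    print("safe cast rejects:", exc)
print(cast_s_to_ns([far], safe=False))  # wrapped, nonsense value
```

The wrapped result is negative, which is why the Jira example lands on a date before 1970 (1827).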
[jira] [Created] (ARROW-6703) [Packaging][Linux] Restore ARROW_VERSION environment variable
Kouhei Sutou created ARROW-6703: --- Summary: [Packaging][Linux] Restore ARROW_VERSION environment variable Key: ARROW-6703 URL: https://issues.apache.org/jira/browse/ARROW-6703 Project: Apache Arrow Issue Type: Bug Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou Fix For: 0.15.0 {{ARROW_VERSION}} is needed to use correct download URL for RC. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6702) [Rust] [DataFusion] Incorrect partition read
Adam Lippai created ARROW-6702: -- Summary: [Rust] [DataFusion] Incorrect partition read Key: ARROW-6702 URL: https://issues.apache.org/jira/browse/ARROW-6702 Project: Apache Arrow Issue Type: Bug Components: Rust, Rust - DataFusion Affects Versions: 0.15.0 Reporter: Adam Lippai Reading a directory structure of duplicated alltypes_plain.parquet files returns 8 rows instead of 16 (as read, e.g., by the pandas parquet reader) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6701) [C++][R] Lint failing on R cpp code
Micah Kornfield created ARROW-6701: -- Summary: [C++][R] Lint failing on R cpp code Key: ARROW-6701 URL: https://issues.apache.org/jira/browse/ARROW-6701 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield Fix For: 1.0.0 [See as an example https://travis-ci.org/apache/arrow/jobs/589772132#L695|https://travis-ci.org/apache/arrow/jobs/589772132#L695] -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Timeline for 0.15.0 release
Just an update: I've started the RC generation process; the last commit from master is [1]. I am currently waiting on the crossbow builds (build-690 on ursa-labs/crossbow). I think this will take a little while so I will pick it up tomorrow (Thursday). Thanks, Micah [1] https://github.com/apache/arrow/commit/07ab5083d5a2925ced6f8168b60b8fa336f4eccc On Wed, Sep 25, 2019 at 2:07 PM Neal Richardson wrote: > IMO it's too risky to add something that adds a dependency > (aws-sdk-cpp) on the day of cutting a release. > > Neal > > On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs > wrote: > > > > We don't have a comprehensive documentation yet, so let's postpone it. > > > > > > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs < > szucs.kriszt...@gmail.com> wrote: > >> > >> The S3 python bindings would be a nice addition to the release. > >> I don't think we should block on this but the PR is ready. Opinions? > >> https://github.com/apache/arrow/pull/5423 > >> > >> > >> > >> > >> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield > wrote: > >>> > >>> OK, I'll start the process today. I'll send up e-mail updates as I > make progress. > >>> > >>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney > wrote: > > Yes, all systems go as far as I'm concerned. > > On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson > wrote: > > > > Andy's DataFusion issue and Wes's Parquet one have both been merged, > > and it looks like the LICENSE issue is being resolved as I type. So > > are we good to go now? > > > > Neal > > > > > > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove > wrote: > > > > > > I found a last minute issue with DataFusion (Rust) and would > appreciate it > > > if we could merge ARROW-6086 (PR is > > > https://github.com/apache/arrow/pull/5494) before cutting the RC. > > > > > > Thanks, > > > > > > Andy. 
> > > > > > > > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield < > emkornfi...@gmail.com> > > > wrote: > > > > > > > OK, I'm going to postpone cutting a release until tomorrow > (hoping we can > > > > issues resolved by then).. I'll also try to review the > third-party > > > > additions since 14.x. > > > > > > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney < > wesmck...@gmail.com> wrote: > > > > > > > > > I found a licensing issue > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6679 > > > > > > > > > > It might be worth examining third party code added to the > project > > > > > since 0.14.x to make sure there are no other such issues. > > > > > > > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney < > wesmck...@gmail.com> > > > > wrote: > > > > > > > > > > > > I have diagnosed the problem (Thrift "string" data must be > UTF-8, > > > > > > cannot be arbitrary binary) and am working on a patch right > now > > > > > > > > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney < > wesmck...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > I just opened > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6678 > > > > > > > > > > > > > > Please don't cut an RC until I have an opportunity to > diagnose this, > > > > > > > will report back. > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes McKinney < > wesmck...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > I'm investigating a possible Parquet-related > compatibility bug > > > > that I > > > > > > > > encountered through some routine testing / > benchmarking. I'll > > > > report > > > > > > > > back once I figure out what is going on (if anything) > > > > > > > > > > > > > > > > On Sun, Sep 22, 2019 at 11:51 PM Micah Kornfield < > > > > > emkornfi...@gmail.com> wrote: > > > > > > > > >> > > > > > > > > >> It's ideal if your GPG key is in the web of trust > (i.e. 
you can > > > > > get it > > > > > > > > >> signed by another PMC member), but is not 100% > essential. > > > > > > > > > > > > > > > > > > That won't be an option for me this week (it seems > like I would > > > > > need to meet one face-to-face). I'll try to get the GPG > checked in and > > > > the > > > > > rest of the pre-requisites done tomorrow (Monday) to > hopefully start the > > > > > release on Tuesday (hopefully we can solve the last > blocker/integration > > > > > tests by then). > > > > > > > > > > > > > > > > > > On Sat, Sep 21, 2019 at 7:12 PM Wes McKinney < > > > > wesmck...@gmail.com> > > > > > wrote: > > > > > > > > >> > > > > > > > > >> It's ideal if your GPG key is in the web of trust > (i.e. you can > > > > > get it > > > > > > > > >> signed by another PMC member), but is not 100% >