[jira] [Assigned] (ARROW-8509) GArrowRecordBatch <-> GArrowBuffer conversion functions
[ https://issues.apache.org/jira/browse/ARROW-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-8509: --- Assignee: Kouhei Sutou (was: Tanveer) > GArrowRecordBatch <-> GArrowBuffer conversion functions > --- > > Key: ARROW-8509 > URL: https://issues.apache.org/jira/browse/ARROW-8509 > Project: Apache Arrow > Issue Type: New Feature > Components: GLib >Reporter: Tanveer >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Hi All, > I am working on integrating two programs, both of which are using Plasma API. > For this purpose, I need to convert RecordBatches to Buffer to transfer to > Plasma. > I have created GArrowRecordBatch <-> GArrowBuffer conversion functions which > are working for me locally, but I am not sure if I have adopted the correct > way, I want it to be integrated into c_glib. Can you people please check > these functions and update/accept the pull request? > > https://github.com/apache/arrow/pull/6963 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8610) [Rust] DivideByZero when running arrow crate when simd feature is disabled
[ https://issues.apache.org/jira/browse/ARROW-8610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8610: - Summary: [Rust] DivideByZero when running arrow crate when simd feature is disabled (was: DivideByZero when running arrow crate when simd feature is disabled) > [Rust] DivideByZero when running arrow crate when simd feature is disabled > -- > > Key: ARROW-8610 > URL: https://issues.apache.org/jira/browse/ARROW-8610 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: R. Tyler Croy >Priority: Major > > This is reproducible when running without simd features, or when trying to > compile on an {{aarch64}} machine as well. > > {{% cargo test --no-default-features}} > > {code:java} > failures: > compute::kernels::arithmetic::tests::test_primitive_array_divide_with_nulls > stdout > thread > 'compute::kernels::arithmetic::tests::test_primitive_array_divide_with_nulls' > panicked at 'called `Result::unwrap()` on an `Err` value: DivideByZero', > src/libcore/result.rs:1187:5 > failures: > > compute::kernels::arithmetic::tests::test_primitive_array_divide_with_nullstest > result: FAILED. 312 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out > {code} > > I tried to address the issue this myself, and it looks like the {{divide}} > function with the {{simd}} feature doesn't work properly, something is up > with {{math_op}} but I don't understand this well enough. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7076) `pip install pyarrow` with python 3.8 fail with message : Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly
[ https://issues.apache.org/jira/browse/ARROW-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094192#comment-17094192 ] Joris Van den Bossche commented on ARROW-7076: -- [~ManthanAdmane] you will need to give more details (the exact commands you ran, the full output, which version you are installing, which platform, ...) > `pip install pyarrow` with python 3.8 fail with message : Could not build > wheels for pyarrow which use PEP 517 and cannot be installed directly > --- > > Key: ARROW-7076 > URL: https://issues.apache.org/jira/browse/ARROW-7076 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 > Environment: Ubuntu 19.10 / Python 3.8.0 >Reporter: Fabien >Priority: Minor > > When I install pyarrow in python 3.7.5 with `pip install pyarrow` it works. > However with python 3.8.0 it fails with the following error : > {noformat} > 14:06 $ pip install pyarrow > Collecting pyarrow > Using cached > https://files.pythonhosted.org/packages/e0/e6/d14b4a2b54ef065b1a2c576537abe805c1af0c94caef70d365e2d78fc528/pyarrow-0.15.1.tar.gz > Installing build dependencies ... done > Getting requirements to build wheel ... done > Preparing wheel metadata ... done > Collecting numpy>=1.14 > Using cached > https://files.pythonhosted.org/packages/3a/8f/f9ee25c0ae608f86180c26a1e35fe7ea9d71b473ea7f54db20759ba2745e/numpy-1.17.3-cp38-cp38-manylinux1_x86_64.whl > Collecting six>=1.0.0 > Using cached > https://files.pythonhosted.org/packages/65/26/32b8464df2a97e6dd1b656ed26b2c194606c16fe163c695a992b36c11cdf/six-1.13.0-py2.py3-none-any.whl > Building wheels for collected packages: pyarrow > Building wheel for pyarrow (PEP 517) ... 
error > ERROR: Command errored out with exit status 1: > command: /home/fabien/.local/share/virtualenvs/pipenv-_eZlsrLD/bin/python3.8 > /home/fabien/.local/share/virtualenvs/pipenv-_eZlsrLD/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py > build_wheel /tmp/tmp4gpyu82j > cwd: /tmp/pip-install-cj5ucedq/pyarrow > Complete output (490 lines): > running bdist_wheel > running build > running build_py > creating build > creating build/lib.linux-x86_64-3.8 > creating build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/flight.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/orc.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/jvm.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/util.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/pandas_compat.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/cuda.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/filesystem.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/json.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/feather.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/serialization.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/ipc.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/parquet.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/_generated_version.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/benchmark.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/types.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/hdfs.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/fs.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/plasma.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/csv.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/compat.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/__init__.py -> build/lib.linux-x86_64-3.8/pyarrow > creating build/lib.linux-x86_64-3.8/pyarrow/tests > copying 
pyarrow/tests/test_strategies.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_array.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_tensor.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_json.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_cython.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_deprecations.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/conftest.py -> build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_memory.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_io.py -> build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/pandas_examples.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_compute.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/util
[jira] [Created] (ARROW-8610) DivideByZero when running arrow crate when simd feature is disabled
R. Tyler Croy created ARROW-8610: Summary: DivideByZero when running arrow crate when simd feature is disabled Key: ARROW-8610 URL: https://issues.apache.org/jira/browse/ARROW-8610 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: R. Tyler Croy This is reproducible when running without simd features, or when trying to compile on an {{aarch64}} machine as well. {{% cargo test --no-default-features}} {code:java} failures: compute::kernels::arithmetic::tests::test_primitive_array_divide_with_nulls stdout thread 'compute::kernels::arithmetic::tests::test_primitive_array_divide_with_nulls' panicked at 'called `Result::unwrap()` on an `Err` value: DivideByZero', src/libcore/result.rs:1187:5 failures: compute::kernels::arithmetic::tests::test_primitive_array_divide_with_nullstest result: FAILED. 312 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out {code} I tried to address this issue myself, and it looks like the {{divide}} function with the {{simd}} feature doesn't work properly; something is up with {{math_op}}, but I don't understand it well enough. -- This message was sent by Atlassian Jira (v8.3.4#803005)
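For context, the failing test above exercises division over an array containing nulls. A minimal standalone sketch of the invariant that test checks is below; the types and the {{divide}} function here are made up for illustration and are not the arrow crate's actual {{math_op}} kernel: a zero divisor sitting in a null slot must be skipped, and DivideByZero should only surface for a valid (non-null) zero divisor.

```rust
// Hypothetical error type standing in for the arrow crate's ArrowError.
#[derive(Debug, PartialEq)]
enum ArrowError {
    DivideByZero,
}

/// Element-wise division; `valid[i] == false` marks slot i as null.
/// Null slots are skipped entirely, so their divisors are never inspected.
fn divide(left: &[i32], right: &[i32], valid: &[bool]) -> Result<Vec<Option<i32>>, ArrowError> {
    left.iter()
        .zip(right)
        .zip(valid)
        .map(|((l, r), v)| {
            if !*v {
                Ok(None) // null slot: do not touch the divisor at all
            } else if *r == 0 {
                Err(ArrowError::DivideByZero)
            } else {
                Ok(Some(l / r))
            }
        })
        .collect()
}

fn main() {
    // A zero divisor in a null slot must not error...
    assert_eq!(
        divide(&[6, 8, 1], &[2, 4, 0], &[true, true, false]),
        Ok(vec![Some(3), Some(2), None])
    );
    // ...but a zero divisor in a valid slot must.
    assert_eq!(divide(&[1], &[0], &[true]), Err(ArrowError::DivideByZero));
}
```

The separate {{valid}} slice is a simplification of Arrow's validity bitmap; the point is only the ordering of the checks, with the validity test before the zero test.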
[jira] [Updated] (ARROW-8609) [C++]orc JNI bridge crashed on null arrow buffer
[ https://issues.apache.org/jira/browse/ARROW-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8609: -- Labels: pull-request-available (was: ) > [C++]orc JNI bridge crashed on null arrow buffer > > > Key: ARROW-8609 > URL: https://issues.apache.org/jira/browse/ARROW-8609 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Yuan Zhou >Assignee: Yuan Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L278-L281 > We should do a check on arrow buffer if it's null, and passing right value to > the constructor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8609) [C++]orc JNI bridge crashed on null arrow buffer
[ https://issues.apache.org/jira/browse/ARROW-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Zhou updated ARROW-8609: - Summary: [C++]orc JNI bridge crashed on null arrow buffer (was: orc JNI bridge crashed on null arrow buffer) > [C++]orc JNI bridge crashed on null arrow buffer > > > Key: ARROW-8609 > URL: https://issues.apache.org/jira/browse/ARROW-8609 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Yuan Zhou >Assignee: Yuan Zhou >Priority: Major > > https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L278-L281 > We should do a check on arrow buffer if it's null, and passing right value to > the constructor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8609) orc JNI bridge crashed on null arrow buffer
Yuan Zhou created ARROW-8609: Summary: orc JNI bridge crashed on null arrow buffer Key: ARROW-8609 URL: https://issues.apache.org/jira/browse/ARROW-8609 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Yuan Zhou Assignee: Yuan Zhou https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L278-L281 We should check whether the arrow buffer is null and pass the right value to the constructor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8608) Update vendored mpark/variant.h to latest to fix NVCC compilation issues
Mark Harris created ARROW-8608: -- Summary: Update vendored mpark/variant.h to latest to fix NVCC compilation issues Key: ARROW-8608 URL: https://issues.apache.org/jira/browse/ARROW-8608 Project: Apache Arrow Issue Type: Bug Reporter: Mark Harris Arrow vendors [https://github.com/mpark/variant]. The vendored version is from 2019 and has issues compiling with NVCC (the CUDA compiler). Projects like cuDF that depend on Arrow are stuck on a version of Arrow from before this dependency was added, because newer versions can't compile. mpark/variant's two most recent PRs are fixes for NVCC compilation. We would like to move cuDF forward to Arrow 0.16 or 0.17 soon, so it would be great to update the vendored version of mpark/variant in Arrow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8556) [R] zstd symbol not found if there are multiple installations of zstd
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-8556: --- Summary: [R] zstd symbol not found if there are multiple installations of zstd (was: [R] zstd symbol not found on Ubuntu 19.10) > [R] zstd symbol not found if there are multiple installations of zstd > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094017#comment-17094017 ] Neal Richardson commented on ARROW-8556: Thanks, that makes some sense. Googling the original undefined symbol error message, all I found were issues caused by having multiple versions of zstd installed (e.g. https://github.com/facebook/wangle/issues/73), but since you said you didn't have it installed before, I didn't think it was relevant. I wish there were a good way to keep it from failing in that case: to make sure that if zstd is built from source in the R build, that version is the one that gets picked up. Maybe someone else will have an idea on how to achieve that. > [R] zstd symbol not found on Ubuntu 19.10 > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8586) [R] installation failure on CentOS 7
[ https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094013#comment-17094013 ] Neal Richardson commented on ARROW-8586: Thanks. A few thoughts. Apologies if this is confusing; we're going deep in some different directions: * {{ARROW_R_DEV=true}} is for installation verbosity only, not for crash reporting, and from the install logs you shared, I can see that apparently thrift failed to build/install. I don't think I've seen it fail in that specific way before. If you want to go deeper into the Matrix with me, try reinstalling with {{ARROW_R_DEV=true}} and {{EXTRA_CMAKE_ARGS="-DARROW_VERBOSE_THIRDPARTY_BUILD=ON"}} (but unset {{LIBARROW_BINARY}} so that we build from source) and maybe we'll see what's going on there. * Alternatively, you could try installing {{thrift}} from {{yum}}, though I'm not sure that they have a new enough version (0.11 is the minimum). * Odd that you got a segfault when reading a parquet file. Is there anything special about how your system is configured (compilers, toolchains, etc.) beyond a vanilla CentOS 7 environment? The centos-7 binary is built on a base centos image with this Dockerfile: https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/yum.Dockerfile So maybe see if setting {{CC=/usr/bin/gcc CXX=/usr/bin/g++}} before installing the R package (with {{LIBARROW_BINARY=centos-7}}) helps. * If that makes a difference, I wonder if https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/yum.Dockerfile#L18-L20 is what is needed to get thrift to compile when building everything from source. * Thanks for the {{lsb_release}} output. That confirms my suspicion about why it did not try to download the centos-7 binary to begin with (though obviously that's not desirable unless we get it not to segfault for you). 
> [R] installation failure on CentOS 7 > > > Key: ARROW-8586 > URL: https://issues.apache.org/jira/browse/ARROW-8586 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: CentOS 7 >Reporter: Hei >Priority: Major > > Hi, > I am trying to install arrow via RStudio, but it seems like it is not working > that after I installed the package, it kept asking me to run > arrow::install_arrow() even after I did: > {code} > > install.packages("arrow") > Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’ > (as ‘lib’ is unspecified) > trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz' > Content type 'application/x-gzip' length 242534 bytes (236 KB) > == > downloaded 236 KB > * installing *source* package ‘arrow’ ... > ** package ‘arrow’ successfully unpacked and MD5 sums checked > ** using staged installation > *** Successfully retrieved C++ source > *** Building C++ libraries > cmake > arrow > ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory > - NOTE --- > After installation, please run arrow::install_arrow() > for help installing required runtime libraries > - > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array_from_vector.cpp -o > array_from_vector.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include 
-fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array_to_vector.cpp -o > array_to_vector.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c arraydata.cpp -o arraydata.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wal
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094007#comment-17094007 ] Karl Dunkle Werner commented on ARROW-8556: --- Update: I remembered dev packages. I had libzstd-dev 1.4.3 installed as a dependency of libgdal-dev. After uninstalling it, I was able to install arrow. Logs are below. {noformat} * installing *source* package ‘arrow’ ... ** package ‘arrow’ successfully unpacked and MD5 sums checked ** using staged installation *** Generating code with data-raw/codegen.R Fatal error: cannot open file 'data-raw/codegen.R': No such file or directory trying URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip' Error in download.file(from_url, to_file, quiet = quietly) : cannot open URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.17.0.zip' trying URL 'https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/arrow-0.17.0/apache-arrow-0.17.0.tar.gz' Content type 'application/x-gzip' length 6460548 bytes (6.2 MB) == downloaded 6.2 MB*** Successfully retrieved C++ source *** Building C++ libraries rm: cannot remove 'src/*.o': No such file or directory *** Building with MAKEFLAGS= -j4 arrow with SOURCE_DIR=/tmp/Rtmp9loTsA/file46054fc6ee7f/apache-arrow-0.17.0/cpp BUILD_DIR=/tmp/Rtmp9loTsA/file46055b57ae53 DEST_DIR=libarrow/arrow-0.17.0 CMAKE=/usr/bin/cmake ++ pwd + : /tmp/Rtmppd6Y9y/R.INSTALL45dd4a4e6ea2/arrow + : /tmp/Rtmp9loTsA/file46054fc6ee7f/apache-arrow-0.17.0/cpp + : /tmp/Rtmp9loTsA/file46055b57ae53 + : libarrow/arrow-0.17.0 + : /usr/bin/cmake ++ cd /tmp/Rtmp9loTsA/file46054fc6ee7f/apache-arrow-0.17.0/cpp ++ pwd + SOURCE_DIR=/tmp/Rtmp9loTsA/file46054fc6ee7f/apache-arrow-0.17.0/cpp ++ mkdir -p libarrow/arrow-0.17.0 ++ cd libarrow/arrow-0.17.0 ++ pwd + DEST_DIR=/tmp/Rtmppd6Y9y/R.INSTALL45dd4a4e6ea2/arrow/libarrow/arrow-0.17.0 + '[' '' = '' ']' + which ninja + CMAKE_GENERATOR=Ninja + '[' false = false ']' + ARROW_JEMALLOC=ON + 
ARROW_WITH_BROTLI=ON + ARROW_WITH_BZ2=ON + ARROW_WITH_LZ4=ON + ARROW_WITH_SNAPPY=ON + ARROW_WITH_ZLIB=ON + ARROW_WITH_ZSTD=ON + mkdir -p /tmp/Rtmp9loTsA/file46055b57ae53 + pushd /tmp/Rtmp9loTsA/file46055b57ae53 /tmp/Rtmp9loTsA/file46055b57ae53 /tmp/Rtmppd6Y9y/R.INSTALL45dd4a4e6ea2/arrow + /usr/bin/cmake -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF -DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON -DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_WITH_BROTLI=ON -DARROW_WITH_BZ2=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=/tmp/Rtmppd6Y9y/R.INSTALL45dd4a4e6ea2/arrow/libarrow/arrow-0.17.0 -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON -DOPENSSL_USE_STATIC_LIBS=ON -G Ninja /tmp/Rtmp9loTsA/file46054fc6ee7f/apache-arrow-0.17.0/cpp -- Building using CMake version: 3.13.4 -- The C compiler identification is GNU 9.2.1 -- The CXX compiler identification is GNU 9.2.1 -- Check for working C compiler: /usr/lib/ccache/cc -- Check for working C compiler: /usr/lib/ccache/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/lib/ccache/c++ -- Check for working CXX compiler: /usr/lib/ccache/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- Arrow version: 0.17.0 (full: '0.17.0') -- Arrow SO version: 17 (full: 17.0.0) -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- clang-tidy not found -- clang-format not found -- Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN) -- infer not found -- Found 
Python3: /usr/bin/python3.7 (found version "3.7.5") found components: Interpreter -- Using ccache: /usr/bin/ccache -- Found cpplint executable at /tmp/Rtmp9loTsA/file46054fc6ee7f/apache-arrow-0.17.0/cpp/build-support/cpplint.py -- System processor: x86_64 -- Performing Test CXX_SUPPORTS_SSE4_2 -- Performing Test CXX_SUPPORTS_SSE4_2 - Success -- Performing Test CXX_SUPPORTS_AVX2 -- Performing Test CXX_SUPPORTS_AVX2 - Success -- Performing Test CXX_SUPPORTS_AVX512 -- Performing Test CXX_SUPPORTS_AVX512 - Success -- Arrow build warning level: PRODUCTION Using ld linker Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...}) -- Build Type: RELEASE -- Using AUTO approach to find dependencies -- ARROW_AWSSDK_BUILD_VERSION: 1.7.160 -- ARROW_BOOST_BUILD_VERSION: 1.71.0 -- ARROW_BROTLI_BUILD_VERSION: v1.0.7 -- ARROW_BZIP2_BUILD_
[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10
[ https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094002#comment-17094002 ] Neal Richardson commented on ARROW-8556: Any ideas [~fsaintjacques] [~bkietz]? > [R] zstd symbol not found on Ubuntu 19.10 > - > > Key: ARROW-8556 > URL: https://issues.apache.org/jira/browse/ARROW-8556 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: Ubuntu 19.10 > R 3.6.1 >Reporter: Karl Dunkle Werner >Priority: Major > > I would like to install the `arrow` R package on my Ubuntu 19.10 system. > Prebuilt binaries are unavailable, and I want to enable compression, so I set > the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks > like the package is able to compile, but can't be loaded. I'm able to install > correctly if I don't set the {{LIBARROW_MINIMAL}} variable. > Here's the error I get: > {code:java} > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath > = DLLpath, ...): > unable to load shared object > '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so': > ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: > ZSTD_initCStream > Error: loading failed > Execution halted > ERROR: loading failed > * removing ‘~/.R/3.6/arrow’ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7076) `pip install pyarrow` with python 3.8 fail with message : Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly
[ https://issues.apache.org/jira/browse/ARROW-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093996#comment-17093996 ] Manthan Admane commented on ARROW-7076: --- I don't know how or why, but it gives me "ERROR: Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly" when I am using a virtual env on Anaconda, yet it succeeds when I am using the (base) default environment. > `pip install pyarrow` with python 3.8 fail with message : Could not build > wheels for pyarrow which use PEP 517 and cannot be installed directly > --- > > Key: ARROW-7076 > URL: https://issues.apache.org/jira/browse/ARROW-7076 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 > Environment: Ubuntu 19.10 / Python 3.8.0 >Reporter: Fabien >Priority: Minor > > When I install pyarrow in python 3.7.5 with `pip install pyarrow` it works. > However with python 3.8.0 it fails with the following error : > {noformat} > 14:06 $ pip install pyarrow > Collecting pyarrow > Using cached > https://files.pythonhosted.org/packages/e0/e6/d14b4a2b54ef065b1a2c576537abe805c1af0c94caef70d365e2d78fc528/pyarrow-0.15.1.tar.gz > Installing build dependencies ... done > Getting requirements to build wheel ... done > Preparing wheel metadata ... done > Collecting numpy>=1.14 > Using cached > https://files.pythonhosted.org/packages/3a/8f/f9ee25c0ae608f86180c26a1e35fe7ea9d71b473ea7f54db20759ba2745e/numpy-1.17.3-cp38-cp38-manylinux1_x86_64.whl > Collecting six>=1.0.0 > Using cached > https://files.pythonhosted.org/packages/65/26/32b8464df2a97e6dd1b656ed26b2c194606c16fe163c695a992b36c11cdf/six-1.13.0-py2.py3-none-any.whl > Building wheels for collected packages: pyarrow > Building wheel for pyarrow (PEP 517) ... 
error > ERROR: Command errored out with exit status 1: > command: /home/fabien/.local/share/virtualenvs/pipenv-_eZlsrLD/bin/python3.8 > /home/fabien/.local/share/virtualenvs/pipenv-_eZlsrLD/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py > build_wheel /tmp/tmp4gpyu82j > cwd: /tmp/pip-install-cj5ucedq/pyarrow > Complete output (490 lines): > running bdist_wheel > running build > running build_py > creating build > creating build/lib.linux-x86_64-3.8 > creating build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/flight.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/orc.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/jvm.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/util.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/pandas_compat.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/cuda.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/filesystem.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/json.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/feather.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/serialization.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/ipc.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/parquet.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/_generated_version.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/benchmark.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/types.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/hdfs.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/fs.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/plasma.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/csv.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/compat.py -> build/lib.linux-x86_64-3.8/pyarrow > copying pyarrow/__init__.py -> build/lib.linux-x86_64-3.8/pyarrow > creating build/lib.linux-x86_64-3.8/pyarrow/tests > copying 
pyarrow/tests/test_strategies.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_array.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_tensor.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_json.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_cython.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_deprecations.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/conftest.py -> build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_memory.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_io.py -> build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/pandas_examples.py -> > build/lib.linux-x86_64-3.8/pyarrow/tests > copying pyarrow/tests/test_compute.py ->
[jira] [Resolved] (ARROW-8607) [R][CI] Unbreak builds following R 4.0 release
[ https://issues.apache.org/jira/browse/ARROW-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-8607. Resolution: Fixed Issue resolved by pull request 7047 [https://github.com/apache/arrow/pull/7047] > [R][CI] Unbreak builds following R 4.0 release > -- > > Key: ARROW-8607 > URL: https://issues.apache.org/jira/browse/ARROW-8607 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Just a tourniquet to get master passing again while I work on ARROW-8604. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (ARROW-5634) [C#] ArrayData.NullCount should be a property
[ https://issues.apache.org/jira/browse/ARROW-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zachary Gramana updated ARROW-5634: --- Comment: was deleted (was: [GitHub Pull Request #7032|https://github.com/apache/arrow/pull/7032] now properly computes the `NullCount` value and passes it to the `ArrayData` ctor in the `Slice` method. `NullCount` should remain a readonly field, however, in order to preserve immutability.) > [C#] ArrayData.NullCount should be a property > -- > > Key: ARROW-5634 > URL: https://issues.apache.org/jira/browse/ARROW-5634 > Project: Apache Arrow > Issue Type: Task > Components: C# >Reporter: Prashanth Govindarajan >Priority: Major > > ArrayData.NullCount should be a property so that it can be computed when > necessary: for ex: after Slice(), NullCount is -1 and needs to be computed -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8606) [CI] Don't trigger all builds on a change to any file in ci/
[ https://issues.apache.org/jira/browse/ARROW-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8606. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7046 [https://github.com/apache/arrow/pull/7046] > [CI] Don't trigger all builds on a change to any file in ci/ > > > Key: ARROW-8606 > URL: https://issues.apache.org/jira/browse/ARROW-8606 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8607) [R][CI] Unbreak builds following R 4.0 release
[ https://issues.apache.org/jira/browse/ARROW-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8607: -- Labels: pull-request-available (was: ) > [R][CI] Unbreak builds following R 4.0 release > -- > > Key: ARROW-8607 > URL: https://issues.apache.org/jira/browse/ARROW-8607 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Just a tourniquet to get master passing again while I work on ARROW-8604. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8603) [Documentation] Fix Sphinx doxygen comment
[ https://issues.apache.org/jira/browse/ARROW-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman resolved ARROW-8603. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7045 [https://github.com/apache/arrow/pull/7045] > [Documentation] Fix Sphinx doxygen comment > -- > > Key: ARROW-8603 > URL: https://issues.apache.org/jira/browse/ARROW-8603 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > See [https://github.com/apache/arrow/runs/622393532] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8603) [Documentation] Fix Sphinx doxygen comment
[ https://issues.apache.org/jira/browse/ARROW-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman reassigned ARROW-8603: --- Assignee: Francois Saint-Jacques > [Documentation] Fix Sphinx doxygen comment > -- > > Key: ARROW-8603 > URL: https://issues.apache.org/jira/browse/ARROW-8603 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > See [https://github.com/apache/arrow/runs/622393532] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8607) [R][CI] Unbreak builds following R 4.0 release
Neal Richardson created ARROW-8607: -- Summary: [R][CI] Unbreak builds following R 4.0 release Key: ARROW-8607 URL: https://issues.apache.org/jira/browse/ARROW-8607 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration, R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 1.0.0 Just a tourniquet to get master passing again while I work on ARROW-8604. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7610) [Java] Finish support for 64 bit int allocations
[ https://issues.apache.org/jira/browse/ARROW-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved ARROW-7610. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 6323 [https://github.com/apache/arrow/pull/6323] > [Java] Finish support for 64 bit int allocations > - > > Key: ARROW-7610 > URL: https://issues.apache.org/jira/browse/ARROW-7610 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Micah Kornfield >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > > 1. Add an allocator capable of allocating larger than 2GB of data. > 2. Do an end-to-end round trip on a larger vector/record batch size. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7873) [Python] Segfault in pandas version 1.0.1, read_parquet after creating a clickhouse odbc connection
[ https://issues.apache.org/jira/browse/ARROW-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093843#comment-17093843 ] Matt Calder commented on ARROW-7873: No, we have so far kept pandas at version 0.25.3. We're transitioning away from the odbc driver and to our own in-house version so the issue may be moot for us. Matt > [Python] Segfault in pandas version 1.0.1, read_parquet after creating a > clickhouse odbc connection > --- > > Key: ARROW-7873 > URL: https://issues.apache.org/jira/browse/ARROW-7873 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Environment: Ubuntu 18.04 >Reporter: Matt Calder >Priority: Minor > Attachments: foo.pkl, foo.pq > > > [I posted this issue to the pandas > github|[https://github.com/pandas-dev/pandas/issues/31981]]. > We get a segfault when making a call to pd.read_parquet after having made a > connection to clickhouse via odbc. Like so, > {code:python} > import pyodbc > import pandas as pd > con_str = > f"Driver=libclickhouseodbc.so;url=http://clickhouse/query;timeout=600"; > with pyodbc.connect(con_str, autocommit=True) as con: > pass > df = pd.DataFrame({'A': [1,1,1], 'B': ['a', 'b', 'c']}) > df.to_parquet('/tmp/foo.pq') > # This line core dumps: > pd.read_parquet('/tmp/foo.pq') > {code} > This happens with pandas version 1.0.1 but not with pandas 0.25.3. Here's a > stacktrace: > {code:java} > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > #1 0x77a24801 in __GI_abort () at abort.c:79 > #2 0x763c1957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #3 0x763c7ab6 in ?? 
() from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #4 0x763c7af1 in std::terminate() () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #5 0x763c7d24 in __cxa_throw () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #6 0x763c6a52 in __cxa_bad_cast () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #7 0x764131ec in std::__cxx11::collate const& > std::use_facet >(std::locale const&) () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #8 0x7fffbe4b8279 in std::__cxx11::basic_string std::char_traits, std::allocator > > std::__cxx11::regex_traits::transform_primary(char const*, > char const*) const () from /usr/local/lib/libparquet.so.100 > #9 0x7fffbe4bd71c in > std::__detail::_BracketMatcher, false, > false>::_M_ready() () from /usr/local/lib/libparquet.so.100 > #10 0x7fffbe4bda9e in void > std::__detail::_Compiler > >::_M_insert_character_class_matcher() () from > /usr/local/lib/libparquet.so.100 > #11 0x7fffbe4c0569 in > std::__detail::_Compiler >::_M_atom() () > from /usr/local/lib/libparquet.so.100 > #12 0x7fffbe4c0ad8 in > std::__detail::_Compiler >::_M_alternative() > () from /usr/local/lib/libparquet.so.100 > #13 0x7fffbe4c0a43 in > std::__detail::_Compiler >::_M_alternative() > () from /usr/local/lib/libparquet.so.100 > #14 0x7fffbe4c0d1c in > std::__detail::_Compiler >::_M_disjunction() > () from /usr/local/lib/libparquet.so.100 > #15 0x7fffbe4c1469 in > std::__detail::_Compiler >::_Compiler(char > const*, char const*, std::locale const&, > std::regex_constants::syntax_option_type) () from > /usr/local/lib/libparquet.so.100 > #16 0x7fffbe4a93d1 in > parquet::ApplicationVersion::ApplicationVersion(std::__cxx11::basic_string std::char_traits, std::allocator > const&) () from > /usr/local/lib/libparquet.so.100 > #17 0x7fffbe4c1c03 in > parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, > unsigned int*, std::shared_ptr const&) () from > /usr/local/lib/libparquet.so.100 > #18 0x7fffbe4a9e62 in parquet::FileMetaData::FileMetaData(void 
const*, > unsigned int*, std::shared_ptr const&) () from > /usr/local/lib/libparquet.so.100 > #19 0x7fffbe4a9ec2 in parquet::FileMetaData::Make(void const*, unsigned > int*, std::shared_ptr const&) () from > /usr/local/lib/libparquet.so.100 > #20 0x7fffbe48acaf in > parquet::SerializedFile::ParseUnencryptedFileMetadata(std::shared_ptr > const&, long, long, std::shared_ptr*, unsigned int*, unsigned > int*) () from /usr/local/lib/libparquet.so.100 > #21 0x7fffbe492d75 in parquet::SerializedFile::ParseMetaData() () from > /usr/local/lib/libparquet.so.100 > #22 0x7fffbe48d8f8 in > parquet::ParquetFileReader::Contents::Open(std::shared_ptr, > parquet::ReaderProperties const&, std::shared_ptr) () > from /usr/local/lib/libparquet.so.100 > #23 0x7fffbe48e598 in > parquet::ParquetFileReader::Open(std::shared_ptr, > parquet::ReaderProperties const&, std::shared_ptr) () > from /usr/local/lib/libparquet.so.100 > #24 0x7fffbe3a89bd in >
[jira] [Updated] (ARROW-8606) [CI] Don't trigger all builds on a change to any file in ci/
[ https://issues.apache.org/jira/browse/ARROW-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8606: -- Labels: pull-request-available (was: ) > [CI] Don't trigger all builds on a change to any file in ci/ > > > Key: ARROW-8606 > URL: https://issues.apache.org/jira/browse/ARROW-8606 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8606) [CI] Don't trigger all builds on a change to any file in ci/
Neal Richardson created ARROW-8606: -- Summary: [CI] Don't trigger all builds on a change to any file in ci/ Key: ARROW-8606 URL: https://issues.apache.org/jira/browse/ARROW-8606 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Neal Richardson Assignee: Neal Richardson -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8605) [R] Add support for brotli to Windows build
[ https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093766#comment-17093766 ] Neal Richardson commented on ARROW-8605: You are correct. We do not build the Windows package with brotli. Here is what we do build with: https://github.com/apache/arrow/blob/master/ci/scripts/PKGBUILD#L28-L31 If you are interested in adding it, ARROW-6960 is the right model to follow. > [R] Add support for brotli to Windows build > --- > > Key: ARROW-8605 > URL: https://issues.apache.org/jira/browse/ARROW-8605 > Project: Apache Arrow > Issue Type: New Feature >Affects Versions: 0.17.0 >Reporter: Hei >Priority: Major > > Hi, > My friend installed arrow and tried to open a parquet file with brotli codec. > But then, he got an error when calling read_parquet("my.parquet") on Windows: > {code} > Error in parquet__arrow__FileReader__ReadTable(self) : >IOError: NotImplemented: Brotli codec support not built > {code} > It sounds similar to ARROW-6960. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8605) [R] Add support for brotli to Windows build
[ https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-8605: --- Summary: [R] Add support for brotli to Windows build (was: Missing brotli Support in R Package?) > [R] Add support for brotli to Windows build > --- > > Key: ARROW-8605 > URL: https://issues.apache.org/jira/browse/ARROW-8605 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.0 >Reporter: Hei >Priority: Major > > Hi, > My friend installed arrow and tried to open a parquet file with brotli codec. > But then, he got an error when calling read_parquet("my.parquet") on Windows: > {code} > Error in parquet__arrow__FileReader__ReadTable(self) : >IOError: NotImplemented: Brotli codec support not built > {code} > It sounds similar to ARROW-6960. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8605) [R] Add support for brotli to Windows build
[ https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-8605: --- Issue Type: New Feature (was: Bug) > [R] Add support for brotli to Windows build > --- > > Key: ARROW-8605 > URL: https://issues.apache.org/jira/browse/ARROW-8605 > Project: Apache Arrow > Issue Type: New Feature >Affects Versions: 0.17.0 >Reporter: Hei >Priority: Major > > Hi, > My friend installed arrow and tried to open a parquet file with brotli codec. > But then, he got an error when calling read_parquet("my.parquet") on Windows: > {code} > Error in parquet__arrow__FileReader__ReadTable(self) : >IOError: NotImplemented: Brotli codec support not built > {code} > It sounds similar to ARROW-6960. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7681) [Rust] Explicitly seeking a BufReader will discard the internal buffer
[ https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved ARROW-7681. - Resolution: Fixed Issue resolved by pull request 6949 [https://github.com/apache/arrow/pull/6949] > [Rust] Explicitly seeking a BufReader will discard the internal buffer > -- > > Key: ARROW-7681 > URL: https://issues.apache.org/jira/browse/ARROW-7681 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 5h > Remaining Estimate: 0h > > This behavior was observed in the Parquet Rust file reader > (parquet/src/util/io.rs). > > Pull request: [https://github.com/apache/arrow/pull/6280] > > From the Rust documentation for BufReader: > > "Seeking always discards the internal buffer, even if the seek position would > otherwise fall within it. This guarantees that calling {{.into_inner()}} > immediately after a seek yields the underlying reader at the same position." > > [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8074) [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset?
[ https://issues.apache.org/jira/browse/ARROW-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8074: - Fix Version/s: 1.0.0 > [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset? > > > Key: ARROW-8074 > URL: https://issues.apache.org/jira/browse/ARROW-8074 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Joris Van den Bossche >Assignee: Ben Kietzman >Priority: Major > Labels: dataset, pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The current {{pyarrow.parquet.read_table}}/{{ParquetFile}} can work with > buffer (reader) objects (file-like objects, pyarrow.Buffer, > pyarrow.BufferReader) as input when dealing with single files. This > functionality is for example being used by pandas and kartothek (in addition > to being extensively used in our own tests as well). > While we could keep the old implementation to handle single files (which is > different from the ParquetDataset logic), there are also some advantages of > being able to handle this in the Datasets API. > For example, this would enable to filtering functionality of the datasets > API, also for this single-file buffers use case, which would be a nice > enhancement (currently, {{read_table}} does not support {{filters}} in case > of single files, which is eg why kartothek implements this themselves). > Would this be possible to support? > The {{arrow::dataset::FileSource}} already has PATH and BUFFER enum types > (https://github.com/apache/arrow/blob/08f8bff05af37921ff1e5a2b630ce1e7ec1c0ede/cpp/src/arrow/dataset/file_base.h#L46-L49), > so it seems in principle possible to create a FileSource (for a > FileSystemDataset / FileFragment) from a buffer instead of from a path? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8596) [C++][Dataset] Add test case to check if all essential properties are reserved once ScannerBuilder::Project is called
[ https://issues.apache.org/jira/browse/ARROW-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-8596: -- Labels: dataset (was: ) > [C++][Dataset] Add test case to check if all essential properties are > reserved once ScannerBuilder::Project is called > - > > Key: ARROW-8596 > URL: https://issues.apache.org/jira/browse/ARROW-8596 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.17.0 >Reporter: Hongze Zhang >Assignee: Hongze Zhang >Priority: Major > Labels: dataset > > This is a follow-up of ARROW-8499. It's better to provide a test around > ScanOptions::ReplaceSchema to check if all properties other than projector > are copied when the function is called. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-8394) Typescript compiler errors for arrow d.ts files, when using es2015-esm package
[ https://issues.apache.org/jira/browse/ARROW-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093616#comment-17093616 ] Phil Price edited comment on ARROW-8394 at 4/27/20, 3:15 PM: - This also happens with `0.17.0` from what I can tell there are two things which I would consider adoption blockers: # apache-arrow compiles with typescript@3.5, but typescript@3.6+ has stricter type-checks, and cannot consume the output. A couple of cases I've found that are invalid: ## Extension of static `new()` methods with different typed parameter order (column derived from chunked but changes the signature of `new` to takes `string | Field` as the first param over `Data` ## Not passing template types to fields (e.g. `foo: Schema` to `foo: Schema`) # Attempting to upgrade `apache-arrow` to typescript@3.8 (or 3.9-beta) falls afoul of a typescript compiler bug ([https://github.com/microsoft/TypeScript/issues/35186]); from my understanding, the compiler is on an error path anyway but bails when trying to print error detail. This makes the upgrade difficult as it's a case of whack-a-mole. was (Author: pprice): This also happens with `0.17.0` from what I can tell there are two things which I would consider adoption blockers: # `apache-arrow` compiles with typescript@3.5, but typescript@3.6+ has stricter type-checks, and cannot consume the output. A couple of cases I've found that are invalid: ## Extension of static `new()` methods with different typed parameter order (column derived from chunked but changes the signature of `new` to takes `string | Field` as the first param over `Data` ## Not passing template types to fields (e.g. `foo: Schema` to `foo: Schema`) # Attempting to upgrade `apache-arrow` to typescript@3.8 (or 3.9-beta) falls afoul of a typescript compiler bug ([https://github.com/microsoft/TypeScript/issues/35186]); from my understanding, the compiler is on an error path anyway but bails when trying to print error detail. 
This makes the upgrade difficult as it's a case of whack-a-mole. > Typescript compiler errors for arrow d.ts files, when using es2015-esm package > -- > > Key: ARROW-8394 > URL: https://issues.apache.org/jira/browse/ARROW-8394 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.16.0 >Reporter: Shyamal Shukla >Priority: Blocker > > Attempting to use apache-arrow within a web application, but typescript > compiler throws the following errors in some of arrow's .d.ts files > import \{ Table } from "../node_modules/@apache-arrow/es2015-esm/Arrow"; > export class SomeClass { > . > . > constructor() { > const t = Table.from(''); > } > *node_modules/@apache-arrow/es2015-esm/column.d.ts:14:22* - error TS2417: > Class static side 'typeof Column' incorrectly extends base class static side > 'typeof Chunked'. Types of property 'new' are incompatible. > *node_modules/@apache-arrow/es2015-esm/ipc/reader.d.ts:238:5* - error TS2717: > Subsequent property declarations must have the same type. Property 'schema' > must be of type 'Schema', but here has type 'Schema'. > 238 schema: Schema; > *node_modules/@apache-arrow/es2015-esm/recordbatch.d.ts:17:18* - error > TS2430: Interface 'RecordBatch' incorrectly extends interface 'StructVector'. > The types of 'slice(...).clone' are incompatible between these types. > the tsconfig.json file looks like > { > "compilerOptions": { > "target":"ES6", > "outDir": "dist", > "baseUrl": "src/" > }, > "exclude": ["dist"], > "include": ["src/*.ts"] > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8394) Typescript compiler errors for arrow d.ts files, when using es2015-esm package
[ https://issues.apache.org/jira/browse/ARROW-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093616#comment-17093616 ] Phil Price commented on ARROW-8394: --- This also happens with `0.17.0` from what I can tell there are two things which I would consider adoption blockers: # `apache-arrow` compiles with typescript@3.5, but typescript@3.6+ has stricter type-checks, and cannot consume the output. A couple of cases I've found that are invalid: ## Extension of static `new()` methods with different typed parameter order (column derived from chunked but changes the signature of `new` to takes `string | Field` as the first param over `Data` ## Not passing template types to fields (e.g. `foo: Schema` to `foo: Schema`) # Attempting to upgrade `apache-arrow` to typescript@3.8 (or 3.9-beta) falls afoul of a typescript compiler bug ([https://github.com/microsoft/TypeScript/issues/35186]); from my understanding, the compiler is on an error path anyway but bails when trying to print error detail. This makes the upgrade difficult as it's a case of whack-a-mole. > Typescript compiler errors for arrow d.ts files, when using es2015-esm package > -- > > Key: ARROW-8394 > URL: https://issues.apache.org/jira/browse/ARROW-8394 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.16.0 >Reporter: Shyamal Shukla >Priority: Blocker > > Attempting to use apache-arrow within a web application, but typescript > compiler throws the following errors in some of arrow's .d.ts files > import \{ Table } from "../node_modules/@apache-arrow/es2015-esm/Arrow"; > export class SomeClass { > . > . > constructor() { > const t = Table.from(''); > } > *node_modules/@apache-arrow/es2015-esm/column.d.ts:14:22* - error TS2417: > Class static side 'typeof Column' incorrectly extends base class static side > 'typeof Chunked'. Types of property 'new' are incompatible. 
> *node_modules/@apache-arrow/es2015-esm/ipc/reader.d.ts:238:5* - error TS2717: > Subsequent property declarations must have the same type. Property 'schema' > must be of type 'Schema', but here has type 'Schema'. > 238 schema: Schema; > *node_modules/@apache-arrow/es2015-esm/recordbatch.d.ts:17:18* - error > TS2430: Interface 'RecordBatch' incorrectly extends interface 'StructVector'. > The types of 'slice(...).clone' are incompatible between these types. > the tsconfig.json file looks like > { > "compilerOptions": { > "target":"ES6", > "outDir": "dist", > "baseUrl": "src/" > }, > "exclude": ["dist"], > "include": ["src/*.ts"] > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-7706) [Python] saving a dataframe to the same partitioned location silently doubles the data
[ https://issues.apache.org/jira/browse/ARROW-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093604#comment-17093604 ] Gregory Hayes edited comment on ARROW-7706 at 4/27/20, 3:09 PM: One additional thought -- Spark 2.3 also implements the ability to dynamically overwrite only partitions that have changed, as described [here|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-dynamic-partition-inserts.html ], by using: spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic"), allowing for incrementally updating data in a pipeline. It would be awesome to see this in the roadmap eventually. was (Author: hayesgb): One additional thought -- Spark 2.3 also implements the ability to dynamically overwrite only partitions that have changed, as described [here|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-dynamic-partition-inserts.html ]. It would be awesome to see this in the roadmap eventually. > [Python] saving a dataframe to the same partitioned location silently doubles > the data > -- > > Key: ARROW-7706 > URL: https://issues.apache.org/jira/browse/ARROW-7706 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 >Reporter: Tsvika Shapira >Priority: Major > Labels: dataset, parquet > > When a user saves a dataframe: > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') > {code} > it will create sub-directories named "{{a=val1}}", "{{a=val2}}" in > {{/tmp/table}}. Each of them will contain one (or more?) parquet files with > random filenames. > If a user runs the same command again, the code will use the existing > sub-directories, but with different (random) filenames. As a result, any data > loaded from this folder will be wrong - each row will be present twice. 
> For example, when using > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') # > second time > df2 = pd.read_parquet('/tmp/table', engine='pyarrow') > assert len(df1) == len(df2) # raise an error{code} > This is a subtle change in the data that can pass unnoticed. > > I would expect that the code will prevent the user from using a non-empty > destination as a partitioned target. An overwrite flag can also be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7706) [Python] saving a dataframe to the same partitioned location silently doubles the data
[ https://issues.apache.org/jira/browse/ARROW-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093604#comment-17093604 ] Gregory Hayes commented on ARROW-7706: -- One additional thought -- Spark 2.3 also implements the ability to dynamically overwrite only partitions that have changed, as described [here|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-dynamic-partition-inserts.html]. It would be awesome to see this in the roadmap eventually. > [Python] saving a dataframe to the same partitioned location silently doubles > the data > -- > > Key: ARROW-7706 > URL: https://issues.apache.org/jira/browse/ARROW-7706 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1 >Reporter: Tsvika Shapira >Priority: Major > Labels: dataset, parquet > > When a user saves a dataframe: > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') > {code} > it will create sub-directories named "{{a=val1}}", "{{a=val2}}" in > {{/tmp/table}}. Each of them will contain one (or more?) parquet files with > random filenames. > If a user runs the same command again, the code will use the existing > sub-directories, but with different (random) filenames. As a result, any data > loaded from this folder will be wrong - each row will be present twice. > For example, when using > {code:python} > df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow') # > second time > df2 = pd.read_parquet('/tmp/table', engine='pyarrow') > assert len(df1) == len(df2) # raise an error{code} > This is a subtle change in the data that can pass unnoticed. > > I would expect that the code will prevent the user from using a non-empty > destination as a partitioned target. An overwrite flag can also be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
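Until such a guard exists upstream, callers can refuse a non-empty destination themselves, which turns the silent doubling into a loud error. A minimal user-side sketch; `to_parquet_fresh` is a hypothetical helper name, not a pandas or pyarrow API:

```python
import os

def to_parquet_fresh(df, path, **kwargs):
    """Write a partitioned dataset, refusing a non-empty destination.

    `df` is expected to be a pandas DataFrame; `kwargs` are forwarded to
    DataFrame.to_parquet (e.g. partition_cols, engine). This is a
    user-side guard for the doubling pitfall described above.
    """
    if os.path.isdir(path) and os.listdir(path):
        raise FileExistsError(
            f"{path} is not empty; writing again would silently duplicate rows"
        )
    df.to_parquet(path, **kwargs)
```

Calling it a second time on the same destination raises instead of appending new randomly named files; an explicit overwrite flag, as the reporter suggests, could additionally delete the directory first.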
[jira] [Created] (ARROW-8605) Missing brotli Support in R Package?
Hei created ARROW-8605: -- Summary: Missing brotli Support in R Package? Key: ARROW-8605 URL: https://issues.apache.org/jira/browse/ARROW-8605 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.17.0 Reporter: Hei Hi, My friend installed arrow and tried to open a parquet file with brotli codec. But then, he got an error when calling read_parquet("my.parquet") on Windows: {code} Error in parquet__arrow__FileReader__ReadTable(self) : IOError: NotImplemented: Brotli codec support not built {code} It sounds similar to ARROW-6960. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8604) [R] Windows compilation failure
[ https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson reassigned ARROW-8604: -- Assignee: Neal Richardson > [R] Windows compilation failure > --- > > Key: ARROW-8604 > URL: https://issues.apache.org/jira/browse/ARROW-8604 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Francois Saint-Jacques >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > [Master|[https://github.com/apache/arrow/runs/622393526]] fails to compile. > The C++ cmake build is not using the same > [compiler|[https://github.com/apache/arrow/runs/622393526#step:8:807]] than > the R extension > [compiler|[https://github.com/apache/arrow/runs/622393526#step:11:141]]. > {code:java} > // Files installed here > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%) > // Linker is using `-L` > C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def > array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o > buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o > expression.o feather.o field.o filesystem.o io.o json.o memorypool.o > message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o > recordbatchwriter.o schema.o symbols.o table.o threadpool.o > -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 > -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow > -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 > -LC:/R/bin/i386 -lR > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lparquet > 
C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -larrow_dataset > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -larrow > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lthrift > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lsnappy > {code} > > C++ developers, rejoice, this is almost the end of gcc-4.9. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8604) [R] Update CI to use R 4.0
[ https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-8604: --- Summary: [R] Update CI to use R 4.0 (was: [R] Windows compilation failure) > [R] Update CI to use R 4.0 > -- > > Key: ARROW-8604 > URL: https://issues.apache.org/jira/browse/ARROW-8604 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Francois Saint-Jacques >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > [Master|[https://github.com/apache/arrow/runs/622393526]] fails to compile. > The C++ cmake build is not using the same > [compiler|[https://github.com/apache/arrow/runs/622393526#step:8:807]] than > the R extension > [compiler|[https://github.com/apache/arrow/runs/622393526#step:11:141]]. > {code:java} > // Files installed here > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%) > // Linker is using `-L` > C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def > array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o > buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o > expression.o feather.o field.o filesystem.o io.o json.o memorypool.o > message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o > recordbatchwriter.o schema.o symbols.o table.o threadpool.o > -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 > -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow > -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 > -LC:/R/bin/i386 -lR > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lparquet > 
C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -larrow_dataset > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -larrow > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lthrift > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lsnappy > {code} > > C++ developers, rejoice, this is almost the end of gcc-4.9. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7251) [Python] Open CSVs with different encodings
[ https://issues.apache.org/jira/browse/ARROW-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093550#comment-17093550 ] Sascha Hofmann commented on ARROW-7251: --- Our current setup allows users to upload CSV files, which we then parse into Arrow. Right now, we are not doing any preprocessing of the CSV files, so we can receive arbitrarily weird ones. I will propose your "recode on the fly" suggestion. > [Python] Open CSVs with different encodings > --- > > Key: ARROW-7251 > URL: https://issues.apache.org/jira/browse/ARROW-7251 > Project: Apache Arrow > Issue Type: Wish > Components: Python >Reporter: Sascha Hofmann >Priority: Major > > I would like to open an UTF-16 encoded CSVs (among others) without > preprocessing in let's say Pandas. Is there maybe a way to do this already ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
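The "recode on the fly" approach discussed in this thread could be sketched as below. This is an illustrative Python helper, not an Arrow API: `recode_to_utf8` is a hypothetical name, and the `pyarrow.csv.read_csv` call is left as a comment since it assumes pyarrow is installed.

```python
import io

def recode_to_utf8(raw: io.IOBase, source_encoding: str) -> io.BytesIO:
    """Decode an entire binary stream from source_encoding and re-encode
    it as UTF-8, returning a new file-like object.

    A whole-file approach for simplicity; a streaming codecs-based wrapper
    would avoid holding the whole file in memory.
    """
    text = raw.read().decode(source_encoding)
    return io.BytesIO(text.encode("utf-8"))

# A UTF-16 CSV payload (with BOM) recoded so that a UTF-8-only reader
# such as pyarrow.csv.read_csv could consume it.
utf16_csv = "a,b\n1,2\n".encode("utf-16")
utf8_stream = recode_to_utf8(io.BytesIO(utf16_csv), "utf-16")
# import pyarrow.csv as pv          # assumes pyarrow is available
# table = pv.read_csv(utf8_stream)
```

The same wrapper would work for any encoding Python's codecs know about, at the cost of one extra in-memory copy of the file.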
[jira] [Updated] (ARROW-8604) [R] Windows compilation failure
[ https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-8604: -- Description: [Master|[https://github.com/apache/arrow/runs/622393526]] fails to compile. The C++ cmake build is not using the same [compiler|[https://github.com/apache/arrow/runs/622393526#step:8:807]] than the R extension [compiler|[https://github.com/apache/arrow/runs/622393526#step:11:141]]. {code:java} // Files installed here adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%) adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%) adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%) adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%) adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%) // Linker is using `-L` C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o expression.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o recordbatchwriter.o schema.o symbols.o table.o threadpool.o -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 -LC:/R/bin/i386 -lR C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -lparquet C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -larrow_dataset C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -larrow C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -lthrift 
C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -lsnappy {code} C++ developers, rejoice, this is almost the end of gcc-4.9. was:Master fails to compile. > [R] Windows compilation failure > --- > > Key: ARROW-8604 > URL: https://issues.apache.org/jira/browse/ARROW-8604 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Francois Saint-Jacques >Priority: Major > Fix For: 1.0.0 > > > [Master|[https://github.com/apache/arrow/runs/622393526]] fails to compile. > The C++ cmake build is not using the same > [compiler|[https://github.com/apache/arrow/runs/622393526#step:8:807]] than > the R extension > [compiler|[https://github.com/apache/arrow/runs/622393526#step:11:141]]. > {code:java} > // Files installed here > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%) > adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%) > // Linker is using `-L` > C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def > array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o > buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o > expression.o feather.o field.o filesystem.o io.o json.o memorypool.o > message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o > recordbatchwriter.o schema.o symbols.o table.o threadpool.o > -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 > -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow > -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 > -LC:/R/bin/i386 -lR > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lparquet > 
C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -larrow_dataset > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -larrow > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lthrift > C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe: > cannot find -lsnappy > {code} > > C++ developers, rejoice, this is almost the end of gcc-4.9. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8604) [R] Windows compilation failure
[ https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-8604: -- Description: Master fails to compile. > [R] Windows compilation failure > --- > > Key: ARROW-8604 > URL: https://issues.apache.org/jira/browse/ARROW-8604 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Francois Saint-Jacques >Priority: Major > Fix For: 1.0.0 > > > Master fails to compile. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8604) [R] Windows compilation failure
Francois Saint-Jacques created ARROW-8604: - Summary: [R] Windows compilation failure Key: ARROW-8604 URL: https://issues.apache.org/jira/browse/ARROW-8604 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Francois Saint-Jacques Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8603) [Documentation] Fix Sphinx doxygen comment
[ https://issues.apache.org/jira/browse/ARROW-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8603: -- Labels: pull-request-available (was: ) > [Documentation] Fix Sphinx doxygen comment > -- > > Key: ARROW-8603 > URL: https://issues.apache.org/jira/browse/ARROW-8603 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See [https://github.com/apache/arrow/runs/622393532] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8603) [Documentation] Fix Sphinx doxygen comment
Francois Saint-Jacques created ARROW-8603: - Summary: [Documentation] Fix Sphinx doxygen comment Key: ARROW-8603 URL: https://issues.apache.org/jira/browse/ARROW-8603 Project: Apache Arrow Issue Type: Bug Components: C++, Documentation Reporter: Francois Saint-Jacques See [https://github.com/apache/arrow/runs/622393532] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8602) [CMake] Fix ws2_32 link issue when cross-compiling on Linux
[ https://issues.apache.org/jira/browse/ARROW-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8602: -- Labels: pull-request-available (was: ) > [CMake] Fix ws2_32 link issue when cross-compiling on Linux > --- > > Key: ARROW-8602 > URL: https://issues.apache.org/jira/browse/ARROW-8602 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8602) [CMake] Fix ws2_32 link issue when cross-compiling on Linux
[ https://issues.apache.org/jira/browse/ARROW-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques resolved ARROW-8602. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7001 [https://github.com/apache/arrow/pull/7001] > [CMake] Fix ws2_32 link issue when cross-compiling on Linux > --- > > Key: ARROW-8602 > URL: https://issues.apache.org/jira/browse/ARROW-8602 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Francois Saint-Jacques >Priority: Trivial > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8602) [CMake] Fix ws2_32 link issue when cross-compiling on Linux
Francois Saint-Jacques created ARROW-8602: - Summary: [CMake] Fix ws2_32 link issue when cross-compiling on Linux Key: ARROW-8602 URL: https://issues.apache.org/jira/browse/ARROW-8602 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-7251) [Python] Open CSVs with different encodings
[ https://issues.apache.org/jira/browse/ARROW-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093465#comment-17093465 ] Antoine Pitrou edited comment on ARROW-7251 at 4/27/20, 12:45 PM: -- cc [~saschahofmann] Is there anything that prevents you from recoding the CSV file before opening it with Arrow? (what are your constraints? performance? file size?) With some care, you could even implement a file-like object in Python that recodes data to UTF-8 on the fly. It should be accepted by {{csv.read_csv}}. was (Author: pitrou): cc [~saschahofmann] Is there anything that prevents you from recoding the CSV file before opening it with Arrow? (what are you constraints? performance? file size?) With some care, you could even implement a file-like object in Python that recodes data to UTF-8 on the fly. It should be accepted by {{csv.read_csv}}. > [Python] Open CSVs with different encodings > --- > > Key: ARROW-7251 > URL: https://issues.apache.org/jira/browse/ARROW-7251 > Project: Apache Arrow > Issue Type: Wish > Components: Python >Reporter: Sascha Hofmann >Priority: Major > > I would like to open an UTF-16 encoded CSVs (among others) without > preprocessing in let's say Pandas. Is there maybe a way to do this already ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7251) [Python] Open CSVs with different encodings
[ https://issues.apache.org/jira/browse/ARROW-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093465#comment-17093465 ] Antoine Pitrou commented on ARROW-7251: --- cc [~saschahofmann] Is there anything that prevents you from recoding the CSV file before opening it with Arrow? (what are you constraints? performance? file size?) With some care, you could even implement a file-like object in Python that recodes data to UTF-8 on the fly. It should be accepted by {{csv.read_csv}}. > [Python] Open CSVs with different encodings > --- > > Key: ARROW-7251 > URL: https://issues.apache.org/jira/browse/ARROW-7251 > Project: Apache Arrow > Issue Type: Wish > Components: Python >Reporter: Sascha Hofmann >Priority: Major > > I would like to open an UTF-16 encoded CSVs (among others) without > preprocessing in let's say Pandas. Is there maybe a way to do this already ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8601) [Go][Flight] Implement Flight Writer interface
[ https://issues.apache.org/jira/browse/ARROW-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8601: -- Labels: pull-request-available (was: ) > [Go][Flight] Implement Flight Writer interface > -- > > Key: ARROW-8601 > URL: https://issues.apache.org/jira/browse/ARROW-8601 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC, Go >Reporter: Francois Saint-Jacques >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8601) [Go][Flight] Implement Flight Writer interface
Francois Saint-Jacques created ARROW-8601: - Summary: [Go][Flight] Implement Flight Writer interface Key: ARROW-8601 URL: https://issues.apache.org/jira/browse/ARROW-8601 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Go Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7251) [Python] Open CSVs with different encodings
[ https://issues.apache.org/jira/browse/ARROW-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093435#comment-17093435 ] Sascha Hofmann commented on ARROW-7251: --- For us, having support for different string encodings would be amazing. That being said, I admit other encodings are rare and dying out, but we stumble upon them once in a while. Of those, I don't know how many use a BOM to identify their encoding. We haven't actually tried it, but we might use pandas as mentioned above in cases where a file has a BOM other than UTF-8's (see comment above). I am not sure how you did the CSV reading in pandas, but I assume it might not be worth going through it again. In the end, it might be best to force people to use UTF-8. > [Python] Open CSVs with different encodings > --- > > Key: ARROW-7251 > URL: https://issues.apache.org/jira/browse/ARROW-7251 > Project: Apache Arrow > Issue Type: Wish > Components: Python >Reporter: Sascha Hofmann >Priority: Major > > I would like to open an UTF-16 encoded CSVs (among others) without > preprocessing in let's say Pandas. Is there maybe a way to do this already ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
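BOM-based detection, as mentioned above, can be sketched with the standard library alone; `sniff_encoding` is a hypothetical helper, not part of Arrow. Note the UTF-32 BOMs must be tested before the UTF-16 ones, because the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.

```python
import codecs

# Order matters: the UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE), so the longer patterns are tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(first_bytes: bytes, default: str = "utf-8") -> str:
    """Guess a file's encoding from its byte-order mark, if any."""
    for bom, name in _BOMS:
        if first_bytes.startswith(bom):
            return name
    return default

print(sniff_encoding(b"\xff\xfea\x00,\x00b\x00"))  # utf-16-le
```

Files without a BOM (the common case for legacy 8-bit encodings) would still need a fallback such as a user-supplied encoding or a statistical detector.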
[jira] [Commented] (ARROW-8152) [C++] IO: split large coalesced reads into smaller ones
[ https://issues.apache.org/jira/browse/ARROW-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093429#comment-17093429 ] David Li commented on ARROW-8152: - FWIW, I'm not sure this specific task is needed anymore - I originally didn't realize the Parquet reader issued individual reads for each column chunk. Splitting large reads hence isn't needed. It may help for people who have very large column chunks, but that can be pursued separately if it comes up. > [C++] IO: split large coalesced reads into smaller ones > --- > > Key: ARROW-8152 > URL: https://issues.apache.org/jira/browse/ARROW-8152 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: David Li >Priority: Major > Fix For: 1.0.0 > > > We have a facility to coalesce small reads, but remote filesystems may also > benefit from splitting large reads to take advantage of concurrency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
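To illustrate what splitting a large coalesced read means (a sketch of the idea only, not Arrow's C++ implementation), a read described by an offset and length can be cut into bounded sub-ranges that a remote filesystem could then fetch concurrently:

```python
def split_read(offset: int, length: int, max_chunk: int):
    """Split an (offset, length) read into sub-reads of at most
    max_chunk bytes, preserving order.

    The resulting pieces could be issued as concurrent range requests
    against a remote filesystem, trading request count for parallelism.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = []
    end = offset + length
    while offset < end:
        size = min(max_chunk, end - offset)
        chunks.append((offset, size))
        offset += size
    return chunks

print(split_read(0, 10, 4))  # [(0, 4), (4, 4), (8, 2)]
```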
[jira] [Commented] (ARROW-8565) [C++] Static build with AWS SDK
[ https://issues.apache.org/jira/browse/ARROW-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093421#comment-17093421 ] Francois Saint-Jacques commented on ARROW-8565: --- I'm not sure if you're aware, but the AWS SDK supports selectively building components in the bundled library. The following should make a smaller build that supports S3. {code:java} cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;core;config;transfer" {code} > [C++] Static build with AWS SDK > --- > > Key: ARROW-8565 > URL: https://issues.apache.org/jira/browse/ARROW-8565 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.17.0 >Reporter: Remi Dettai >Priority: Major > Labels: aws-s3, build-problem > > I can't find my way around the build system when using the S3 client. > It seems that only the shared target is allowed when the S3 feature is ON. In the > thirdparty toolchain, when printing: > ??FATAL_ERROR "FIXME: Building AWS C++ SDK from source will link with wrong > libcrypto"?? > What is actually meant is that a static build will not work, correct? If that is > the case, should libarrow.a be generated at all when the S3 feature is on? > What can be done to fix this? What does it mean that the SDK links to the > wrong libcrypto? Is it fixable? Or is there a way to have the static build > but maintain a dynamic link to a shared version of the SDK? > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-8586) [R] installation failure on CentOS 7
[ https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092950#comment-17092950 ] Hei edited comment on ARROW-8586 at 4/27/20, 11:10 AM: --- Hi Neal, I tried out your suggestion by setting LIBARROW_BINARY=centos-7 and then ran install.packages("arrow") to reinstall. Then I tried: {code} > library(arrow) Attaching package: ‘arrow’ The following object is masked from ‘package:utils’: timestamp > df <- read_parquet('/home/hc/my.10.level.20200331.2book.parquet') {code} And then RStudio's session crashed with a popup saying, "R Session Abort. R encountered a fatal error. The session was terminated". There is no extra info in the console even after I set ARROW_R_DEV=true. Restarting the session doesn't help -- same error popup when loading the parquet file. Restarting the RStudio client doesn't help either. Here is my RStudio's version: {code} > RStudio.Version() $citation To cite RStudio in publications use: RStudio Team (2020). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/. A BibTeX entry for LaTeX users is @Manual{, title = {RStudio: Integrated Development Environment for R}, author = {{RStudio Team}}, organization = {RStudio, Inc.}, address = {Boston, MA}, year = {2020}, url = {http://www.rstudio.com/}, } $mode [1] "desktop" $version [1] ‘1.2.5042’ $release_name [1] "Double Marigold" {code} I tried to load the same parquet file with Python 3.6 to construct a pandas dataframe, and it works fine. Any idea? was (Author: hei): Hi Neal, I tried out your suggestion by setting LIBARROW_BINARY=centos-7 and then ran install.packages("arrow") to reinstall. Then I tried: {code} > library(arrow) Attaching package: ‘arrow’ The following object is masked from ‘package:utils’: timestamp > df <- read_parquet('/home/hc/my.10.level.20200331.2book.parquet') {code} And then RStudio's session crashed with a popup saying, "R Session Abort. R encountered a fatal error. 
The session was terminated". Restarting the session doesn't help -- same error popup when loading the parquet file. Restarting rstudio client doesn't help neither. Here is my RStudio's version: {code} > RStudio.Version() $citation To cite RStudio in publications use: RStudio Team (2020). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/. A BibTeX entry for LaTeX users is @Manual{, title = {RStudio: Integrated Development Environment for R}, author = {{RStudio Team}}, organization = {RStudio, Inc.}, address = {Boston, MA}, year = {2020}, url = {http://www.rstudio.com/}, } $mode [1] "desktop" $version [1] ‘1.2.5042’ $release_name [1] "Double Marigold" {code} I tried to load the same parquet file with python 3.6 to construct pandas dataframe, it works fine. Any idea? > [R] installation failure on CentOS 7 > > > Key: ARROW-8586 > URL: https://issues.apache.org/jira/browse/ARROW-8586 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.17.0 > Environment: CentOS 7 >Reporter: Hei >Priority: Major > > Hi, > I am trying to install arrow via RStudio, but it seems like it is not working > that after I installed the package, it kept asking me to run > arrow::install_arrow() even after I did: > {code} > > install.packages("arrow") > Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’ > (as ‘lib’ is unspecified) > trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz' > Content type 'application/x-gzip' length 242534 bytes (236 KB) > == > downloaded 236 KB > * installing *source* package ‘arrow’ ... 
> ** package ‘arrow’ successfully unpacked and MD5 sums checked > ** using staged installation > *** Successfully retrieved C++ source > *** Building C++ libraries > cmake > arrow > ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory > - NOTE --- > After installation, please run arrow::install_arrow() > for help installing required runtime libraries > - > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FO
[jira] [Commented] (ARROW-8565) [C++] Static build with AWS SDK
[ https://issues.apache.org/jira/browse/ARROW-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093185#comment-17093185 ] Remi Dettai commented on ARROW-8565: --- I finally managed to make a static build, but it requires changing the root CMakeLists to append AWSSDK_LINK_LIBRARIES to ARROW_STATIC_LINK_LIBS. I'm not sure this is safe to do, because I don't fully understand how the Arrow and AWS SDK dependencies interact. In the end the static build is barely more compact than the shared one, because the C++ AWS SDK build is a whole adventure. I'm continuing my investigation to see if I can come up with something nicer, but I really lack some CMake expertise :) > [C++] Static build with AWS SDK > --- > > Key: ARROW-8565 > URL: https://issues.apache.org/jira/browse/ARROW-8565 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.17.0 >Reporter: Remi Dettai >Priority: Major > Labels: aws-s3, build-problem > > I can't find my way around the build system when using the S3 client. > It seems that only the shared target is allowed when the S3 feature is ON. In the > thirdparty toolchain, when printing: > ??FATAL_ERROR "FIXME: Building AWS C++ SDK from source will link with wrong > libcrypto"?? > What is actually meant is that a static build will not work, correct? If that is > the case, should libarrow.a be generated at all when the S3 feature is on? > What can be done to fix this? What does it mean that the SDK links to the > wrong libcrypto? Is it fixable? Or is there a way to have the static build > but maintain a dynamic link to a shared version of the SDK? > > -- This message was sent by Atlassian Jira (v8.3.4#803005)