[jira] [Created] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow
Leonard Lausen created ARROW-11463:
--

Summary: Allow configuration of IpcWriterOptions 64Bit from PyArrow
Key: ARROW-11463
URL: https://issues.apache.org/jira/browse/ARROW-11463
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: Leonard Lausen

For tables with many chunks (2M+ rows, 20k+ chunks), `pyarrow.Table.take` is around 1000x slower than `pyarrow.Table.take` on the same table with combined chunks (1 chunk).

Unfortunately, if such a table contains a large list data type, it's easy for the flattened table to contain more than 2**31 rows, and serialization (e.g. for the Plasma store) will fail with `pyarrow.lib.ArrowCapacityError: Cannot write arrays larger than 2^31 - 1 in length`

I couldn't find a way to enable 64-bit support for the serialization as called from Python (IpcWriteOptions in Python does not expose the CIpcWriteOptions 64-bit setting; further, the Python serialization APIs do not allow specification of IpcWriteOptions).

I was able to serialize successfully after changing the default and rebuilding:

```
modified   cpp/src/arrow/ipc/options.h
@@ -42,7 +42,7 @@ struct ARROW_EXPORT IpcWriteOptions {
   /// \brief If true, allow field lengths that don't fit in a signed 32-bit int.
   ///
   /// Some implementations may not be able to parse streams created with this option.
-  bool allow_64bit = false;
+  bool allow_64bit = true;

   /// \brief The maximum permitted schema nesting depth.
   int max_recursion_depth = kMaxNestingDepth;
```

--
This message was sent by Atlassian Jira (v8.3.4#803005)
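The arithmetic behind the capacity error in ARROW-11463 can be sketched in plain Python. This is only an illustration: the helper name `fits_in_int32_length` and the per-row list sizes are assumptions made for the example, not values from the report, and the check mirrors the limit quoted in the error message rather than Arrow's actual implementation.

```python
# Largest array length accepted when 64-bit lengths are not allowed,
# taken from the quoted error message: 2^31 - 1.
INT32_MAX_LENGTH = 2**31 - 1


def fits_in_int32_length(num_rows: int, list_len_per_row: int) -> bool:
    """Return True if a list column's flattened child array stays within 2^31 - 1.

    Hypothetical helper: assumes every row holds `list_len_per_row` elements,
    so the flattened child array has num_rows * list_len_per_row entries.
    """
    flattened_length = num_rows * list_len_per_row
    return flattened_length <= INT32_MAX_LENGTH


# 2M rows (the table size mentioned in the report) with 100 elements each
# gives 200 million flattened entries, which is within the limit.
small_ok = fits_in_int32_length(2_000_000, 100)

# At an assumed 1,100 elements per row the flattened array has 2.2 billion
# entries, which exceeds 2^31 - 1 once the chunks are combined.
large_ok = fits_in_int32_length(2_000_000, 1_100)
```

Under these assumed sizes `small_ok` is true and `large_ok` is false, which is the situation where combining chunks to speed up `take` runs into the serialization limit.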
[jira] [Created] (ARROW-11380) Plasma packages for arm64
Leonard Lausen created ARROW-11380:
--

Summary: Plasma packages for arm64
Key: ARROW-11380
URL: https://issues.apache.org/jira/browse/ARROW-11380
Project: Apache Arrow
Issue Type: Improvement
Reporter: Leonard Lausen

"Note that Plasma packages are available only for amd64. Because nvidia-cuda-toolkit package isn't available for arm64."
https://issues.apache.org/jira/browse/ARROW-6715

NVIDIA supports CUDA on ARM, so this should be possible in principle?
[jira] [Created] (ARROW-10868) pip install --user fails to install lib
Leonard Lausen created ARROW-10868:
--

Summary: pip install --user fails to install lib
Key: ARROW-10868
URL: https://issues.apache.org/jira/browse/ARROW-10868
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: Leonard Lausen

Compiling and installing the C++ library via:

```
cd ~/src/pyarrow/cpp
mkdir build
cd build
CC=clang-11 CXX=clang++-11 cmake -GNinja -DARROW_PYTHON=ON ..
ninja
sudo ninja install
```

Installing the Python package as follows then claims to succeed, but does not actually provide `pyarrow.lib` (`python3 -c 'import pyarrow.lib'` fails):

```
cd ~/src/pyarrow/python
pip install --user .
```
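The `python3 -c 'import pyarrow.lib'` check from ARROW-10868 can also be scripted, so that an install which reports success but lacks the module is caught right away. A minimal sketch using only the standard library; the helper name `can_import` is hypothetical and not part of pyarrow or pip.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if `module_name` imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well as extension modules that
        # exist on disk but fail to load.
        return False
    return True


if __name__ == "__main__":
    # Reports whether the compiled extension module is actually importable
    # in the current environment.
    print("pyarrow.lib importable:", can_import("pyarrow.lib"))
```

Running this after `pip install --user .` gives a quick signal without relying on pip's exit status.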
[jira] [Created] (ARROW-10867) build failure on aarch64 with -DARROW_PYTHON=ON and gcc
Leonard Lausen created ARROW-10867:
--

Summary: build failure on aarch64 with -DARROW_PYTHON=ON and gcc
Key: ARROW-10867
URL: https://issues.apache.org/jira/browse/ARROW-10867
Project: Apache Arrow
Issue Type: Task
Reporter: Leonard Lausen
Attachments: arrow

Arrow will trigger compiler errors in (at least) gcc7, gcc8 and gcc9 on aarch64 on a https://aws.amazon.com/ec2/instance-types/c6/ instance. Compiling with clang-11 works fine.

```
../src/arrow/compute/kernels/scalar_cast_nested.cc: In function ‘void arrow::compute::internal::CastListExec(arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*) [with Type = arrow::LargeListType]’:
../src/arrow/compute/kernels/scalar_cast_nested.cc:33:6: internal compiler error: Segmentation fault
 void CastListExec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
      ^~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
```

Full build log attached.
[jira] [Created] (ARROW-10866) manylinux aarch64 wheel
Leonard Lausen created ARROW-10866:
--

Summary: manylinux aarch64 wheel
Key: ARROW-10866
URL: https://issues.apache.org/jira/browse/ARROW-10866
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: Leonard Lausen

Please provide an aarch64 wheel on https://pypi.org/project/pyarrow/#files