[jira] [Created] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-01 Thread Leonard Lausen (Jira)
Leonard Lausen created ARROW-11463:
--

 Summary: Allow configuration of IpcWriterOptions 64Bit from PyArrow
 Key: ARROW-11463
 URL: https://issues.apache.org/jira/browse/ARROW-11463
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Leonard Lausen


For tables with many chunks (2M+ rows, 20k+ chunks), `pyarrow.Table.take` is 
around 1000x slower than `pyarrow.Table.take` on the same table with combined 
chunks (1 chunk). Unfortunately, if such a table contains a large list data 
type, the flattened table can easily contain more than 2**31 rows, and 
serialization (e.g. for the Plasma store) then fails with 
`pyarrow.lib.ArrowCapacityError: Cannot write arrays larger than 2^31 - 1 in 
length`.
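
As a rough illustration of how quickly the 2^31 - 1 limit is reached, the following sketch multiplies a row count by an average list length. The figures (2M rows, lists of about 1,000 or 1,100 elements) and the helper names are assumptions chosen for illustration, not measurements from the table in question.

```python
# Largest array length that can be written without 64-bit support,
# as stated in the ArrowCapacityError message (2^31 - 1).
INT32_MAX = 2**31 - 1


def flattened_length(num_rows: int, avg_list_len: int) -> int:
    """Approximate number of child elements once a list column is flattened."""
    return num_rows * avg_list_len


def exceeds_int32(length: int) -> bool:
    """True if an array of this length exceeds the signed 32-bit limit."""
    return length > INT32_MAX


# Hypothetical figures: 2M rows with two different average list lengths.
rows = 2_000_000
for avg_len in (1_000, 1_100):
    n = flattened_length(rows, avg_len)
    print(f"avg list length {avg_len}: {n} elements, over limit: {exceeds_int32(n)}")
```

Under these assumed numbers, an average list length just above 1,073 elements per row is already enough to cross the limit once the chunks are combined.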

I couldn't find a way to enable 64-bit support for serialization when called 
from Python: IpcWriteOptions in Python does not expose the CIpcWriteOptions 
64-bit setting, and the Python serialization APIs do not allow specifying 
IpcWriteOptions.

I was able to serialize successfully after changing the default and rebuilding:

```
modified   cpp/src/arrow/ipc/options.h
@@ -42,7 +42,7 @@ struct ARROW_EXPORT IpcWriteOptions {
   /// \brief If true, allow field lengths that don't fit in a signed 32-bit int.
   ///
   /// Some implementations may not be able to parse streams created with this option.
-  bool allow_64bit = false;
+  bool allow_64bit = true;
 
   /// \brief The maximum permitted schema nesting depth.
   int max_recursion_depth = kMaxNestingDepth;
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11380) Plasma packages for arm64

2021-01-25 Thread Leonard Lausen (Jira)
Leonard Lausen created ARROW-11380:
--

 Summary: Plasma packages for arm64
 Key: ARROW-11380
 URL: https://issues.apache.org/jira/browse/ARROW-11380
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Leonard Lausen


"Note that Plasma packages are available only for amd64. Because 
nvidia-cuda-toolkit package isn't available for arm64." 
https://issues.apache.org/jira/browse/ARROW-6715

NVIDIA supports CUDA on ARM, so shouldn't this be possible in principle?





[jira] [Created] (ARROW-10868) pip install --user fails to install lib

2020-12-09 Thread Leonard Lausen (Jira)
Leonard Lausen created ARROW-10868:
--

 Summary: pip install --user fails to install lib
 Key: ARROW-10868
 URL: https://issues.apache.org/jira/browse/ARROW-10868
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Leonard Lausen


Compiling and installing the C++ library via:

```
cd ~/src/pyarrow/cpp
mkdir build
cd build
CC=clang-11 CXX=clang++-11 cmake -GNinja -DARROW_PYTHON=ON ..
ninja
sudo ninja install
```

Then installing the Python package as follows claims to succeed, but actually 
fails to provide `pyarrow.lib` (`python3 -c 'import pyarrow.lib'` fails):

```
cd ~/src/pyarrow/python
pip install --user .
```
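
A small check like the following can confirm whether the install actually produced an importable `pyarrow.lib`, instead of relying on pip's exit status. This is only a sketch: `can_import` is a hypothetical helper name, and it uses nothing beyond the standard-library `importlib`.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError (the module or its parent package is
        # missing) as well as extension modules that fail to load.
        return False
    return True


if __name__ == "__main__":
    # After `pip install --user .` this is expected to report True;
    # in the situation described above it would report False.
    print("pyarrow.lib importable:", can_import("pyarrow.lib"))
```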






[jira] [Created] (ARROW-10867) build failure on aarch64 with -DARROW_PYTHON=ON and gcc

2020-12-09 Thread Leonard Lausen (Jira)
Leonard Lausen created ARROW-10867:
--

 Summary: build failure on aarch64 with -DARROW_PYTHON=ON and gcc
 Key: ARROW-10867
 URL: https://issues.apache.org/jira/browse/ARROW-10867
 Project: Apache Arrow
  Issue Type: Task
Reporter: Leonard Lausen
 Attachments: arrow

Arrow triggers internal compiler errors with (at least) gcc7, gcc8 and gcc9 on 
aarch64 on a https://aws.amazon.com/ec2/instance-types/c6/ instance.
Compiling with clang-11 works fine.

```
../src/arrow/compute/kernels/scalar_cast_nested.cc: In function ‘void arrow::compute::internal::CastListExec(arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*) [with Type = arrow::LargeListType]’:
../src/arrow/compute/kernels/scalar_cast_nested.cc:33:6: internal compiler error: Segmentation fault
 void CastListExec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
  ^~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See 
 for instructions.
```

Full build log attached.





[jira] [Created] (ARROW-10866) manylinux aarch64 wheel

2020-12-09 Thread Leonard Lausen (Jira)
Leonard Lausen created ARROW-10866:
--

 Summary: manylinux aarch64 wheel
 Key: ARROW-10866
 URL: https://issues.apache.org/jira/browse/ARROW-10866
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Leonard Lausen


Please provide an aarch64 wheel on https://pypi.org/project/pyarrow/#files


