[jira] [Created] (ARROW-14065) FixedSizeBinaryBuilder behaves incorrectly since v5.0 with "Resize+Advance" operation
Tao He created ARROW-14065:
--

Summary: FixedSizeBinaryBuilder behaves incorrectly since v5.0 with "Resize+Advance" operation
Key: ARROW-14065
URL: https://issues.apache.org/jira/browse/ARROW-14065
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 5.0.0
Reporter: Tao He

With the following code, we first `Resize` the builder, then fill the content, and finally use `Advance` to move the write position to the end:

```cpp
#include <iostream>
#include <memory>

#include "arrow/array/array_binary.h"
#include "arrow/array/builder_binary.h"
#include "arrow/status.h"
#include "arrow/util/config.h"

int main(int argc, char** argv) {
  struct S {
    int64_t a;
    double b;
  };

  arrow::FixedSizeBinaryBuilder b1(arrow::fixed_size_binary(sizeof(S)));
  arrow::FixedSizeBinaryBuilder b4(arrow::fixed_size_binary(sizeof(S)));

  b4.Resize(10);
  // ... fill the array data in random-access fashion ...
  b4.Advance(10);

  std::shared_ptr<arrow::FixedSizeBinaryArray> a4;
  b4.Finish(&a4);

  std::cout << "array length: " << a4->length() << std::endl;
  std::cout << "buffer size: " << a4->values()->size() << std::endl;
  return 0;
}
```

The output is 10 and 160 with Arrow 4.0 (which is the desired behavior); however, Arrow 5.0 yields 10 and 0, which means the length of the array is not 0 but the underlying buffer is a null pointer. The same error doesn't happen with other builder types, e.g., IntBuilders.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12532) [Release] Prebuilt artifacts (e.g., wheel on pypi) for release 4.0.0.
Tao He created ARROW-12532:
--

Summary: [Release] Prebuilt artifacts (e.g., wheels on PyPI) for release 4.0.0
Key: ARROW-12532
URL: https://issues.apache.org/jira/browse/ARROW-12532
Project: Apache Arrow
Issue Type: Task
Components: Developer Tools
Affects Versions: 4.0.0
Reporter: Tao He

It looks like there's a v4.0.0 release tag on GitHub. However, the prebuilt artifacts haven't been uploaded yet.
[jira] [Created] (ARROW-11836) Target libarrow_bundled_dependencies.a is not always created but is always required.
Tao He created ARROW-11836:
--

Summary: Target libarrow_bundled_dependencies.a is not always created but is always required
Key: ARROW-11836
URL: https://issues.apache.org/jira/browse/ARROW-11836
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 3.0.0
Reporter: Tao He

When ``-DARROW_BUILD_STATIC=ON``, all build dependencies built as static libraries by the Arrow build system are merged together to create a static library ``arrow_bundled_dependencies``. But that only happens when there are indeed some such dependencies, i.e., when ``ARROW_BUNDLED_STATIC_LIBS`` is not empty [1]. It can be empty when only some features are enabled when building Arrow (e.g., just the Arrow core). However, the target is unconditionally required by the target ``arrow_static`` [2]. As a result, the statically-built Arrow libraries cannot be used with CMake.

[1]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/CMakeLists.txt#L523
[2]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ArrowConfig.cmake.in#L74
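A minimal sketch of the kind of guard the report implies, assuming the target and variable names from the links above; this is an illustration of the idea, not the actual Arrow CMake code:

```cmake
# Illustrative only: create the bundled-dependencies archive only when there
# are bundled static libraries to merge ...
if(ARROW_BUNDLED_STATIC_LIBS)
  add_library(arrow_bundled_dependencies STATIC IMPORTED)
  set_target_properties(arrow_bundled_dependencies PROPERTIES
    IMPORTED_LOCATION "${CMAKE_BINARY_DIR}/libarrow_bundled_dependencies.a")
endif()

# ... and, in the exported package config, only reference the target when it
# actually exists, so consumers of arrow_static don't require a missing target.
if(TARGET arrow_bundled_dependencies)
  set_property(TARGET arrow_static APPEND PROPERTY
    INTERFACE_LINK_LIBRARIES arrow_bundled_dependencies)
endif()
```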
[jira] [Created] (ARROW-10956) Python 3.9 support
Tao He created ARROW-10956:
--

Summary: Python 3.9 support
Key: ARROW-10956
URL: https://issues.apache.org/jira/browse/ARROW-10956
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 2.0.0, 1.0.1, 1.0.0
Reporter: Tao He

Python 3.9 was officially released on Oct. 5, 2020. Is there any plan to publish Python 3.9 wheels on PyPI?
[jira] [Created] (ARROW-10617) RecordBatchStreamReader's iterator doesn't work with Python 3.8
Tao He created ARROW-10617:
--

Summary: RecordBatchStreamReader's iterator doesn't work with Python 3.8
Key: ARROW-10617
URL: https://issues.apache.org/jira/browse/ARROW-10617
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 1.0.1
Reporter: Tao He

The following example code doesn't work with Python 3.8:

```python
import pyarrow as pa

data = [
    pa.array([1, 2, 3, 4]),
    pa.array(['foo', 'bar', 'baz', None]),
    pa.array([True, None, False, True])
]
batch = pa.record_batch(data, names=['f0', 'f1', 'f2'])

sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, batch.schema)
for i in range(5):
    writer.write_batch(batch)
writer.close()

buf = sink.getvalue()
reader = pa.ipc.open_stream(buf)
[i for i in reader]
```
[jira] [Created] (ARROW-10599) Prebuilt distributions (e.g., pyarrow and libarrow-dev) should use the same ABI (with or without the dual ABI)
Tao He created ARROW-10599:
--

Summary: Prebuilt distributions (e.g., pyarrow and libarrow-dev) should use the same ABI (with or without the dual ABI)
Key: ARROW-10599
URL: https://issues.apache.org/jira/browse/ARROW-10599
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Python
Affects Versions: 2.0.0, 1.0.1, 0.17.0
Reporter: Tao He

I have observed that the Python release (pyarrow) and the C++ release (libarrow-dev for Ubuntu) are built with different GCC ABIs. The former, pyarrow, is built within the manylinux1 environment using gcc 4.8; the latter's ABI, however, carries a `[cxx11]` tag. That blocks users from developing Python C extensions that depend on libarrow-dev.

For example, we have developed a library in C++ that uses `arrow::Buffer` from libarrow-dev, and wrapped it using something like `pybind11` into a Python module `liba`. After building `liba` on a stock Ubuntu (which can install libarrow-dev with apt-get), when the user imports both `liba` and `pyarrow` in a Python script, it won't work correctly due to the ABI conflict (especially when it comes to strings).

I can see two options to make it work:

1. Build Arrow's Python package with static linking, so that pyarrow doesn't contain so many shared libraries (libarrow.so, libarrow_python.so, etc.).
2. Distribute `libarrow-dev` built with `-D_GLIBCXX_USE_CXX11_ABI=0`.

I'm also wondering whether there is any technical issue preventing packages in different languages from being distributed with the same ABI.
[jira] [Created] (ARROW-6054) pyarrow.serialize should respect the values of numpy structured dtypes
Tao He created ARROW-6054:
--

Summary: pyarrow.serialize should respect the values of numpy structured dtypes
Key: ARROW-6054
URL: https://issues.apache.org/jira/browse/ARROW-6054
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Reporter: Tao He
Assignee: Tao He
[jira] [Created] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized
Tao He created ARROW-2455:
--

Summary: The bytes_allocated_ in CudaContextImpl isn't initialized
Key: ARROW-2455
URL: https://issues.apache.org/jira/browse/ARROW-2455
Project: Apache Arrow
Issue Type: Bug
Components: GPU
Reporter: Tao He

The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, leading to failures of the cuda-test on Windows.