[jira] [Created] (ARROW-14065) FixedSizeBinaryBuilder behaves incorrectly since v5.0 with "Resize+Advance" operation

2021-09-22 Thread Tao He (Jira)
Tao He created ARROW-14065:
--

 Summary: FixedSizeBinaryBuilder behaves incorrectly since v5.0 
with "Resize+Advance" operation
 Key: ARROW-14065
 URL: https://issues.apache.org/jira/browse/ARROW-14065
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 5.0.0
Reporter: Tao He


With the following code, we first "Resize" a builder, then fill the content, 
and finally use "Advance" to move the pointer to the end,

```cpp
#include <iostream>
#include <memory>

#include "arrow/array/array_binary.h"
#include "arrow/array/builder_binary.h"
#include "arrow/status.h"
#include "arrow/util/config.h"

int main(int argc, char** argv) {
  struct S {
    int64_t a;
    double b;
  };
  arrow::FixedSizeBinaryBuilder b1(arrow::fixed_size_binary(sizeof(S)));

  arrow::FixedSizeBinaryBuilder b4(arrow::fixed_size_binary(sizeof(S)));
  b4.Resize(10);

  // ... fill the array data in random-access fashion ...

  b4.Advance(10);

  std::shared_ptr<arrow::FixedSizeBinaryArray> a4;
  b4.Finish(&a4);

  std::cout << "array length: " << a4->length() << std::endl;
  std::cout << "buffer size: " << a4->values()->size() << std::endl;

  return 0;
}
```

With Arrow 4.0 the output is 10 and 160, which is the desired behavior (sizeof(S) is 16 bytes, so 10 values occupy 160 bytes). With Arrow 5.0, however, the output is 10 and 0: the array's length is 10, but the underlying values buffer is a null pointer.

The same error doesn't happen with other builder types, e.g., the integer builders.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12532) [Release] Prebuilt artifacts (e.g., wheel on pypi) for release 4.0.0.

2021-04-24 Thread Tao He (Jira)
Tao He created ARROW-12532:
--

 Summary: [Release] Prebuilt artifacts (e.g., wheel on pypi) for 
release 4.0.0.
 Key: ARROW-12532
 URL: https://issues.apache.org/jira/browse/ARROW-12532
 Project: Apache Arrow
  Issue Type: Task
  Components: Developer Tools
Affects Versions: 4.0.0
Reporter: Tao He


It looks like there is a v4.0.0 release tag on GitHub. However, the prebuilt artifacts haven't been uploaded yet.

 





[jira] [Created] (ARROW-11836) Target libarrow_bundled_dependencies.a is not yet created but is already required.

2021-03-02 Thread Tao He (Jira)
Tao He created ARROW-11836:
--

 Summary: Target libarrow_bundled_dependencies.a is not yet 
created but is already required.
 Key: ARROW-11836
 URL: https://issues.apache.org/jira/browse/ARROW-11836
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 3.0.0
Reporter: Tao He


When ``-DARROW_BUILD_STATIC=ON``, all build dependencies built as static
libraries by the Arrow build system are merged together to create a static
library ``arrow_bundled_dependencies``.

But that is only true when there are indeed some bundled dependencies, i.e., when 
``ARROW_BUNDLED_STATIC_LIBS`` is not empty [1]. It can be empty when only some 
features are enabled when building Arrow (e.g., just the Arrow core).

However, the target is unconditionally required by the target ``arrow_static`` 
[2]. As a result, the statically built Arrow libraries cannot be used with CMake.
 
[1]: 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/CMakeLists.txt#L523]
[2]: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ArrowConfig.cmake.in#L74
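A possible shape for the fix, guarding the bundled-dependencies target on the list actually being non-empty. This is a sketch only: the variable ``ARROW_BUNDLED_STATIC_LIBS`` comes from the file linked at [1], but the imported-location variable and the exact guard placement are assumptions, not a verified patch:

```cmake
# Sketch: only declare arrow_bundled_dependencies when something was bundled,
# so consumers of arrow_static don't require a library that was never built.
if(ARROW_BUNDLED_STATIC_LIBS)
  add_library(arrow_bundled_dependencies STATIC IMPORTED)
  set_target_properties(arrow_bundled_dependencies
                        PROPERTIES IMPORTED_LOCATION
                                   "${ARROW_BUNDLED_DEPENDENCIES_LIB}")  # hypothetical variable
endif()
```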





[jira] [Created] (ARROW-10956) Python 3.9 support

2020-12-17 Thread Tao He (Jira)
Tao He created ARROW-10956:
--

 Summary: Python 3.9 support
 Key: ARROW-10956
 URL: https://issues.apache.org/jira/browse/ARROW-10956
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Affects Versions: 2.0.0, 1.0.1, 1.0.0
Reporter: Tao He


Python 3.9 was officially released on Oct. 5, 2020. Is there any plan to 
publish Python 3.9 wheels on PyPI?





[jira] [Created] (ARROW-10617) RecordBatchStreamReader's iterator doesn't work with python 3.8

2020-11-16 Thread Tao He (Jira)
Tao He created ARROW-10617:
--

 Summary: RecordBatchStreamReader's iterator doesn't work with 
python 3.8
 Key: ARROW-10617
 URL: https://issues.apache.org/jira/browse/ARROW-10617
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 1.0.1
Reporter: Tao He


The following example code doesn't work with python 3.8:

```python
import pyarrow as pa

data = [
    pa.array([1, 2, 3, 4]),
    pa.array(['foo', 'bar', 'baz', None]),
    pa.array([True, None, False, True])
]
batch = pa.record_batch(data, names=['f0', 'f1', 'f2'])

sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, batch.schema)

for i in range(5):
    writer.write_batch(batch)
writer.close()
buf = sink.getvalue()

reader = pa.ipc.open_stream(buf)
[i for i in reader]
```





[jira] [Created] (ARROW-10599) Prebuilt distributions (i.e., pyarrow and libarrow-dev) should use the same ABI (with or without the dual ABI)

2020-11-15 Thread Tao He (Jira)
Tao He created ARROW-10599:
--

 Summary: Prebuilt distributions (i.e., pyarrow and libarrow-dev) 
should use the same ABI (with or without the dual ABI)
 Key: ARROW-10599
 URL: https://issues.apache.org/jira/browse/ARROW-10599
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Affects Versions: 2.0.0, 1.0.1, 0.17.0
Reporter: Tao He


I have observed that the Python release (pyarrow) and the C++ release (libarrow-dev 
for Ubuntu) are built with different GCC ABIs.

The former, pyarrow, is built within the manylinux1 environment using gcc-4.8, 
while the latter's ABI has a `[cxx11]` tag. That prevents users from developing 
Python C extensions that depend on libarrow-dev. For example, suppose we have 
developed a library A in C++ which uses `arrow::Buffer` from libarrow-dev, and 
wrapped it with something like `pybind11` into a Python module `liba`. After 
building `liba` on stock Ubuntu (which can install libarrow-dev via apt-get), 
importing both `liba` and `pyarrow` in the same Python script won't work 
correctly due to the ABI conflict (especially in cases involving strings).

I can see two options to make it work:

1. build Arrow's Python package with static linking, so that pyarrow won't 
need to bundle so many shared libraries (libarrow.so, libarrow_python.so, etc.)
2. distribute `libarrow-dev` with `-D_GLIBCXX_USE_CXX11_ABI=0`

I'm also wondering whether there are any technical issues preventing packages 
in different languages from being distributed with the same ABI.





[jira] [Created] (ARROW-6054) pyarrow.serialize should respect the value of structured dtype of numpy

2019-07-27 Thread Tao He (JIRA)
Tao He created ARROW-6054:
-

 Summary: pyarrow.serialize should respect the value of structured 
dtype of numpy
 Key: ARROW-6054
 URL: https://issues.apache.org/jira/browse/ARROW-6054
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.14.1
Reporter: Tao He
Assignee: Tao He






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread Tao He (JIRA)
Tao He created ARROW-2455:
-

 Summary: The bytes_allocated_ in CudaContextImpl isn't initialized
 Key: ARROW-2455
 URL: https://issues.apache.org/jira/browse/ARROW-2455
 Project: Apache Arrow
  Issue Type: Bug
  Components: GPU
Reporter: Tao He


The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
leading to a failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)