[jira] [Created] (ARROW-11458) PyArrow 1.x and 2.x do not work with numpy 1.20
Zhuo Peng created ARROW-11458:

Summary: PyArrow 1.x and 2.x do not work with numpy 1.20
Key: ARROW-11458
URL: https://issues.apache.org/jira/browse/ARROW-11458
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 2.0.0, 1.0.1, 1.0.0
Reporter: Zhuo Peng

Numpy 1.20 was released on 1/30 and is not compatible with libraries built against numpy<1.16.6, which is the case for pyarrow 1.x and 2.x. However, pyarrow does not specify an upper bound for the numpy version [1].

```
Python 3.7.9 (default, Oct 30 2020, 13:50:59) [GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import numpy as np
>>> np.__version__
'1.20.0'
>>> pa.__version__
'2.0.0'
>>> pa.array(np.arange(10))
Traceback (most recent call last):
  File "", line 1, in
  File "pyarrow/array.pxi", line 292, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 79, in pyarrow.lib._ndarray_to_array
  File "pyarrow/array.pxi", line 67, in pyarrow.lib._ndarray_to_type
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Did not pass numpy.dtype object
```

[1] https://github.com/apache/arrow/blob/478286658055bb91737394c2065b92a7e92fb0c1/python/setup.py#L572

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11232) Table::CombineChunks() returns incorrect results if Table has no column
Zhuo Peng created ARROW-11232:

Summary: Table::CombineChunks() returns incorrect results if Table has no column
Key: ARROW-11232
URL: https://issues.apache.org/jira/browse/ARROW-11232
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Zhuo Peng
Assignee: Zhuo Peng

```
>>> pa.table([[1]], ["a"])
pyarrow.Table
a: int64
>>> t = pa.table([[1]], ["a"])
>>> t.num_rows
1
>>> t1 = t.drop(["a"])
>>> t1.num_rows
1
>>> t2 = t1.combine_chunks()
>>> t2.num_rows
0
```
[jira] [Created] (ARROW-9098) RecordBatch::ToStructArray cannot handle record batches with 0 column
Zhuo Peng created ARROW-9098:

Summary: RecordBatch::ToStructArray cannot handle record batches with 0 column
Key: ARROW-9098
URL: https://issues.apache.org/jira/browse/ARROW-9098
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng

If RecordBatch::ToStructArray is called against a record batch with 0 columns, the following error is raised:

Invalid: Can't infer struct array length with 0 child arrays
[jira] [Created] (ARROW-9071) [C++] MakeArrayOfNull makes invalid ListArray
Zhuo Peng created ARROW-9071:

Summary: [C++] MakeArrayOfNull makes invalid ListArray
Key: ARROW-9071
URL: https://issues.apache.org/jira/browse/ARROW-9071
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Zhuo Peng

One way to reproduce this bug:

```
>>> a = pa.array([[1, 2]])
>>> b = pa.array([None, None], type=pa.null())
>>> t1 = pa.Table.from_arrays([a], ["a"])
>>> t2 = pa.Table.from_arrays([b], ["b"])
>>> pa.concat_tables([t1, t2], promote=True)
Traceback (most recent call last):
  File "", line 1, in
  File "pyarrow/table.pxi", line 2138, in pyarrow.lib.concat_tables
  File "pyarrow/public-api.pxi", line 390, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0: In chunk 1: Invalid: List child array invalid: Invalid: Buffer #1 too small in array of type int64 and length 2: expected at least 16 byte(s), got 12
```

(This happens because concat_tables(promote=True) calls MakeArrayOfNull: https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/table.cc#L647)

The code here seems incorrect: https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/array/util.cc#L218. The length of the child array of a ListArray may not equal the length of the ListArray.
[jira] [Created] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
Zhuo Peng created ARROW-9037:

Summary: [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
Key: ARROW-9037
URL: https://issues.apache.org/jira/browse/ARROW-9037
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng

If an Array is created with null_count == -1 but without any nulls (and thus no null bitmap buffer), then ArrayData.null_count will remain -1 at export time if null_count was never computed. The exported C struct then also has null_count == -1 [1]. But when importing, if null_count != 0, an error [2] is raised.

[1] https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560
[2] https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404
[jira] [Commented] (ARROW-7229) [C++] Unify ConcatenateTables APIs
[ https://issues.apache.org/jira/browse/ARROW-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092964#comment-17092964 ]

Zhuo Peng commented on ARROW-7229:

AFAIK this is done. The API has been unified and an options struct has been introduced. Maybe the test cases in table_test.cc could be refactored to more closely reflect the API change.

> [C++] Unify ConcatenateTables APIs
> ----------------------------------
>
> Key: ARROW-7229
> URL: https://issues.apache.org/jira/browse/ARROW-7229
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Zhuo Peng
> Assignee: Zhuo Peng
> Priority: Minor
> Fix For: 1.0.0
>
> Today we have ConcatenateTables() and ConcatenateTablesWithPromotion() in C++. It's anticipated that they will allow more customization/tweaking. To avoid complicating the API surface, we should introduce a ConcatenateTableOption object, unify the two functions, and allow further customization to be expressed in that option object.
> Related discussion: https://lists.apache.org/thread.html/1fa85b078dae09639de04afcf948aad1bfabd48ea8a38e33969495c5@%3Cdev.arrow.apache.org%3E
[jira] [Created] (ARROW-8277) [Python] RecordBatch interface improvements
Zhuo Peng created ARROW-8277:

Summary: [Python] RecordBatch interface improvements
Key: ARROW-8277
URL: https://issues.apache.org/jira/browse/ARROW-8277
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Zhuo Peng
Assignee: Zhuo Peng

Currently __eq__ and __repr__ of RecordBatch are not implemented. compute::Take also supports RecordBatch inputs, but there is no Python wrapper for it.
[jira] [Assigned] (ARROW-7806) [Python] Implement to_pandas for lists of LargeBinary/String
[ https://issues.apache.org/jira/browse/ARROW-7806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuo Peng reassigned ARROW-7806:

Assignee: Zhuo Peng

> [Python] Implement to_pandas for lists of LargeBinary/String
> ------------------------------------------------------------
>
> Key: ARROW-7806
> URL: https://issues.apache.org/jira/browse/ARROW-7806
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Zhuo Peng
> Assignee: Zhuo Peng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For example:
>
> >>> a = pa.array([['a']], type=pa.list_(pa.large_binary()))
> >>> a.to_pandas()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "pyarrow/array.pxi", line 468, in pyarrow.lib._PandasConvertible.to_pandas
>   File "pyarrow/array.pxi", line 902, in pyarrow.lib.Array._to_pandas
>   File "pyarrow/error.pxi", line 86, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: Not implemented type for lists: large_binary
[jira] [Commented] (ARROW-1231) [C++] Add filesystem / IO implementation for Google Cloud Storage
[ https://issues.apache.org/jira/browse/ARROW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052531#comment-17052531 ]

Zhuo Peng commented on ARROW-1231:

I don't work on related stuff, but looking at our internal site, google-cloud-cpp seems to be the right choice. Micah might know more. https://googleapis.dev/cpp/google-cloud-storage/latest/ seems to be the documentation for https://googleapis.github.io/google-cloud-cpp/ ?

> [C++] Add filesystem / IO implementation for Google Cloud Storage
> -----------------------------------------------------------------
>
> Key: ARROW-1231
> URL: https://issues.apache.org/jira/browse/ARROW-1231
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Labels: filesystem
>
> See example jumping off point
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/cloud
[jira] [Assigned] (ARROW-7802) [C++] Support for LargeBinary and LargeString in the hash kernel
[ https://issues.apache.org/jira/browse/ARROW-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuo Peng reassigned ARROW-7802:

Assignee: Zhuo Peng

> [C++] Support for LargeBinary and LargeString in the hash kernel
> ----------------------------------------------------------------
>
> Key: ARROW-7802
> URL: https://issues.apache.org/jira/browse/ARROW-7802
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Zhuo Peng
> Assignee: Zhuo Peng
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently they are not supported:
> https://github.com/apache/arrow/blob/a76e277213e166dbeb148260498995ba053566fb/cpp/src/arrow/compute/kernels/hash.cc#L456
[jira] [Created] (ARROW-7806) [Python] {Array,Table,RecordBatch}.to_pandas() do not support Large variants of ListArray, BinaryArray and StringArray
Zhuo Peng created ARROW-7806:

Summary: [Python] {Array,Table,RecordBatch}.to_pandas() do not support Large variants of ListArray, BinaryArray and StringArray
Key: ARROW-7806
URL: https://issues.apache.org/jira/browse/ARROW-7806
Project: Apache Arrow
Issue Type: Bug
Reporter: Zhuo Peng

For example:

```
>>> a = pa.array([['a']], type=pa.list_(pa.large_binary()))
>>> a.to_pandas()
Traceback (most recent call last):
  File "", line 1, in
  File "pyarrow/array.pxi", line 468, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/array.pxi", line 902, in pyarrow.lib.Array._to_pandas
  File "pyarrow/error.pxi", line 86, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Not implemented type for lists: large_binary
```
[jira] [Created] (ARROW-7802) [C++] Support for LargeBinary and LargeString in the hash kernel
Zhuo Peng created ARROW-7802:

Summary: [C++] Support for LargeBinary and LargeString in the hash kernel
Key: ARROW-7802
URL: https://issues.apache.org/jira/browse/ARROW-7802
Project: Apache Arrow
Issue Type: Bug
Reporter: Zhuo Peng

Currently they are not supported:
https://github.com/apache/arrow/blob/a76e277213e166dbeb148260498995ba053566fb/cpp/src/arrow/compute/kernels/hash.cc#L456
[jira] [Commented] (ARROW-7510) [C++] Array::null_count() is not thread-compatible
[ https://issues.apache.org/jira/browse/ARROW-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009948#comment-17009948 ]

Zhuo Peng commented on ARROW-7510:

Yes. Please see the attached articles.

> [C++] Array::null_count() is not thread-compatible
> --------------------------------------------------
>
> Key: ARROW-7510
> URL: https://issues.apache.org/jira/browse/ARROW-7510
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Zhuo Peng
> Priority: Minor
>
> ArrayData has a mutable member null_count, which can be updated in a const function. However, null_count is not atomic, so it is subject to data races.
>
> I guess Arrays are not thread-safe (which is reasonable), but at least they should be thread-compatible so that concurrent access through const member functions is fine.
> (The race looks "benign", but see [1][2])
> https://github.com/apache/arrow/blob/dbe708c7527a4aa6b63df7722cd57db4e0bd2dc7/cpp/src/arrow/array.cc#L123
>
> [1] https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
> [2] https://bartoszmilewski.com/2014/10/25/dealing-with-benign-data-races-the-c-way/
[jira] [Created] (ARROW-7510) [C++] Array::null_count() is not thread-compatible
Zhuo Peng created ARROW-7510:

Summary: [C++] Array::null_count() is not thread-compatible
Key: ARROW-7510
URL: https://issues.apache.org/jira/browse/ARROW-7510
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Zhuo Peng

ArrayData has a mutable member null_count, which can be updated in a const function. However, null_count is not atomic, so it is subject to data races.

I guess Arrays are not thread-safe (which is reasonable), but at least they should be thread-compatible so that concurrent access through const member functions is fine.

(The race looks "benign", but see [1][2])

https://github.com/apache/arrow/blob/dbe708c7527a4aa6b63df7722cd57db4e0bd2dc7/cpp/src/arrow/array.cc#L123

[1] https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] https://bartoszmilewski.com/2014/10/25/dealing-with-benign-data-races-the-c-way/
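The C++ fix is to make the cached counter atomic; as an illustration only, here is a thread-compatible lazily computed null count in Python, with a lock standing in for C++'s std::atomic (the class name and shape are hypothetical, not pyarrow API):

```python
import threading

class CachedNullCount:
    """Thread-compatible lazy counter: concurrent readers are safe
    because the cache update happens under a lock, analogous to the
    atomic null_count in the eventual C++ fix."""

    UNKNOWN = -1

    def __init__(self, validity_bits):
        self._bits = validity_bits  # 1 = valid, 0 = null
        self._null_count = self.UNKNOWN
        self._lock = threading.Lock()

    def null_count(self):
        with self._lock:
            if self._null_count == self.UNKNOWN:
                self._null_count = self._bits.count(0)
            return self._null_count

demo = CachedNullCount([1, 0, 1, 0])
print(demo.null_count())  # 2
```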
[jira] [Assigned] (ARROW-7096) [C++] Add options structs for concatenation-with-promotion and schema unification
[ https://issues.apache.org/jira/browse/ARROW-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuo Peng reassigned ARROW-7096:

Assignee: Zhuo Peng

> [C++] Add options structs for concatenation-with-promotion and schema unification
> ---------------------------------------------------------------------------------
>
> Key: ARROW-7096
> URL: https://issues.apache.org/jira/browse/ARROW-7096
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Zhuo Peng
> Priority: Major
> Fix For: 1.0.0
>
> Follow up to ARROW-6625
[jira] [Created] (ARROW-7362) [Python] ListArray.flatten() should take care of slicing offsets
Zhuo Peng created ARROW-7362:

Summary: [Python] ListArray.flatten() should take care of slicing offsets
Key: ARROW-7362
URL: https://issues.apache.org/jira/browse/ARROW-7362
Project: Apache Arrow
Issue Type: Bug
Reporter: Zhuo Peng
Assignee: Zhuo Peng

Currently ListArray.flatten() simply returns the child array. If a ListArray is a slice of another ListArray, they share the same child array; however, the expected behavior (I think) of flatten() is to return an Array that is the concatenation of all the sub-lists in the ListArray, so the slicing offset should be taken into account. For example:

```
a = pa.array([[1], [2], [3]])
assert a.flatten().equals(pa.array([1, 2, 3]))
# expected:
a.slice(1).flatten().equals(pa.array([2, 3]))
```
[jira] [Created] (ARROW-7229) [C++] Unify ConcatenateTables APIs
Zhuo Peng created ARROW-7229:

Summary: [C++] Unify ConcatenateTables APIs
Key: ARROW-7229
URL: https://issues.apache.org/jira/browse/ARROW-7229
Project: Apache Arrow
Issue Type: Improvement
Reporter: Zhuo Peng
Assignee: Zhuo Peng

Today we have ConcatenateTables() and ConcatenateTablesWithPromotion() in C++. It's anticipated that they will allow more customization/tweaking. To avoid complicating the API surface, we should introduce a ConcatenateTableOption object, unify the two functions, and allow further customization to be expressed in that option object.

Related discussion: https://lists.apache.org/thread.html/1fa85b078dae09639de04afcf948aad1bfabd48ea8a38e33969495c5@%3Cdev.arrow.apache.org%3E
[jira] [Created] (ARROW-7228) [Python] Expose RecordBatch.FromStructArray in Python.
Zhuo Peng created ARROW-7228:

Summary: [Python] Expose RecordBatch.FromStructArray in Python.
Key: ARROW-7228
URL: https://issues.apache.org/jira/browse/ARROW-7228
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Zhuo Peng
Assignee: Zhuo Peng
Fix For: 1.0.0

This API was introduced in ARROW-6243. It will make converting from a list of Python dicts to a RecordBatch easier:

```
struct_array = pa.array([{"column1": 1, "column2": 5}, {"column2": 6}])
record_batch = pa.RecordBatch.from_struct_array(struct_array)
```
[jira] [Created] (ARROW-7227) [Python] Provide wrappers for ConcatenateWithPromotion()
Zhuo Peng created ARROW-7227:

Summary: [Python] Provide wrappers for ConcatenateWithPromotion()
Key: ARROW-7227
URL: https://issues.apache.org/jira/browse/ARROW-7227
Project: Apache Arrow
Issue Type: New Feature
Reporter: Zhuo Peng
Assignee: Zhuo Peng
Fix For: 1.0.0

https://github.com/apache/arrow/pull/5534 introduced ConcatenateWithPromotion() to C++. Provide a Python wrapper for it.
[jira] [Created] (ARROW-6878) [Python] pa.array() does not handle list of dicts with bytes keys correctly under python3
Zhuo Peng created ARROW-6878:

Summary: [Python] pa.array() does not handle list of dicts with bytes keys correctly under python3
Key: ARROW-6878
URL: https://issues.apache.org/jira/browse/ARROW-6878
Project: Apache Arrow
Issue Type: Bug
Reporter: Zhuo Peng

It creates sub-arrays with nulls filled in, instead of the provided values.

```
$ python
Python 3.6.8 (default, Jan 3 2019, 03:42:36) [GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> pa.__version__
'0.15.0'
>>> a = pa.array([{b"a": [1, 2, 3]}])
>>> a
-- is_valid: all not null
-- child 0 type: list
  [
    null
  ]
>>> a = pa.array([{"a": [1, 2, 3]}])
>>> a
-- is_valid: all not null
-- child 0 type: list
  [
    [ 1, 2, 3 ]
  ]
```

It works under python2.
[jira] [Created] (ARROW-6848) [C++] Specify -std=c++11 instead of -std=gnu++11 when building
Zhuo Peng created ARROW-6848:

Summary: [C++] Specify -std=c++11 instead of -std=gnu++11 when building
Key: ARROW-6848
URL: https://issues.apache.org/jira/browse/ARROW-6848
Project: Apache Arrow
Issue Type: Bug
Reporter: Zhuo Peng

Relevant discussion: https://lists.apache.org/thread.html/5807e65d865c1736b3a7a32653ca8bb405d719eb13b8a10b6fe0e904@%3Cdev.arrow.apache.org%3E

In addition to set(CMAKE_CXX_STANDARD 11), we also need set(CMAKE_CXX_EXTENSIONS OFF) in order to turn off compiler-specific extensions (with GCC, the default is -std=gnu++11).

This is supposed to be a no-op, because Arrow builds fine with other compilers (Clang / MSVC). But I am opening this bug to track any issues with flipping the switch.
[jira] [Created] (ARROW-6775) Proposal for several Array utility functions
Zhuo Peng created ARROW-6775:

Summary: Proposal for several Array utility functions
Key: ARROW-6775
URL: https://issues.apache.org/jira/browse/ARROW-6775
Project: Apache Arrow
Issue Type: Wish
Reporter: Zhuo Peng

Hi,

We developed several utilities that compute / access certain properties of Arrays and wonder whether it makes sense to get them into upstream (into both the C++ API and pyarrow) and, assuming yes, where the best place to put them is. Maybe I have overlooked existing APIs that already do the same; in that case please point them out.

1/ ListLengthFromListArray(ListArray&)
Returns the lengths of the lists in a ListArray, as an Int32Array (or Int64Array for large lists). For example: [[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned array can be converted to numpy)

2/ GetBinaryArrayTotalByteSize(BinaryArray&)
Returns the total byte size of a BinaryArray (basically offset[len - 1] - offset[0]). Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.

3/ GetArrayNullBitmapAsByteArray(Array&)
Returns the array's null bitmap as a UInt8Array (which can be efficiently converted to a bool numpy array).

4/ GetFlattenedArrayParentIndices(ListArray&)
Makes an int32 array of the same length as the flattened ListArray. returned_array[i] == j means the i-th element in the flattened ListArray came from the j-th list in the ListArray. For example: [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]
[jira] [Commented] (ARROW-6625) [Python] Allow concat_tables to null or default fill missing columns
[ https://issues.apache.org/jira/browse/ARROW-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938956#comment-16938956 ]

Zhuo Peng commented on ARROW-6625:

Daniel / Wes, are you working on implementing this? I'm also interested in this feature, and if you are not working on it I can take it. I have a small amendment to this FR though: does it make sense to allow concatenating a column of type Null (NullArray) with any other type of column? The result would again be a column of the other type, with nulls filled in for the rows from the NullArray.

> [Python] Allow concat_tables to null or default fill missing columns
> --------------------------------------------------------------------
>
> Key: ARROW-6625
> URL: https://issues.apache.org/jira/browse/ARROW-6625
> Project: Apache Arrow
> Issue Type: Wish
> Components: Python
> Reporter: Daniel Nugent
> Priority: Minor
> Fix For: 1.0.0
>
> The concat_tables function currently requires schemas to be identical across all tables to be concat'ed together. However, tables occasionally are conforming on type where present, but a column will be absent.
> In this case, allowing for null filling (or default filling) would be ideal.
> I imagine this feature would be an optional parameter on the concat_tables function. Presumably the argument could be either a boolean in the case of blanket null filling, or a mapping type for default filling. If a user wanted to default fill some columns, but null fill others, they could use a None as the value (defaultdict would make it simple to provide a blanket null fill if only a few default value columns were desired).
> If a mapping wasn't present, the function should probably raise an error.
> The default behavior would be the current, and thus the default value of the parameter should be False or None.
[jira] [Commented] (ARROW-5894) [C++] libgandiva.so.14 is exporting libstdc++ symbols
[ https://issues.apache.org/jira/browse/ARROW-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885414#comment-16885414 ]

Zhuo Peng commented on ARROW-5894:

https://github.com/apache/arrow/pull/4883

> [C++] libgandiva.so.14 is exporting libstdc++ symbols
> -----------------------------------------------------
>
> Key: ARROW-5894
> URL: https://issues.apache.org/jira/browse/ARROW-5894
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++ - Gandiva
> Affects Versions: 0.14.0
> Reporter: Zhuo Peng
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.0.0
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For example:
> $ nm libgandiva.so.14 | grep "once_proxy"
> 018c0a10 T __once_proxy
>
> Many other symbols are also exported which I guess shouldn't be (e.g. LLVM symbols).
>
> There seems to be no linker script for libgandiva.so (there was, but it was never used and got deleted? https://github.com/apache/arrow/blob/9265fe35b67db93f5af0b47e92e039c637ad5b3e/cpp/src/gandiva/symbols-helpers.map).

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-5894) libgandiva.so.14 is exporting libstdc++ symbols
Zhuo Peng created ARROW-5894:

Summary: libgandiva.so.14 is exporting libstdc++ symbols
Key: ARROW-5894
URL: https://issues.apache.org/jira/browse/ARROW-5894
Project: Apache Arrow
Issue Type: Bug
Components: C++ - Gandiva
Affects Versions: 0.14.0
Reporter: Zhuo Peng

For example:

```
$ nm libgandiva.so.14 | grep "once_proxy"
018c0a10 T __once_proxy
```

Many other symbols are also exported which I guess shouldn't be (e.g. LLVM symbols).

There seems to be no linker script for libgandiva.so (there was, but it was never used and got deleted? https://github.com/apache/arrow/blob/9265fe35b67db93f5af0b47e92e039c637ad5b3e/cpp/src/gandiva/symbols-helpers.map).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5749) [Python] Add Python binding for Table::CombineChunks()
[ https://issues.apache.org/jira/browse/ARROW-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873711#comment-16873711 ]

Zhuo Peng commented on ARROW-5749:

https://github.com/apache/arrow/pull/4712

> [Python] Add Python binding for Table::CombineChunks()
> ------------------------------------------------------
>
> Key: ARROW-5749
> URL: https://issues.apache.org/jira/browse/ARROW-5749
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Zhuo Peng
> Assignee: Zhuo Peng
> Priority: Minor
> Fix For: 0.14.0
[jira] [Commented] (ARROW-5635) Support "compacting" a table
[ https://issues.apache.org/jira/browse/ARROW-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866045#comment-16866045 ]

Zhuo Peng commented on ARROW-5635:

https://github.com/apache/arrow/pull/4598

> Support "compacting" a table
> ----------------------------
>
> Key: ARROW-5635
> URL: https://issues.apache.org/jira/browse/ARROW-5635
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Zhuo Peng
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> A column in a table might consist of multiple chunks. I'm proposing a Table.Compact() method that returns a table whose columns each consist of just one chunk, which is the concatenation of the corresponding column's chunks.
[jira] [Created] (ARROW-5635) Support "compacting" a table
Zhuo Peng created ARROW-5635:

Summary: Support "compacting" a table
Key: ARROW-5635
URL: https://issues.apache.org/jira/browse/ARROW-5635
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Zhuo Peng

A column in a table might consist of multiple chunks. I'm proposing a Table.Compact() method that returns a table whose columns each consist of just one chunk, which is the concatenation of the corresponding column's chunks.
[jira] [Commented] (ARROW-5554) Add a python wrapper for arrow::Concatenate
[ https://issues.apache.org/jira/browse/ARROW-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861143#comment-16861143 ]

Zhuo Peng commented on ARROW-5554:

https://github.com/apache/arrow/pull/4519

> Add a python wrapper for arrow::Concatenate
> -------------------------------------------
>
> Key: ARROW-5554
> URL: https://issues.apache.org/jira/browse/ARROW-5554
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.14.0
> Reporter: Zhuo Peng
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Created] (ARROW-5554) Add a python wrapper for arrow::Concatenate
Zhuo Peng created ARROW-5554:

Summary: Add a python wrapper for arrow::Concatenate
Key: ARROW-5554
URL: https://issues.apache.org/jira/browse/ARROW-5554
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 0.14.0
Reporter: Zhuo Peng
[jira] [Created] (ARROW-5528) Concatenate() crashes when concatenating empty binary arrays.
Zhuo Peng created ARROW-5528:

Summary: Concatenate() crashes when concatenating empty binary arrays.
Key: ARROW-5528
URL: https://issues.apache.org/jira/browse/ARROW-5528
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.13.0
Reporter: Zhuo Peng
Fix For: 0.14.0

https://github.com/brills/arrow/commit/42063bb5297f34d9b98e831264c47add2da68591