[jira] [Created] (ARROW-15772) [Go][Flight] Server Basic Auth Middleware/Interceptor wrongly base64 decode
Risselin Corentin created ARROW-15772:

Summary: [Go][Flight] Server Basic Auth Middleware/Interceptor wrongly base64 decode
Key: ARROW-15772
URL: https://issues.apache.org/jira/browse/ARROW-15772
Project: Apache Arrow
Issue Type: Bug
Components: Go
Affects Versions: 7.0.0, 6.0.1
Reporter: Risselin Corentin

Currently the implementation of the Auth interceptors uses `base64.RawStdEncoding.DecodeString` to decode the content of the handshake.

In Go, RawStdEncoding does not use padding (with '='), so trying to authenticate from pyarrow (with `client.authenticate_basic_token(user, password)`) results in an error like:
{quote}{{pyarrow._flight.FlightUnauthenticatedError: gRPC returned unauthenticated error, with message: invalid basic auth encoding: illegal base64 data at input byte XX}}
{quote}
Falling back to StdEncoding when RawStdEncoding fails would successfully read the content.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
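The padding mismatch described above can be illustrated with a small Python sketch. This is not the Go interceptor code: `decode_unpadded_only` and `decode_basic_auth` are hypothetical helpers that only mimic a decoder which rejects '=' and a fallback to the padded form, and the error text is modelled on the message quoted in the report.

```python
import base64


def decode_unpadded_only(token: str) -> bytes:
    """Hypothetical stand-in for a decoder that, like an unpadded alphabet, rejects '='."""
    if "=" in token:
        raise ValueError(f"illegal base64 data at input byte {token.index('=')}")
    # Python's decoder needs the padding back, so restore it before decoding.
    return base64.b64decode(token + "=" * (-len(token) % 4), validate=True)


def decode_basic_auth(token: str) -> bytes:
    """Accept unpadded input, and fall back to the padded form if that fails."""
    try:
        return decode_unpadded_only(token)
    except ValueError:
        return base64.b64decode(token, validate=True)


# A padded token, as a client using the standard encoding would send it.
padded = base64.b64encode(b"user:password").decode("ascii")
unpadded = padded.rstrip("=")
```

With these helpers, `decode_unpadded_only(padded)` raises, while `decode_basic_auth` accepts both `padded` and `unpadded`.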
[jira] [Created] (ARROW-15771) [C++][Compute] Add window join to execution engine
Rok Mihevc created ARROW-15771:

Summary: [C++][Compute] Add window join to execution engine
Key: ARROW-15771
URL: https://issues.apache.org/jira/browse/ARROW-15771
Project: Apache Arrow
Issue Type: Improvement
Reporter: Rok Mihevc

We would like to support window joins with as-of support. See https://github.com/substrait-io/substrait/issues/3 for more.
[jira] [Created] (ARROW-15770) [CI] Not all python tests are running on CI jobs
Weston Pace created ARROW-15770:

Summary: [CI] Not all python tests are running on CI jobs
Key: ARROW-15770
URL: https://issues.apache.org/jira/browse/ARROW-15770
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration
Reporter: Weston Pace

It appears that only the Orc tests are running. See for example: https://github.com/apache/arrow/runs/5307134146?check_suite_focus=true
[jira] [Created] (ARROW-15769) [C++] Generate less arithmetic kernels
Antoine Pitrou created ARROW-15769:

Summary: [C++] Generate less arithmetic kernels
Key: ARROW-15769
URL: https://issues.apache.org/jira/browse/ARROW-15769
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou

Some of our arithmetic kernel executors are templated on different logical types (e.g. duration or timestamp) even though they could use the underlying physical type.
[jira] [Created] (ARROW-15768) [CI][C++] Reinstate a code coverage job
Antoine Pitrou created ARROW-15768:

Summary: [CI][C++] Reinstate a code coverage job
Key: ARROW-15768
URL: https://issues.apache.org/jira/browse/ARROW-15768
Project: Apache Arrow
Issue Type: Task
Components: C++, Continuous Integration
Reporter: Antoine Pitrou

Long ago we used to have code coverage measured on our Travis-CI setup. We dropped it because it was difficult to maintain and also a bit costly in additional execution time, AFAIR.

Now that we have nightly builds using Crossbow and the flexibility of GitHub Actions workflows (where we could perhaps even run coverage on e.g. Linux and Windows, then combine the files in a final merge job), we should be able to create a new code coverage job.
[jira] [Created] (ARROW-15767) Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame
Ben Baumgold created ARROW-15767:

Summary: Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame
Key: ARROW-15767
URL: https://issues.apache.org/jira/browse/ARROW-15767
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 6.0.1
Reporter: Ben Baumgold
Attachments: nothing.arrow

A Feather file containing a column of nullable values raises an error when converted to a pandas DataFrame. It can be read into a pyarrow.Table as follows:

{code:python}
In [1]: import pyarrow.feather as feather

In [2]: t = feather.read_table("nothing.arrow")

In [3]: t
Out[3]:
pyarrow.Table
col: dense_union<: null=0, : int32 not null=1>
  child 0, : null
  child 1, : int32 not null
col: [
  -- is_valid: all not null
  -- type_ids: [ 1, 1, 1, 0 ]
  -- value_offsets: [ 0, 1, 2, 0 ]
  -- child 0 type: null
  1 nulls
  -- child 1 type: int32
  [ 1, 2, 3 ]]
{code}

But when trying to convert the pyarrow.Table into a pandas DataFrame, I get the following error:

{code:python}
In [4]: t.to_pandas()
---
ArrowNotImplementedError                  Traceback (most recent call last)
 in
----> 1 t.to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    787     _check_data_column_metadata_consistency(all_columns)
    788     columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 789     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
    790
    791     axes = [columns, index]

~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
   1126     # Convert an arrow table to Block from the internal pandas API
   1127     columns = block_table.column_names
-> 1128     result = pa.lib.table_to_blocks(options, block_table, categories,
   1129                                     list(extension_columns.keys()))
   1130     return [_reconstruct_block(item, columns, extension_columns)

~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()

~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type dense_union<: null=0, : int32 not null=1> is known.
{code}

Note that the Arrow file is valid and can be read successfully by [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285].

The [^nothing.arrow] file used in this example is attached for convenience.
[jira] [Created] (ARROW-15766) [R] Implement bindings for lubridate::duration()
Dragoș Moldovan-Grünfeld created ARROW-15766:

Summary: [R] Implement bindings for lubridate::duration()
Key: ARROW-15766
URL: https://issues.apache.org/jira/browse/ARROW-15766
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Dragoș Moldovan-Grünfeld
Fix For: 8.0.0
[jira] [Created] (ARROW-15765) [Python] Extracting Type information from Python Objects
Vibhatha Lakmal Abeykoon created ARROW-15765:

Summary: [Python] Extracting Type information from Python Objects
Key: ARROW-15765
URL: https://issues.apache.org/jira/browse/ARROW-15765
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon

When creating user-defined functions or similar exercises where we want to extract the Arrow data types from the type hints, the existing Python API has some limitations.

An example case is as follows:

```python
import pyarrow as pa
import pyarrow.compute as pc

def function(array1: pa.Int64Array, array2: pa.Int64Array) -> pa.Int64Array:
    return pc.call_function("add", [array1, array2])
```

We want to extract the fact that array1 is a `pa.Array` of `pa.Int64Type`. At the moment there is no straightforward way to do this, so the idea is to expose this feature to Python.
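As a rough illustration of the gap, here is a sketch of how far the standard library alone gets. It assumes stand-in classes (`Int64Array`, `Int64Type`) so that it runs without pyarrow, and the `ARRAY_TO_TYPE` lookup table and `extract_types` helper are hypothetical; the point is that the hint-to-data-type mapping currently has to be maintained by hand.

```python
import typing


class Int64Type:
    """Stand-in for an Arrow data type; not a pyarrow class."""


class Int64Array:
    """Stand-in for an Arrow array class; not a pyarrow class."""


# Hypothetical hand-written mapping from array class to data type.
# This is the piece the issue proposes that the library should provide.
ARRAY_TO_TYPE = {Int64Array: Int64Type}


def add(array1: Int64Array, array2: Int64Array) -> Int64Array:
    raise NotImplementedError  # the body is irrelevant; only the hints matter


def extract_types(func):
    """Map each annotated parameter (and 'return') to its data type."""
    hints = typing.get_type_hints(func)
    return {name: ARRAY_TO_TYPE[hint] for name, hint in hints.items()}


types = extract_types(add)
```

Here `types` maps `array1`, `array2` and `return` to `Int64Type`; any array class missing from the hand-written table raises `KeyError`.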
[jira] [Created] (ARROW-15764) [C++][FlightRPC] Optionally cache serialized ListFlights serverside
Rok Mihevc created ARROW-15764:

Summary: [C++][FlightRPC] Optionally cache serialized ListFlights serverside
Key: ARROW-15764
URL: https://issues.apache.org/jira/browse/ARROW-15764
Project: Apache Arrow
Issue Type: Improvement
Reporter: Rok Mihevc

ListFlights serializes flights each time it is called. If we have many flights and ListFlights is called often, this can produce a significant load on the server. We could have an optional server-side cache to avoid this.
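One possible shape for such a cache, sketched in Python rather than C++ and under stated assumptions: `FlightCatalog` is a hypothetical registry, not part of the Arrow Flight API, and JSON stands in for the real serialization. The serialized listing is kept until the set of flights changes.

```python
import json


class FlightCatalog:
    """Hypothetical flight registry that caches its serialized listing."""

    def __init__(self):
        self._flights = {}
        self._cached = None       # serialized listing, or None when stale
        self.serialize_calls = 0  # counts how often serialization actually runs

    def add_flight(self, name, description):
        self._flights[name] = description
        self._cached = None       # any change invalidates the cache

    def list_flights(self) -> bytes:
        if self._cached is None:
            self.serialize_calls += 1
            self._cached = json.dumps(sorted(self._flights.items())).encode("utf-8")
        return self._cached


catalog = FlightCatalog()
catalog.add_flight("a", "first")
catalog.add_flight("b", "second")
first = catalog.list_flights()
second = catalog.list_flights()   # served from the cache
```

Invalidating on every write keeps the sketch simple; a server with frequent updates might prefer per-flight caching instead.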
[jira] [Created] (ARROW-15763) [C++] Improve csv writer
Yibo Cai created ARROW-15763:

Summary: [C++] Improve csv writer
Key: ARROW-15763
URL: https://issues.apache.org/jira/browse/ARROW-15763
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Yibo Cai
Assignee: Yibo Cai

Profiling shows that for string inputs, the CSV writer spends most of its time counting quotes in every string. This can probably be improved by checking for quotes ahead of time, in batches.

For numeric inputs, the cast (number -> string) dominates the run time.
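One reading of the batch check, sketched in Python rather than C++: scan a batch for quote characters up front and skip per-string escaping when none are present. `quote_batch` is a hypothetical helper, not the Arrow CSV writer, and it assumes every value is quoted.

```python
def quote_batch(values):
    """Quote a batch of strings for CSV, escaping only when the batch needs it."""
    # Fast path: no value contains a quote, so no escaping work is required.
    if not any('"' in value for value in values):
        return ['"' + value + '"' for value in values]
    # Slow path: double every embedded quote, as CSV quoting requires.
    return ['"' + value.replace('"', '""') + '"' for value in values]


plain = quote_batch(["alpha", "beta"])
escaped = quote_batch(['say "hi"', "beta"])
```

Since an Arrow string array stores its values in one contiguous data buffer, the up-front check could in principle be a single scan over that buffer rather than one per string.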
[jira] [Created] (ARROW-15762) [R] Revisit binding_format_datetime and remove manual casting
Dragoș Moldovan-Grünfeld created ARROW-15762:

Summary: [R] Revisit binding_format_datetime and remove manual casting
Key: ARROW-15762
URL: https://issues.apache.org/jira/browse/ARROW-15762
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Dragoș Moldovan-Grünfeld
Assignee: Dragoș Moldovan-Grünfeld

This is a follow-up issue to revisit the casting step in format once [https://github.com/apache/arrow/pull/12240] gets merged.
[jira] [Created] (ARROW-15761) [Python] Remove the deprecated pyarrow.filesystem legacy implementations
Joris Van den Bossche created ARROW-15761:

Summary: [Python] Remove the deprecated pyarrow.filesystem legacy implementations
Key: ARROW-15761
URL: https://issues.apache.org/jira/browse/ARROW-15761
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: Joris Van den Bossche
Fix For: 8.0.0

The {{pyarrow.filesystem}} and {{pyarrow.hdfs}} filesystems were deprecated in 2.0.0, and their warning changed from DeprecationWarning to FutureWarning in 4.0.0. I think it is time to actually remove them, and I propose doing so in 8.0.0.