[jira] [Created] (ARROW-11096) [Rust] Add FFI for [Large]Binary
Jorge Leitão created ARROW-11096:

Summary: [Rust] Add FFI for [Large]Binary
Key: ARROW-11096
URL: https://issues.apache.org/jira/browse/ARROW-11096
Project: Apache Arrow
Issue Type: New Feature
Components: Rust
Reporter: Jorge Leitão
Assignee: Jorge Leitão

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11095) [Python] Access pyarrow.RecordBatch column by name
Will Jones created ARROW-11095:

Summary: [Python] Access pyarrow.RecordBatch column by name
Key: ARROW-11095
URL: https://issues.apache.org/jira/browse/ARROW-11095
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Will Jones

I propose adding support for selecting a column out of a pyarrow.RecordBatch using both __getitem__() and .field(), like we have in pyarrow.Table.

pyarrow.RecordBatch has a pretty similar API to pyarrow.Table (e.g. both have filter and take methods and a schema), but I got tripped up on this difference. pyarrow.Table supports accessing columns by name using both __getitem__ and .field():

{code:python}
import pyarrow as pa

my_array = pa.array(range(10))
table = pa.Table.from_arrays([my_array], names=['my_column'])

# Both of these work on table:
table['my_column']
table.field('my_column')
{code}

Meanwhile, pyarrow.RecordBatch doesn't support either of those. In fact, I had a hard time finding a way to grab a column by name from a RecordBatch without first looking up the integer index.
[jira] [Created] (ARROW-11094) [Rust] [DataFusion] Implement Sort-Merge Join
Andy Grove created ARROW-11094:

Summary: [Rust] [DataFusion] Implement Sort-Merge Join
Key: ARROW-11094
URL: https://issues.apache.org/jira/browse/ARROW-11094
Project: Apache Arrow
Issue Type: New Feature
Components: Rust - DataFusion
Reporter: Andy Grove
Fix For: 4.0.0

The current hash join works well when one side of the join can be loaded into memory, but it cannot scale beyond the available RAM.

The advantage of implementing SMJ (Sort-Merge Join) is that we can sort the left and right partitions in parallel and then stream both sides of the join by merging these sorted partitions, without loading one side into memory. At most, we need to load all batches from both sides that contain the current join key values.

https://en.wikipedia.org/wiki/Sort-merge_join
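The merge step described in ARROW-11094 can be illustrated with a minimal in-memory sketch. This is not DataFusion's implementation: the function name `sort_merge_join` and the `(key, value)` row layout are hypothetical, and plain Python lists stand in for the streamed record batches. It shows only the core idea, that after sorting both sides the join needs to hold just the rows sharing the current key.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (join key, payload); a hypothetical layout for illustration


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Inner-join two lists of (key, value) rows on key using sort-merge."""
    # Sort phase: each side is sorted independently (these sorts could run in parallel).
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])

    joined = []
    i = j = 0
    # Merge phase: advance whichever side currently has the smaller key.
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows on each side that share the current key.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Emit the cross product of the two runs; only these rows
            # need to be held at the same time.
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    joined.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return joined


if __name__ == "__main__":
    left_rows = [(2, "b"), (1, "a"), (2, "c")]
    right_rows = [(2, "x"), (3, "y"), (2, "z")]
    for row in sort_merge_join(left_rows, right_rows):
        print(row)
```

A streaming version would replace the lists with iterators over sorted batches and buffer only the rows belonging to the current key run, which is the memory bound the issue describes.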
[jira] [Created] (ARROW-11093) [Rust] [DataFusion] RFC Roadmap for 2021
Andy Grove created ARROW-11093:

Summary: [Rust] [DataFusion] RFC Roadmap for 2021
Key: ARROW-11093
URL: https://issues.apache.org/jira/browse/ARROW-11093
Project: Apache Arrow
Issue Type: Task
Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove

Given the momentum and number of contributors involved in the Rust implementation, I think it would be useful to crowdsource a roadmap for the next few releases that we expect to release in 2021.

We have a small number of active committers on the project currently and it is hard for us to keep up with all the PRs sometimes, especially when so many different areas are being contributed to. It would be helpful if we can co-ordinate to prioritize work for the release.

Of course, this is open source, and anyone can contribute anything at any time, but it would be nice to have some areas that we all agree are the main priorities.

I will create a PR to kick start this discussion.
[jira] [Created] (ARROW-11092) [CI] (Temporarily) move offending workflows to separate files
Neal Richardson created ARROW-11092:

Summary: [CI] (Temporarily) move offending workflows to separate files
Key: ARROW-11092
URL: https://issues.apache.org/jira/browse/ARROW-11092
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 3.0.0

Without warning, INFRA broke several of our GitHub Actions workflows, and has been unresponsive all week. See https://issues.apache.org/jira/browse/INFRA-21239. Since then, the Rust developers have removed their offending actions, so those are no longer blocked.

This PR does harm reduction for C++ and R workflows, moving the workflows that INFRA doesn't like to their own files (temporarily, I hope, while this business gets sorted out). This enables the other workflows in each file to run, so we at least get some C++ and R tests running, and we can still verify on our personal forks the workflows that have been blocked on apache/arrow.
[jira] [Created] (ARROW-11091) [Rust][DataFusion] Fix clippy warning in rust 1.49
Daniël Heres created ARROW-11091:

Summary: [Rust][DataFusion] Fix clippy warning in rust 1.49
Key: ARROW-11091
URL: https://issues.apache.org/jira/browse/ARROW-11091
Project: Apache Arrow
Issue Type: Improvement
Components: Rust - DataFusion
Reporter: Daniël Heres
Assignee: Daniël Heres
[jira] [Created] (ARROW-11090) [R] Support date + datetime arithmetic
Jonathan Keane created ARROW-11090:

Summary: [R] Support date + datetime arithmetic
Key: ARROW-11090
URL: https://issues.apache.org/jira/browse/ARROW-11090
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Jonathan Keane

[It appears that only subtract on two datetimes is currently supported|https://github.com/apache/arrow/commit/dd94a5809b56b32fe2fb538f688bf568d9642e3b]. When more operations are supported, we should add support for them in R.
[jira] [Created] (ARROW-11089) [C++][Gandiva] Support list datatype for gandiva UDF
Jiangtao Peng created ARROW-11089:

Summary: [C++][Gandiva] Support list datatype for gandiva UDF
Key: ARROW-11089
URL: https://issues.apache.org/jira/browse/ARROW-11089
Project: Apache Arrow
Issue Type: New Feature
Components: C++ - Gandiva
Reporter: Jiangtao Peng

It would be useful to support the Arrow list type for Gandiva expression inputs and outputs.
[jira] [Created] (ARROW-11088) [Rust][DataFusion] Calculate column indices upfront in hash join
Daniël Heres created ARROW-11088:

Summary: [Rust][DataFusion] Calculate column indices upfront in hash join
Key: ARROW-11088
URL: https://issues.apache.org/jira/browse/ARROW-11088
Project: Apache Arrow
Issue Type: Improvement
Components: Rust - DataFusion
Reporter: Daniël Heres
Assignee: Daniël Heres
[jira] [Created] (ARROW-11087) [Rust] SIMD aggregate kernel produces flawed results.
Ritchie created ARROW-11087: --- Summary: [Rust] SIMD aggregate kernel produces flawed results. Key: ARROW-11087 URL: https://issues.apache.org/jira/browse/ARROW-11087 Project: Apache Arrow Issue Type: Bug Reporter: Ritchie I don't know if this is still accurate on master, but Arrow 2.0 simd sum gives me flawed results when compiled with SIMD. When SIMD is toggled off I get correct results. When I have more time I can get a reproducible example if requested. Dataset on which this shows different results (as numpy array) Output of *np.nansum* is 39. Output of SIMD kernel is 37. {code:java} array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., nan, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., -1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 1., 0., 0., 0., 0., 0., 0., 0., -2., 0., 0., 0., 0., 0., -1., 0., 0., 0., 0., 0., 0., 1., 1., -1., 0., 0., 0., 1., 2., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 6., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 1., 3., 
0., 2., 0., 0., 1., 4., 2., 0., 0., 0., 0., nan, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11086) [Rust] Extend take to support more index types
Daniël Heres created ARROW-11086:

Summary: [Rust] Extend take to support more index types
Key: ARROW-11086
URL: https://issues.apache.org/jira/browse/ARROW-11086
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Daniël Heres
Assignee: Daniël Heres
[jira] [Created] (ARROW-11085) [Rust] Migrated CI away from action-rs/*
Jorge Leitão created ARROW-11085:

Summary: [Rust] Migrated CI away from action-rs/*
Key: ARROW-11085
URL: https://issues.apache.org/jira/browse/ARROW-11085
Project: Apache Arrow
Issue Type: Task
Reporter: Jorge Leitão
Assignee: Jorge Leitão

The INFRA team deactivated GitHub Actions for action-rs, which caused all our CI to stop working. Since our dependency on it is really small, I propose that we just migrate our builds to not use it.