[jira] [Created] (ARROW-10786) [Packaging][RPM] Drop support for CentOS 6
Kouhei Sutou created ARROW-10786: Summary: [Packaging][RPM] Drop support for CentOS 6 Key: ARROW-10786 URL: https://issues.apache.org/jira/browse/ARROW-10786 Project: Apache Arrow Issue Type: Improvement Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou Drop support because CentOS 6 reached EOL on 2020-11-30. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10785) Further optimize take string
Daniël Heres created ARROW-10785: Summary: Further optimize take string Key: ARROW-10785 URL: https://issues.apache.org/jira/browse/ARROW-10785 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Daniël Heres Assignee: Daniël Heres
[jira] [Created] (ARROW-10784) [Python] Loading pyarrow.compute isn't thread safe
Micah Kornfield created ARROW-10784: Summary: [Python] Loading pyarrow.compute isn't thread safe Key: ARROW-10784 URL: https://issues.apache.org/jira/browse/ARROW-10784 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 2.0.0 Reporter: Micah Kornfield When using Arrow in a multithreaded environment, it is possible to trigger an initialization race on the pyarrow.compute module when calling Array.flatten. Flatten calls _pc(), which imports pyarrow.compute, but if two threads call flatten at the same time, it is possible that the global initialization of functions from the registry will be incomplete, which causes an AttributeError.
[jira] [Created] (ARROW-10783) [Rust] [DataFusion] Implement row count statistics for Parquet TableProvider
Andy Grove created ARROW-10783: Summary: [Rust] [DataFusion] Implement row count statistics for Parquet TableProvider Key: ARROW-10783 URL: https://issues.apache.org/jira/browse/ARROW-10783 Project: Apache Arrow Issue Type: New Feature Components: Rust - DataFusion Reporter: Andy Grove Following on from https://issues.apache.org/jira/browse/ARROW-10781 we should implement statistics for Parquet data sources.
[jira] [Created] (ARROW-10782) [Rust] [DataFusion] Optimize hash join to use smaller relation as build side
Andy Grove created ARROW-10782: Summary: [Rust] [DataFusion] Optimize hash join to use smaller relation as build side Key: ARROW-10782 URL: https://issues.apache.org/jira/browse/ARROW-10782 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Andy Grove When performing an inner join using the hash join algorithm, it is more efficient to load the smaller table into memory and then stream the larger table. We should use the statistics made available in https://issues.apache.org/jira/browse/ARROW-10781 to build an optimizer rule that determines the smaller side of a join and uses it as the build/hash side.
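The build-side choice described in ARROW-10782 can be illustrated with a minimal Rust sketch. Everything in it is hypothetical and is not DataFusion's API: the `BuildSide` enum, the `choose_build_side` and `inner_hash_join` functions, and the use of `(i64, &str)` tuples as rows are assumptions made for illustration. It keeps the left side as the default when row counts are unknown, which is also an assumption.

```rust
use std::collections::HashMap;

/// Which input of the join is loaded into the in-memory hash table.
/// Hypothetical type, not part of DataFusion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

/// Pick the side with fewer rows as the build side.
/// Falls back to `Left` when either row count is unknown (an assumed default).
pub fn choose_build_side(left_rows: Option<usize>, right_rows: Option<usize>) -> BuildSide {
    match (left_rows, right_rows) {
        (Some(l), Some(r)) if r < l => BuildSide::Right,
        _ => BuildSide::Left,
    }
}

/// Inner hash join on the `i64` key: hash the smaller input, stream the larger.
/// Output rows are always (key, left value, right value).
pub fn inner_hash_join(
    left: &[(i64, &str)],
    right: &[(i64, &str)],
) -> Vec<(i64, String, String)> {
    let side = choose_build_side(Some(left.len()), Some(right.len()));
    let (build, probe) = match side {
        BuildSide::Left => (left, right),
        BuildSide::Right => (right, left),
    };

    // Build phase: load the smaller relation into a hash table.
    let mut table: HashMap<i64, Vec<&str>> = HashMap::new();
    for (key, value) in build {
        table.entry(*key).or_default().push(*value);
    }

    // Probe phase: stream the larger relation and look up matches.
    let mut out = Vec::new();
    for (key, probe_value) in probe {
        if let Some(matches) = table.get(key) {
            for build_value in matches {
                // Restore the original left/right column order.
                let (l, r) = match side {
                    BuildSide::Left => (*build_value, *probe_value),
                    BuildSide::Right => (*probe_value, *build_value),
                };
                out.push((*key, l.to_string(), r.to_string()));
            }
        }
    }
    out
}
```

In a real optimizer rule the row counts would come from table statistics rather than from slice lengths; the slice lengths here only stand in for them.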
[jira] [Created] (ARROW-10781) [Rust] [DataFusion] TableProvider should provide row count statistics
Andy Grove created ARROW-10781: Summary: [Rust] [DataFusion] TableProvider should provide row count statistics Key: ARROW-10781 URL: https://issues.apache.org/jira/browse/ARROW-10781 Project: Apache Arrow Issue Type: New Feature Components: Rust - DataFusion Reporter: Andy Grove In order to start building a cost-based optimizer, we need some statistics about data sources. The most basic statistic would be the number of rows. I propose that we add a Statistics struct that initially just makes a total row count available but that we can later extend to support more advanced statistics. {code:java} struct Statistics { row_count: Option } {code} We can then add a method to TableProvider: {code:java} trait TableProvider { fn statistics() -> Option; } {code} Statistics should be optional because not all data sources can provide statistics.
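One possible reading of the ARROW-10781 proposal, as a compilable Rust sketch. The generic parameters in the quoted snippets were lost in the archive, so `Option<usize>` for the row count and `Option<Statistics>` for the return type are assumptions. The `&self` receiver, the default method body, and the `MemTable` and `OpaqueSource` types are also hypothetical and are not taken from DataFusion.

```rust
/// Statistics about a data source. Only a total row count for now;
/// `usize` is an assumed type, since the original snippet elides it.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Statistics {
    pub row_count: Option<usize>,
}

/// Simplified stand-in for the real trait, reduced to the proposed method.
pub trait TableProvider {
    /// Returns `None` by default, because not every source can provide statistics.
    fn statistics(&self) -> Option<Statistics> {
        None
    }
}

/// Hypothetical in-memory table that knows its exact row count.
pub struct MemTable {
    rows: Vec<Vec<i64>>,
}

impl TableProvider for MemTable {
    fn statistics(&self) -> Option<Statistics> {
        Some(Statistics {
            row_count: Some(self.rows.len()),
        })
    }
}

/// Hypothetical source with no statistics; it relies on the default method.
pub struct OpaqueSource;

impl TableProvider for OpaqueSource {}
```

Making both the return value and the field optional lets a source report "no statistics at all" separately from "statistics exist, but the row count is unknown", which leaves room for adding further fields later.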
[jira] [Created] (ARROW-10780) segfault in R
Matt Pollock created ARROW-10780: Summary: segfault in R Key: ARROW-10780 URL: https://issues.apache.org/jira/browse/ARROW-10780 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 2.0.0 Reporter: Matt Pollock Hello, I installed arrow using {code:java} Sys.setenv(ARROW_R_DEV=TRUE) source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R") install_arrow(binary = FALSE, use_system = FALSE) {code} It shouldn't matter, but I also checked that the system parquet-devel/arrow-devel are both on version 2.0.0 (I'm on CentOS 7). I'm using R 4.0.3. However using any arrow function causes a segfault, for example: {code:java} > library(arrow) Attaching package: ‘arrow’ The following object is masked from ‘package:utils’: timestamp > arrow_available() [1] TRUE > write_parquet(iris, "~/iris4") *** caught segfault *** address (nil), cause 'memory not mapped' Traceback: 1: Table__from_dots(dots, schema) 2: shared_ptr_is_null(xp) 3: shared_ptr(Table, Table__from_dots(dots, schema)) 4: Table$create(x) 5: write_parquet(iris, "~/iris4") {code} I have tried various installation methods (source/binary, system packages or not, even the nightly build). The only thing I've gotten to work is to revert to the 1.0.1 version of arrow. Any advice is appreciated. Thanks in advance.
[jira] [Created] (ARROW-10779) [Java] writeNull method in UnionListWriter doesn't work correctly if validity at that index is already set
Projjal Chanda created ARROW-10779: Summary: [Java] writeNull method in UnionListWriter doesn't work correctly if validity at that index is already set Key: ARROW-10779 URL: https://issues.apache.org/jira/browse/ARROW-10779 Project: Apache Arrow Issue Type: Bug Reporter: Projjal Chanda