[jira] [Created] (ARROW-11285) [Release][APT] Add support for Ubuntu Groovy
Kouhei Sutou created ARROW-11285: Summary: [Release][APT] Add support for Ubuntu Groovy Key: ARROW-11285 URL: https://issues.apache.org/jira/browse/ARROW-11285 Project: Apache Arrow Issue Type: Improvement Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11284) [R] Support dplyr verb transmute()
Ian Cook created ARROW-11284: Summary: [R] Support dplyr verb transmute() Key: ARROW-11284 URL: https://issues.apache.org/jira/browse/ARROW-11284 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Ian Cook Assignee: Ian Cook Add support for the dplyr verb {{transmute()}}
[jira] [Created] (ARROW-11283) [Julia] Fix install link
Jacob Quinn created ARROW-11283: Summary: [Julia] Fix install link Key: ARROW-11283 URL: https://issues.apache.org/jira/browse/ARROW-11283 Project: Apache Arrow Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jacob Quinn Assignee: Jacob Quinn Fix For: 3.0.0
[jira] [Created] (ARROW-11282) [Packaging][deb] Add missing libgflags-dev dependency
Kouhei Sutou created ARROW-11282: Summary: [Packaging][deb] Add missing libgflags-dev dependency Key: ARROW-11282 URL: https://issues.apache.org/jira/browse/ARROW-11282 Project: Apache Arrow Issue Type: Improvement Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11281) [C++] Remove needless runtime RapidJSON dependency
Kouhei Sutou created ARROW-11281: Summary: [C++] Remove needless runtime RapidJSON dependency Key: ARROW-11281 URL: https://issues.apache.org/jira/browse/ARROW-11281 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11280) [Release][APT] Add a workaround for C++ and packaging bugs
Kouhei Sutou created ARROW-11280: Summary: [Release][APT] Add a workaround for C++ and packaging bugs Key: ARROW-11280 URL: https://issues.apache.org/jira/browse/ARROW-11280 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11279) [Rust][Parquet]
R J created ARROW-11279: Summary: [Rust][Parquet] Key: ARROW-11279 URL: https://issues.apache.org/jira/browse/ARROW-11279 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: R J In the rust implementation of an Arrow RecordBatch writer to parquet (3.0.0-SNAPSHOT), the ArrowWriter::write call potentially allocates more memory than required. For a RecordBatch with m rows and n columns, ArrowWriter::write allocates m*n definition levels, leading to m times the required memory usage.
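As a rough illustration of the sizing question in ARROW-11279: for a flat, nullable column, Parquet needs one definition level per row (1 for a present value, 0 for a null), so a writer that handles one column at a time only needs a level buffer as long as the row count. The sketch below is a hypothetical, simplified model, not the parquet crate's writer; `flat_definition_levels` and `peak_level_buffer_len` are illustrative names.

```rust
/// Hypothetical helper: definition levels for a flat, nullable column
/// (max definition level 1): 1 = value present, 0 = null, one level per row.
fn flat_definition_levels(values: &[Option<i32>]) -> Vec<i16> {
    values.iter().map(|v| if v.is_some() { 1 } else { 0 }).collect()
}

/// Hypothetical column-at-a-time writer model: levels are computed and
/// dropped per column, so the largest buffer alive at once has one entry per row.
fn peak_level_buffer_len(columns: &[Vec<Option<i32>>]) -> usize {
    columns
        .iter()
        .map(|col| flat_definition_levels(col).len())
        .max()
        .unwrap_or(0)
}
```

With n columns of m rows each, this model never holds more than m levels at once, rather than m*n.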
[jira] [Created] (ARROW-11278) [Release][NodeJS] Don't touch ~/.bash_profile
Kouhei Sutou created ARROW-11278: Summary: [Release][NodeJS] Don't touch ~/.bash_profile Key: ARROW-11278 URL: https://issues.apache.org/jira/browse/ARROW-11278 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11277) [C++] Fix compilation error in dataset expressions on macOS 10.11
Neal Richardson created ARROW-11277: Summary: [C++] Fix compilation error in dataset expressions on macOS 10.11 Key: ARROW-11277 URL: https://issues.apache.org/jira/browse/ARROW-11277 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Neal Richardson Assignee: Ben Kietzman See https://github.com/autobrew/homebrew-core/pull/61#issuecomment-761605455 R binary packages for macOS are built with an old SDK, so this is needed.
[jira] [Created] (ARROW-11276) [Rust] [DataFusion] Make MemoryStream public
Andy Grove created ARROW-11276: Summary: [Rust] [DataFusion] Make MemoryStream public Key: ARROW-11276 URL: https://issues.apache.org/jira/browse/ARROW-11276 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Andy Grove I found the need to take a copy of MemoryStream for use in another project. It would be nice if we could expose this as a supported public API so that other projects building physical operators can re-use it.
[jira] [Created] (ARROW-11275) [Packaging][wheel][Linux] Fix paths for Gemfury
Kouhei Sutou created ARROW-11275: Summary: [Packaging][wheel][Linux] Fix paths for Gemfury Key: ARROW-11275 URL: https://issues.apache.org/jira/browse/ARROW-11275 Project: Apache Arrow Issue Type: Bug Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11274) [Packaging][wheel][Windows] Fix wheels path for Gemfury
Kouhei Sutou created ARROW-11274: Summary: [Packaging][wheel][Windows] Fix wheels path for Gemfury Key: ARROW-11274 URL: https://issues.apache.org/jira/browse/ARROW-11274 Project: Apache Arrow Issue Type: Bug Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11273) [Release][deb] Remove unsupported Debian GNU/Linux stretch
Kouhei Sutou created ARROW-11273: Summary: [Release][deb] Remove unsupported Debian GNU/Linux stretch Key: ARROW-11273 URL: https://issues.apache.org/jira/browse/ARROW-11273 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11272) [Release][wheel] Remove unsupported Python 3.5 and manylinux1
Kouhei Sutou created ARROW-11272: Summary: [Release][wheel] Remove unsupported Python 3.5 and manylinux1 Key: ARROW-11272 URL: https://issues.apache.org/jira/browse/ARROW-11272 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-11271) [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability
Neville Dipale created ARROW-11271: Summary: [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability Key: ARROW-11271 URL: https://issues.apache.org/jira/browse/ARROW-11271 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale Assignee: Neville Dipale We currently do not propagate child nullability correctly when reading parquet files from Spark 3.0.1 (parquet-mr 1.10.1). For example, the following schema, taken from [https://github.com/apache/parquet-format/blob/master/LogicalTypes.md], is currently interpreted incorrectly: {code:java} // List (list nullable, elements non-null) optional group my_list (LIST) { repeated group list { required binary element (UTF8); } }{code} The Arrow type should be: {code:java} Field::new( "my_list", DataType::List(Box::new( Field::new("element", DataType::Utf8, false /* nullable */), )), true /* nullable */ ){code} but we currently end up with {code:java} Field::new( "my_list", DataType::List(Box::new( Field::new("list", DataType::Utf8, true /* nullable */), )), true /* nullable */ ) {code} This doesn't seem to be an issue with the master branch as of opening this issue, so it might not be severe enough to try to force into the 3.0.0 release. I tested null and non-null Spark files, and was able to read them correctly. This becomes an issue with nested lists, which I'm working on.
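A minimal model of the mapping the issue expects for the standard three-level LIST layout: the outer group's repetition decides the list's nullability, and the innermost field supplies both the child's name and its nullability (the `repeated group list` wrapper contributes neither). The `Repetition` and `Field` types here are simplified stand-ins, not the parquet or arrow crates' types, and the converter is hypothetical.

```rust
/// Parquet repetition of a field (only the two kinds this sketch needs).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
}

/// Simplified Arrow-like field: name, type name, nullability, optional list child.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    child: Option<Box<Field>>,
}

/// Hypothetical converter for the three-level LIST layout:
/// `<outer_rep> group <outer_name> (LIST) { repeated group list { <element_rep> <type> <element_name>; } }`
fn three_level_list_to_field(
    outer_name: &str,
    outer_rep: Repetition,
    element_name: &str,
    element_rep: Repetition,
    element_type: &str,
) -> Field {
    let child = Field {
        // The child is named after the innermost field, not the `repeated group list` wrapper.
        name: element_name.to_string(),
        data_type: element_type.to_string(),
        // A `required` element means the list's items are non-null.
        nullable: element_rep == Repetition::Optional,
        child: None,
    };
    Field {
        name: outer_name.to_string(),
        data_type: "List".to_string(),
        // An `optional` outer group means the list itself may be null.
        nullable: outer_rep == Repetition::Optional,
        child: Some(Box::new(child)),
    }
}
```

For the schema quoted above (`optional` outer group, `required` element), this yields a nullable list whose child is named `element` and is non-null.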
[jira] [Created] (ARROW-11270) [Rust] Use slices for simple array data buffer access
Tyrel Rink created ARROW-11270: Summary: [Rust] Use slices for simple array data buffer access Key: ARROW-11270 URL: https://issues.apache.org/jira/browse/ARROW-11270 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Tyrel Rink Assignee: Tyrel Rink Using an approach similar to ARROW-10989, migrate typed array APIs to use slices where possible. This impacts the API of: * GenericBinaryArray<> * GenericListArray<> * GenericStringArray<> This also adds bounds checking to the value() function on each of the above arrays (as well as PrimitiveArray<>). The new PrimitiveArray bounds checks have a negative performance impact on various benchmarks that still use the .value(...) function on PrimitiveArray, but that should be resolvable by using PrimitiveArray.values() instead (whether within this PR or a future PR).
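To show why slice access can recover the cost of per-call bounds checks, here is a toy array that views a shared buffer at an offset. This is a hypothetical stand-in, not arrow's `PrimitiveArray`: `value(i)` checks `i` on every call, while `values()` checks the range once when it builds the slice, after which iteration needs no per-element checks.

```rust
/// Toy stand-in for a primitive array: a buffer viewed at `offset` for `len` values.
struct ToyPrimitiveArray {
    buffer: Vec<i64>,
    offset: usize,
    len: usize,
}

impl ToyPrimitiveArray {
    /// Slice over exactly the logical values; the range is checked once here.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bounds-checked single-element access; panics when `i >= len`.
    fn value(&self, i: usize) -> i64 {
        assert!(i < self.len, "index {} out of bounds for length {}", i, self.len);
        self.buffer[self.offset + i]
    }
}

/// Summing through the slice avoids calling the checked `value(i)` per element.
fn sum_via_slice(a: &ToyPrimitiveArray) -> i64 {
    a.values().iter().sum()
}
```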
[jira] [Created] (ARROW-11269) [Rust] Unable to read Parquet file because of mismatch
Max Burke created ARROW-11269: Summary: [Rust] Unable to read Parquet file because of mismatch Key: ARROW-11269 URL: https://issues.apache.org/jira/browse/ARROW-11269 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 3.0.0 Reporter: Max Burke Attachments: 0100c937-7c1c-78c4-1f4b-156ef04e79f0.parquet The issue seems to stem from the new(-ish) behavior of the Arrow Parquet reader, where the embedded Arrow schema is used instead of deriving the schema from the Parquet columns. However, some code paths still seem to derive the schema type from the column types, so the Arrow record batch reader fails with an error that the column types must match the schema types. In our case, the column type is an int96 datetime (ns) type, and the Arrow type in the embedded schema is DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC")). However, the code that constructs the Arrays seems to re-derive this column type as DataType::Timestamp(TimeUnit::Nanosecond, None) (because the Parquet schema has no timezone information). As a result, Parquet files that we were able to read successfully with our branch of Arrow circa October are now unreadable. I've attached an example of a Parquet file that demonstrates the problem. This file was created in Python (as most of our Parquet files are).
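The mismatch can be reproduced with a simplified model of the two types: a timestamp derived from the Parquet column carries no timezone, so it compares unequal to the embedded schema's `Some("UTC")` type. The sketch assumes its own `DataType`/`TimeUnit` enums mirroring arrow's, and `reconcile` is a hypothetical policy (prefer the embedded type when it only adds a timezone), not the reader's actual behavior.

```rust
/// Simplified mirror of arrow's TimeUnit (only the variant needed here).
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Nanosecond,
}

/// Simplified mirror of arrow's DataType: a timestamp carries an optional timezone.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Timestamp(TimeUnit, Option<String>),
}

/// Hypothetical policy: accept the embedded schema type when it equals the type
/// derived from the Parquet column, or differs only by adding a timezone.
fn reconcile(derived: &DataType, embedded: &DataType) -> Result<DataType, String> {
    match (derived, embedded) {
        (DataType::Timestamp(du, None), DataType::Timestamp(eu, _)) if du == eu => {
            Ok(embedded.clone())
        }
        _ if derived == embedded => Ok(embedded.clone()),
        _ => Err(format!(
            "column type {:?} does not match schema type {:?}",
            derived, embedded
        )),
    }
}
```

A strict equality check (the `derived == embedded` arm alone) rejects the int96 column described above; the extra timestamp arm is one way to accept it.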
[jira] [Created] (ARROW-11268) [Rust][DataFusion] Support specifying repartitions in mem table
Daniël Heres created ARROW-11268: Summary: [Rust][DataFusion] Support specifying repartitions in mem table Key: ARROW-11268 URL: https://issues.apache.org/jira/browse/ARROW-11268 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Daniël Heres Assignee: Daniël Heres
[jira] [Created] (ARROW-11267) [Rust]: Comparison of list arrays with differing offsets fails
Jörn Horstmann created ARROW-11267: Summary: [Rust]: Comparison of list arrays with differing offsets fails Key: ARROW-11267 URL: https://issues.apache.org/jira/browse/ARROW-11267 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Jörn Horstmann Found this while reviewing the fix for ARROW-11239. The failure seems to be related to how the null bitmaps of parent and child data are combined: when I changed `create_list_array` to not include null buffers, the test passed. {code:java} #[test] fn test_list_different_offsets() { let a = create_list_array(&[Some(&[0, 0]), Some(&[1, 2]), Some(&[3, 4])]); let b = create_list_array(&[Some(&[1, 2]), Some(&[3, 4]), Some(&[5, 6])]); let a_slice = a.slice(1, 2); let b_slice = b.slice(0, 2); test_equal(&a_slice, &b_slice, true); } {code} [~jorgecarleitao] [~nevi_me] FYI
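The test expects equality to be logical: two slices are equal when the lists they expose, resolved through each array's own offsets and validity, are equal, regardless of where those lists sit in the underlying buffers. A hypothetical, simplified list array (not arrow's `ListArray` or its `equal` implementation) that compares that way:

```rust
/// Toy list array: `offsets[j]..offsets[j + 1]` delimits list j in `values`.
#[derive(Clone)]
struct ToyListArray {
    offsets: Vec<usize>,
    values: Vec<i32>,
    validity: Vec<bool>, // one entry per list slot
    offset: usize,       // logical start after slicing
    len: usize,
}

impl ToyListArray {
    fn from_lists(lists: &[Option<&[i32]>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        let mut validity = Vec::new();
        for l in lists {
            if let Some(items) = l {
                values.extend_from_slice(items);
            }
            validity.push(l.is_some());
            offsets.push(values.len());
        }
        ToyListArray { offsets, values, validity, offset: 0, len: lists.len() }
    }

    /// Slicing only moves the logical window (buffers are cloned here for simplicity).
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        ToyListArray { offset: self.offset + offset, len, ..self.clone() }
    }

    /// Logical view of list i, resolved through this array's own offsets and validity.
    fn list(&self, i: usize) -> Option<&[i32]> {
        let j = self.offset + i;
        if !self.validity[j] {
            return None;
        }
        Some(&self.values[self.offsets[j]..self.offsets[j + 1]])
    }
}

/// Compares lists logically, so differing physical offsets do not matter.
fn logically_equal(a: &ToyListArray, b: &ToyListArray) -> bool {
    a.len == b.len && (0..a.len).all(|i| a.list(i) == b.list(i))
}
```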
[jira] [Created] (ARROW-11266) [Rust][DataFusion] Implement vectorized hashing for hash aggregate
Daniël Heres created ARROW-11266: Summary: [Rust][DataFusion] Implement vectorized hashing for hash aggregate Key: ARROW-11266 URL: https://issues.apache.org/jira/browse/ARROW-11266 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Daniël Heres Assignee: Daniël Heres
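Vectorized hashing in this sense means hashing the key columns one column at a time over the whole batch, accumulating into a per-row hash buffer, rather than hashing each row's key tuple separately. A minimal sketch under that assumption, using the standard library's `DefaultHasher` and a boost-style combine step in place of whatever hash function DataFusion actually uses; `hash_batch` and `combine` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes one value with the std hasher (a stand-in for a faster hash).
fn hash_one<T: Hash>(v: &T) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Hypothetical combine step (boost-style hash_combine), not DataFusion's function.
fn combine(seed: u64, h: u64) -> u64 {
    seed ^ h
        .wrapping_add(0x9e37_79b9_7f4a_7c15)
        .wrapping_add(seed << 6)
        .wrapping_add(seed >> 2)
}

/// Column-at-a-time hashing: one tight loop per key column over all rows,
/// folding each value's hash into that row's slot in `hashes`.
fn hash_batch(key_columns: &[Vec<i64>], num_rows: usize) -> Vec<u64> {
    let mut hashes = vec![0u64; num_rows];
    for col in key_columns {
        for (h, v) in hashes.iter_mut().zip(col.iter()) {
            *h = combine(*h, hash_one(v));
        }
    }
    hashes
}
```

A hash aggregate would then bucket rows by these hashes and still compare the actual key values to resolve collisions.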