[jira] [Created] (ARROW-11123) [Rust] Use cast kernel to simplify csv parser
Daniël Heres created ARROW-11123: Summary: [Rust] Use cast kernel to simplify csv parser Key: ARROW-11123 URL: https://issues.apache.org/jira/browse/ARROW-11123 Project: Apache Arrow Issue Type: Improvement Reporter: Daniël Heres Assignee: Daniël Heres We can use the cast kernel to simplify the parser implementation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11122) [Rust] Add FFI for date and time
Jorge Leitão created ARROW-11122: Summary: [Rust] Add FFI for date and time Key: ARROW-11122 URL: https://issues.apache.org/jira/browse/ARROW-11122 Project: Apache Arrow Issue Type: New Feature Reporter: Jorge Leitão Assignee: Jorge Leitão
[jira] [Created] (ARROW-11121) [Developer] Use pull_request_target for PR JIRA integration
Kouhei Sutou created ARROW-11121: Summary: [Developer] Use pull_request_target for PR JIRA integration Key: ARROW-11121 URL: https://issues.apache.org/jira/browse/ARROW-11121 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou
[jira] [Created] (ARROW-11120) [Python][R] Prove out plumbing to pass data between Python and R using rpy2
Wes McKinney created ARROW-11120: Summary: [Python][R] Prove out plumbing to pass data between Python and R using rpy2 Key: ARROW-11120 URL: https://issues.apache.org/jira/browse/ARROW-11120 Project: Apache Arrow Issue Type: Improvement Components: Python, R Reporter: Wes McKinney Per discussion on the mailing list, we should see what is required (if anything) to pass data structures between Python and R using the C interface, from the perspective of a Python user using rpy2. rpy2 is roughly the Python counterpart of reticulate. Unit tests will then validate that it works.
[jira] [Created] (ARROW-11119) [Rust] Expose functions to read a CSV StringRecord into a RecordBatch.
Jorge Leitão created ARROW-11119: Summary: [Rust] Expose functions to read a CSV StringRecord into a RecordBatch. Key: ARROW-11119 URL: https://issues.apache.org/jira/browse/ARROW-11119 Project: Apache Arrow Issue Type: New Feature Reporter: Jorge Leitão Assignee: Jorge Leitão
[jira] [Created] (ARROW-11118) [C++] Add union support in ORC reader & writer
Ying Zhou created ARROW-11118: Summary: [C++] Add union support in ORC reader & writer Key: ARROW-11118 URL: https://issues.apache.org/jira/browse/ARROW-11118 Project: Apache Arrow Issue Type: New Feature Reporter: Ying Zhou Currently the ORC reader does not support the ORC UNION type, which has left the ORC writer under-tested for DENSE_UNION and SPARSE_UNION types. To fix this, union support needs to be added to the ORC reader, and union support in the ORC writer needs to be added and tested.
[jira] [Created] (ARROW-11117) [C++] ORC Reader uses wrong types
Ying Zhou created ARROW-11117: Summary: [C++] ORC Reader uses wrong types Key: ARROW-11117 URL: https://issues.apache.org/jira/browse/ARROW-11117 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Ying Zhou The Arrow C++ ORC reader does not process types correctly. In particular it does the following: 1. It converts the ORC STRING type to the Arrow STRING type despite the fact that all ORC STRINGs are large. 2. It converts the ORC LIST type to the Arrow LIST type despite the fact that all ORC LISTs are large. 3. It converts the ORC MAP type to LISTS of STRUCTS with hardcoded field names while an actual MAP type exists in Arrow (note that the ORC MAPs are large so we need to filter out large ones when converting). These issues need to be fixed.
[jira] [Created] (ARROW-11116) [Rust][DataFusion] More efficient LEFT join implementation
Daniël Heres created ARROW-11116: Summary: [Rust][DataFusion] More efficient LEFT join implementation Key: ARROW-11116 URL: https://issues.apache.org/jira/browse/ARROW-11116 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Daniël Heres Currently, the left join implementation keeps a HashSet> to mark each key as visited. However, a more efficient choice would be to keep a bitmap or a boolean marker for each key or index and mark the row as visited, avoiding unnecessary hashing, copying, and memory usage.
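The boolean-marker idea in the description above can be sketched as follows. This is a minimal illustration using only the Rust standard library, not DataFusion's code: the `VisitedRows` type and its methods are hypothetical names, and it assumes left-side rows are addressed by a dense `usize` index.

```rust
/// Hypothetical helper (not part of DataFusion): one boolean flag per
/// left-side row index, replacing a hash set of visited keys.
pub struct VisitedRows {
    flags: Vec<bool>,
}

impl VisitedRows {
    /// Creates the marker with every left-side row initially unvisited.
    pub fn new(num_left_rows: usize) -> Self {
        Self { flags: vec![false; num_left_rows] }
    }

    /// Marks a left-side row as matched. This is a single indexed write:
    /// no hashing and no copying of the key. Panics if `row` is out of range.
    pub fn mark(&mut self, row: usize) {
        self.flags[row] = true;
    }

    /// Returns the indices of left-side rows that never matched; a LEFT join
    /// would emit these rows padded with nulls on the right side.
    pub fn unmatched(&self) -> Vec<usize> {
        self.flags
            .iter()
            .enumerate()
            .filter(|(_, visited)| !**visited)
            .map(|(row, _)| row)
            .collect()
    }
}
```

Marking a row twice is harmless, so the probe loop does not need to check whether a row was already seen. A packed bitmap (one bit per row) would cut memory further at the cost of some bit arithmetic.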
[jira] [Created] (ARROW-11115) Implement dot-product in a compute kernel
Erik De Smedt created ARROW-11115: Summary: Implement dot-product in a compute kernel Key: ARROW-11115 URL: https://issues.apache.org/jira/browse/ARROW-11115 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Erik De Smedt An efficient implementation of the [dot-product|https://en.wikipedia.org/wiki/Dot_product] is useful for many machine-learning applications. I propose treating null as zero in the implementation. This behavior is sensible because it corresponds to dropping any observation where null data is present.
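The proposed null handling can be illustrated with a small sketch. It assumes nullable columns are modelled as slices of `Option<f64>` rather than Arrow arrays, and the function name `dot_nulls_as_zero` is hypothetical; it is not an existing Arrow kernel.

```rust
/// Hypothetical sketch (not an Arrow API): dot product of two nullable
/// columns where a null in either input contributes zero to the sum.
/// Returns `None` when the inputs differ in length.
pub fn dot_nulls_as_zero(a: &[Option<f64>], b: &[Option<f64>]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let sum = a
        .iter()
        .zip(b)
        // A null on either side makes the whole product zero, which is the
        // same as dropping that observation from the sum.
        .map(|(x, y)| x.unwrap_or(0.0) * y.unwrap_or(0.0))
        .sum();
    Some(sum)
}
```

For example, with inputs `[1.0, null, 3.0]` and `[4.0, 5.0, 6.0]` the middle pair is skipped, leaving 1.0 * 4.0 + 3.0 * 6.0.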
[jira] [Created] (ARROW-11114) [Java] Metadata serialization is broken for Field class
Nick B created ARROW-11114: Summary: [Java] Metadata serialization is broken for Field class Key: ARROW-11114 URL: https://issues.apache.org/jira/browse/ARROW-11114 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 2.0.0 Reporter: Nick B org.apache.arrow.vector.types.pojo.Field uses asymmetric serialization and deserialization for the metadata field, causing it to fail to deserialize with the following error: {noformat} Exception in thread "main" com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.util.ArrayList` out of START_OBJECT token at [Source: (File); line: 10, column: 20] (through reference chain: org.apache.arrow.vector.types.pojo.Schema["fields"]->java.util.ArrayList[0]->org.apache.arrow.vector.types.pojo.Field["metadata"]) {noformat} This is because the class [serializes metadata|https://github.com/apache/arrow/blob/dfef236f7587e4168ac1e07bd09e42d9373beb70/java/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L274] as {{Map}} but [expects to deserialize it|https://github.com/apache/arrow/blob/dfef236f7587e4168ac1e07bd09e42d9373beb70/java/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L87] as {{List>}}. MCVE: [https://gist.github.com/nbruno/983cb7faf41dc20a0810ae80fe33562d]