[jira] [Created] (ARROW-11123) [Rust] Use cast kernel to simplify csv parser

2021-01-03 Thread Jira
Daniël Heres created ARROW-11123:


 Summary: [Rust] Use cast kernel to simplify csv parser
 Key: ARROW-11123
 URL: https://issues.apache.org/jira/browse/ARROW-11123
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Daniël Heres
Assignee: Daniël Heres


We can use the cast kernel to simplify the parser implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11122) [Rust] Add FFI for date and time

2021-01-03 Thread Jira
Jorge Leitão created ARROW-11122:


 Summary: [Rust] Add FFI for date and time
 Key: ARROW-11122
 URL: https://issues.apache.org/jira/browse/ARROW-11122
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jorge Leitão
Assignee: Jorge Leitão








[jira] [Created] (ARROW-11121) [Developer] Use pull_request_target for PR JIRA integration

2021-01-03 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-11121:


 Summary: [Developer] Use pull_request_target for PR JIRA 
integration
 Key: ARROW-11121
 URL: https://issues.apache.org/jira/browse/ARROW-11121
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou








[jira] [Created] (ARROW-11120) [Python][R] Prove out plumbing to pass data between Python and R using rpy2

2021-01-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-11120:


 Summary: [Python][R] Prove out plumbing to pass data between 
Python and R using rpy2
 Key: ARROW-11120
 URL: https://issues.apache.org/jira/browse/ARROW-11120
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python, R
Reporter: Wes McKinney


Per discussion on the mailing list, we should see what is required (if 
anything) to be able to pass data structures using the C interface between 
Python and R from the perspective of the Python user using rpy2. rpy2 is sort 
of the Python version of reticulate. Unit tests will then validate that it's 
working.





[jira] [Created] (ARROW-11119) [Rust] Expose functions to read a CSV StringRecord into a RecordBatch.

2021-01-03 Thread Jira
Jorge Leitão created ARROW-11119:


 Summary: [Rust] Expose functions to read a CSV StringRecord into a 
RecordBatch.
 Key: ARROW-11119
 URL: https://issues.apache.org/jira/browse/ARROW-11119
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jorge Leitão
Assignee: Jorge Leitão








[jira] [Created] (ARROW-11118) [C++] Add union support in ORC reader & writer

2021-01-03 Thread Ying Zhou (Jira)
Ying Zhou created ARROW-11118:


 Summary: [C++] Add union support in ORC reader & writer
 Key: ARROW-11118
 URL: https://issues.apache.org/jira/browse/ARROW-11118
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Ying Zhou


Currently the ORC reader does not support the ORC UNION type, which has left 
the ORC writer under-tested for the DENSE_UNION and SPARSE_UNION types. To fix 
this, union support needs to be added to the ORC reader, and union support in 
the ORC writer needs to be added and tested.





[jira] [Created] (ARROW-11117) [C++] ORC Reader uses wrong types

2021-01-03 Thread Ying Zhou (Jira)
Ying Zhou created ARROW-11117:


 Summary: [C++] ORC Reader uses wrong types
 Key: ARROW-11117
 URL: https://issues.apache.org/jira/browse/ARROW-11117
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Ying Zhou


The Arrow C++ ORC reader does not process types correctly. In particular it 
does the following:
1. It converts the ORC STRING type to the Arrow STRING type despite the fact 
that all ORC STRINGs are large.

2. It converts the ORC LIST type to the Arrow LIST type despite the fact that 
all ORC LISTs are large.

3. It converts the ORC MAP type to LISTS of STRUCTS with hardcoded field names 
while an actual MAP type exists in Arrow (note that the ORC MAPs are large so 
we need to filter out large ones when converting). 

These issues need to be fixed.

 





[jira] [Created] (ARROW-11116) [Rust][DataFusion] More efficient LEFT join implementation

2021-01-03 Thread Jira
Daniël Heres created ARROW-11116:


 Summary: [Rust][DataFusion] More efficient LEFT join implementation
 Key: ARROW-11116
 URL: https://issues.apache.org/jira/browse/ARROW-11116
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Daniël Heres


Currently, the left join implementation keeps a HashSet to mark each key as 
visited.

However, a more efficient choice would be to keep a bitmap or a boolean marker 
for each key or index and mark the row as visited, avoiding unnecessary 
hashing, copying, and memory usage.
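
The idea can be illustrated with a minimal, language-agnostic sketch in Python. 
It is not DataFusion's code: the function name `left_join`, the row layout 
(key, value) and the choice of the left side as the build side are assumptions 
made only for illustration. One boolean per left row index replaces a set of 
visited keys, so marking a match is a plain index write with no extra hashing 
or key copying.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, Any]  # hypothetical layout: (join key, payload value)


def left_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Optional[Any]]]:
    # Build phase: map each key to the indices of the left rows that carry it.
    index: Dict[Any, List[int]] = {}
    for i, (key, _) in enumerate(left):
        index.setdefault(key, []).append(i)

    # One boolean marker per left row instead of a set of visited keys.
    visited = [False] * len(left)
    out: List[Tuple[Any, Any, Optional[Any]]] = []

    # Probe phase: emit matches and mark the matching left rows by index.
    for key, right_value in right:
        for i in index.get(key, ()):
            visited[i] = True
            out.append((left[i][0], left[i][1], right_value))

    # Left rows that were never matched are emitted with a null right side.
    for i, seen in enumerate(visited):
        if not seen:
            out.append((left[i][0], left[i][1], None))
    return out
```

A packed bitmap would store the same markers in one bit per row; the list of 
booleans above only keeps the sketch short.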





[jira] [Created] (ARROW-11115) Implement dot-product in a compute kernel

2021-01-03 Thread Erik De Smedt (Jira)
Erik De Smedt created ARROW-11115:


 Summary: Implement dot-product in a compute kernel
 Key: ARROW-11115
 URL: https://issues.apache.org/jira/browse/ARROW-11115
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Erik De Smedt


An efficient implementation of the 
[dot-product|https://en.wikipedia.org/wiki/Dot_product] is useful for many 
machine-learning applications.

I propose treating null as zero in the implementation. This behavior is 
sensible because it corresponds to dropping any observation where null data is 
present.
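
A minimal sketch of the proposed null handling, written in plain Python rather 
than as an Arrow kernel. The function name `dot_null_as_zero` and the use of 
`None` to stand in for an Arrow null are assumptions for illustration only. A 
pair with a null on either side contributes nothing to the sum, which matches 
dropping that observation.

```python
from typing import Optional, Sequence


def dot_null_as_zero(
    a: Sequence[Optional[float]], b: Sequence[Optional[float]]
) -> float:
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        # A null on either side is treated as zero, so the pair is skipped.
        if x is None or y is None:
            continue
        total += x * y
    return total
```

For example, `dot_null_as_zero([1.0, None, 3.0], [4.0, 5.0, None])` only sums 
the first pair, since the other two each contain a null.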





[jira] [Created] (ARROW-11114) [Java] Metadata serialization is broken for Field class

2021-01-03 Thread Nick B (Jira)
Nick B created ARROW-11114:


 Summary: [Java] Metadata serialization is broken for Field class
 Key: ARROW-11114
 URL: https://issues.apache.org/jira/browse/ARROW-11114
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 2.0.0
Reporter: Nick B


org.apache.arrow.vector.types.pojo.Field uses asymmetric serialization and 
deserialization for the metadata field, causing it to fail to deserialize with 
the following error:
{noformat}
Exception in thread "main" com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.util.ArrayList` out of START_OBJECT token
 at [Source: (File); line: 10, column: 20] (through reference chain: org.apache.arrow.vector.types.pojo.Schema["fields"]->java.util.ArrayList[0]->org.apache.arrow.vector.types.pojo.Field["metadata"])
 {noformat}
This is because the class [serializes 
metadata|https://github.com/apache/arrow/blob/dfef236f7587e4168ac1e07bd09e42d9373beb70/java/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L274]
 as {{Map<String, String>}} but [expects to deserialize 
it|https://github.com/apache/arrow/blob/dfef236f7587e4168ac1e07bd09e42d9373beb70/java/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L87]
 as {{List<Map<String, String>>}}.

MCVE: [https://gist.github.com/nbruno/983cb7faf41dc20a0810ae80fe33562d]


