[jira] [Created] (ARROW-9095) [Rust] Fix NullArray to comply with spec

2020-06-10 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9095:
-

 Summary: [Rust] Fix NullArray to comply with spec
 Key: ARROW-9095
 URL: https://issues.apache.org/jira/browse/ARROW-9095
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


When I implemented the NullArray, I didn't comply with the spec under the 
premise that I'd handle reading and writing IPC in a spec-compliant way as that 
looked like the easier approach.

After some integration testing, I realised that I wasn't doing it correctly, so 
it's better to comply with the spec by not allocating any buffers for the array.
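
A minimal sketch of the buffer-less layout described above. `NullArraySketch` and its methods are hypothetical names for illustration, not the arrow crate's API: the array stores only its length, and every slot is reported as null.

```rust
/// Hypothetical stand-in for a spec-compliant null array: it stores only
/// its length and allocates no validity or value buffers.
#[derive(Debug, Clone, PartialEq)]
pub struct NullArraySketch {
    len: usize,
}

impl NullArraySketch {
    pub fn new(len: usize) -> Self {
        NullArraySketch { len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Every slot is null, so the null count equals the length.
    pub fn null_count(&self) -> usize {
        self.len
    }

    /// All in-bounds slots are null; no bitmap lookup is needed.
    pub fn is_null(&self, index: usize) -> bool {
        assert!(index < self.len, "index out of bounds");
        true
    }

    /// Number of buffers backing the array: always zero in this sketch.
    pub fn buffer_count(&self) -> usize {
        0
    }
}
```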



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9053) [Rust] Add sort for lists and structs

2020-06-06 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9053:
-

 Summary: [Rust] Add sort for lists and structs
 Key: ARROW-9053
 URL: https://issues.apache.org/jira/browse/ARROW-9053
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale








[jira] [Created] (ARROW-9007) [Rust] Support appending arrays by merging array data

2020-06-02 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9007:
-

 Summary: [Rust] Support appending arrays by merging array data
 Key: ARROW-9007
 URL: https://issues.apache.org/jira/browse/ARROW-9007
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


ARROW-9005 introduces a concat kernel which allows for concatenating multiple 
arrays of the same type into a single array. This is useful for sorting on 
multiple arrays, among other things.

The concat kernel is implemented for most array types, but not yet for nested 
arrays (lists, structs, etc).

This Jira is for creating a way of appending/merging all array types, so that 
concat (and functionality that depends on it) can support all array types.
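
As a rough illustration of what concatenating arrays of one type involves, the sketch below appends both the values and the validity flags of each input. `Int32Sketch` and `concat` are hypothetical simplifications, not the kernel from ARROW-9005; real Arrow arrays use packed bitmaps and shared buffers.

```rust
/// Hypothetical simplified array: values plus a per-slot validity flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Int32Sketch {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl Int32Sketch {
    /// Builds the sketch array; null slots store a placeholder value of 0.
    pub fn from_options(items: &[Option<i32>]) -> Self {
        Int32Sketch {
            values: items.iter().map(|v| v.unwrap_or(0)).collect(),
            validity: items.iter().map(|v| v.is_some()).collect(),
        }
    }

    /// Reads the array back as optional values.
    pub fn to_options(&self) -> Vec<Option<i32>> {
        self.values
            .iter()
            .zip(&self.validity)
            .map(|(v, ok)| if *ok { Some(*v) } else { None })
            .collect()
    }
}

/// Concatenates arrays by appending values and validity in input order.
pub fn concat(arrays: &[Int32Sketch]) -> Int32Sketch {
    let total: usize = arrays.iter().map(|a| a.values.len()).sum();
    let mut values = Vec::with_capacity(total);
    let mut validity = Vec::with_capacity(total);
    for a in arrays {
        values.extend_from_slice(&a.values);
        validity.extend_from_slice(&a.validity);
    }
    Int32Sketch { values, validity }
}
```

Nested types are harder because child data and offsets have to be merged as well, which is the gap this ticket describes.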





[jira] [Created] (ARROW-8883) [Rust] [Integration Testing] Disable unsupported tests

2020-05-21 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-8883:
-

 Summary: [Rust] [Integration Testing] Disable unsupported tests
 Key: ARROW-8883
 URL: https://issues.apache.org/jira/browse/ARROW-8883
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Integration, Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


Some of the integration test failures can be avoided by disabling unsupported 
tests, like large lists and nested types





[jira] [Created] (ARROW-8881) [Rust] Add large list and binary support

2020-05-21 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-8881:
-

 Summary: [Rust] Add large list and binary support
 Key: ARROW-8881
 URL: https://issues.apache.org/jira/browse/ARROW-8881
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


Rust does not yet support large lists and large binary arrays. 





[jira] [Created] (ARROW-8308) [Rust] [Flight] Implement DoExchange on examples

2020-04-01 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-8308:
-

 Summary: [Rust] [Flight] Implement DoExchange on examples
 Key: ARROW-8308
 URL: https://issues.apache.org/jira/browse/ARROW-8308
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


The gRPC server examples in Rust require all trait members to be exhaustively 
implemented. The recent `DoExchange` endpoint to the Flight service is causing 
failures in Rust.





[jira] [Created] (ARROW-7924) [Rust] Add sort for float types

2020-02-23 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7924:
-

 Summary: [Rust] Add sort for float types
 Key: ARROW-7924
 URL: https://issues.apache.org/jira/browse/ARROW-7924
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Floats need a different sort approach than other primitives, and this ticket 
will implement them separately
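
One reason floats need separate handling is that `f64` only implements `PartialOrd` (because of NaN), so the usual `Ord`-based sort does not apply. The sketch below is one possible approach, not necessarily the one the ticket ended up using: it sorts indices with `f64::total_cmp` (available in Rust 1.62 and later) and places nulls last. The function name is hypothetical.

```rust
use std::cmp::Ordering;

/// Returns the indices that would sort `values` ascending.
/// Nulls (`None`) are placed after all non-null values.
pub fn sort_f64_to_indices(values: &[Option<f64>]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_by(|&a, &b| match (values[a], values[b]) {
        // total_cmp gives a total order over all f64 bit patterns, NaN included.
        (Some(x), Some(y)) => x.total_cmp(&y),
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    });
    indices
}
```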





[jira] [Created] (ARROW-7705) [Rust] Initial sort implementation

2020-01-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7705:
-

 Summary: [Rust] Initial sort implementation
 Key: ARROW-7705
 URL: https://issues.apache.org/jira/browse/ARROW-7705
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


An initial sort implementation that allows sorting an array by various options 
(e.g. sort order). This is mainly to iterate on the design and inner workings 
of a sort algorithm.
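
A sketch of how sort options might shape such an API, under the assumption that the options cover sort order and null placement. `SortOptionsSketch` and `sort_to_indices` are illustrative names only, not the design the ticket settled on.

```rust
use std::cmp::Ordering;

/// Hypothetical options: sort direction and where nulls are placed.
#[derive(Debug, Clone, Copy)]
pub struct SortOptionsSketch {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Returns the indices that would sort `values` according to `options`.
pub fn sort_to_indices(values: &[Option<i32>], options: SortOptionsSketch) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_by(|&a, &b| match (values[a], values[b]) {
        (Some(x), Some(y)) => {
            if options.descending {
                y.cmp(&x)
            } else {
                x.cmp(&y)
            }
        }
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if options.nulls_first {
                Ordering::Less
            } else {
                Ordering::Greater
            }
        }
        (Some(_), None) => {
            if options.nulls_first {
                Ordering::Greater
            } else {
                Ordering::Less
            }
        }
    });
    indices
}
```

Returning indices rather than a sorted copy lets the same result reorder several arrays at once.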





[jira] [Created] (ARROW-7704) [Rust] Support sort

2020-01-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7704:
-

 Summary: [Rust] Support sort
 Key: ARROW-7704
 URL: https://issues.apache.org/jira/browse/ARROW-7704
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


This lays out the work needed to support sorting arrays and record batches





[jira] [Created] (ARROW-7620) [Rust] Windows builds failing due to flatbuffer compile error

2020-01-20 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7620:
-

 Summary: [Rust] Windows builds failing due to flatbuffer compile 
error
 Key: ARROW-7620
 URL: https://issues.apache.org/jira/browse/ARROW-7620
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Neville Dipale


I've noticed now on a few PRs whose tests should otherwise pass, that the Rust 
Windows tests are failing due to `*_generated.rs` not being found while trying 
to rename the generated flatbuffer files.

An example is at 
[https://github.com/apache/arrow/pull/6227/checks?check_run_id=397505832]

 

    + flatc --rust -o arrow/src/ipc/gen/ ../format/File.fbs 
../format/Message.fbs ../format/Schema.fbs ../format/SparseTensor.fbs 
../format/Tensor.fbs

    + find arrow/src/ipc/gen/ -name '*_generated.rs' -exec sed -i 
s/type__type/type_type/g '{}' ';'

    File not found - *_generated.rs





[jira] [Created] (ARROW-7521) [Rust] Remove tuple on FixedSizeList datatype

2020-01-08 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7521:
-

 Summary: [Rust] Remove tuple on FixedSizeList datatype
 Key: ARROW-7521
 URL: https://issues.apache.org/jira/browse/ARROW-7521
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


The FixedSizeList datatype takes a tuple of Box<DataType> and length, but this 
could be simplified to take the two values without a tuple.
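
To make the difference concrete, the sketch below puts both shapes side by side in a hypothetical `DataTypeSketch` enum. The variant names and the `i32` length type are assumptions for illustration.

```rust
/// Hypothetical data type enum showing both variant shapes.
#[derive(Debug, Clone, PartialEq)]
pub enum DataTypeSketch {
    Int32,
    /// Shape described in the ticket: a single tuple payload.
    FixedSizeListTuple((Box<DataTypeSketch>, i32)),
    /// Proposed shape: two separate fields, no inner tuple.
    FixedSizeList(Box<DataTypeSketch>, i32),
}

/// Extracts the list length; note the extra parentheses the tuple form needs.
pub fn list_size(dt: &DataTypeSketch) -> Option<i32> {
    match dt {
        DataTypeSketch::FixedSizeListTuple((_, len)) => Some(*len),
        DataTypeSketch::FixedSizeList(_, len) => Some(*len),
        DataTypeSketch::Int32 => None,
    }
}
```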





[jira] [Created] (ARROW-7475) [Rust] Create Arrow Stream writer

2019-12-29 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7475:
-

 Summary: [Rust] Create Arrow Stream writer
 Key: ARROW-7475
 URL: https://issues.apache.org/jira/browse/ARROW-7475
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale








[jira] [Created] (ARROW-7460) [Rust] Improve arithmetic kernels with autovec

2019-12-22 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7460:
-

 Summary: [Rust] Improve arithmetic kernels with autovec
 Key: ARROW-7460
 URL: https://issues.apache.org/jira/browse/ARROW-7460
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.15.1
Reporter: Neville Dipale


In a comment to an open ticket for optimising a cast kernel by using SIMD, 
[~andy-thomason] mentioned that LLVM does autovec well for Rust.

I'd like to explore whether we could improve the kernel performance by 
simplifying the loops enough to allow the compiler to vectorise.
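
As an example of the kind of loop meant here, the sketch below writes element-wise addition as a plain zip over two equal-length slices, with no per-element branching. Whether LLVM actually vectorises it depends on the optimisation level and target, so treat it as a candidate to benchmark rather than a guarantee. `add_slices` is a hypothetical name.

```rust
/// Element-wise addition over two equal-length slices.
/// The branch-free loop body is the sort of code a compiler may auto-vectorise.
pub fn add_slices(left: &[f32], right: &[f32]) -> Vec<f32> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    left.iter().zip(right.iter()).map(|(l, r)| l + r).collect()
}
```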





[jira] [Created] (ARROW-7364) [Rust] Add cast options to cast kernel

2019-12-10 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7364:
-

 Summary: [Rust] Add cast options to cast kernel
 Key: ARROW-7364
 URL: https://issues.apache.org/jira/browse/ARROW-7364
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Neville Dipale


The cast kernels currently do not take explicit options, but instead convert 
overflows and invalid UTF-8 to nulls. We can create options that customise the 
behaviour, similarly to CastOptions in CPP 
([https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.h#L38])
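
A sketch of what such an option could look like for a single numeric cast. `CastOptionsSketch`, its `overflow_to_null` field and `cast_i64_to_i32` are hypothetical names chosen for illustration; they are not the C++ CastOptions fields or the Rust kernel's signature.

```rust
/// Hypothetical options controlling how out-of-range values are handled.
#[derive(Debug, Clone, Copy)]
pub struct CastOptionsSketch {
    /// When true, values that do not fit become nulls; when false, the cast fails.
    pub overflow_to_null: bool,
}

/// Casts optional i64 values to optional i32 values under the given options.
pub fn cast_i64_to_i32(
    values: &[Option<i64>],
    options: CastOptionsSketch,
) -> Result<Vec<Option<i32>>, String> {
    let mut out = Vec::with_capacity(values.len());
    for value in values {
        match value {
            // Nulls stay null regardless of the options.
            None => out.push(None),
            // In-range values are converted directly.
            Some(v) if *v >= i64::from(i32::MIN) && *v <= i64::from(i32::MAX) => {
                out.push(Some(*v as i32))
            }
            // Out-of-range values become nulls only if the caller opted in.
            Some(_) if options.overflow_to_null => out.push(None),
            Some(v) => return Err(format!("value {} does not fit in i32", v)),
        }
    }
    Ok(out)
}
```

With `overflow_to_null: true` this matches the current behaviour described above; setting it to `false` surfaces the overflow as an error instead.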





[jira] [Created] (ARROW-7324) [Rust] Add Timezone to Timestamp

2019-12-04 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7324:
-

 Summary: [Rust] Add Timezone to Timestamp
 Key: ARROW-7324
 URL: https://issues.apache.org/jira/browse/ARROW-7324
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Proposal to add a timezone to the timestamp type





[jira] [Created] (ARROW-7207) [Rust] Update Generated Flatbuffer Files

2019-11-19 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7207:
-

 Summary: [Rust] Update Generated Flatbuffer Files
 Key: ARROW-7207
 URL: https://issues.apache.org/jira/browse/ARROW-7207
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


We last built the fbs files early in the year, and since then there have been 
some changes like LargeLists. We should update the generated Rust files to 
incorporate these changes





[jira] [Created] (ARROW-7194) [Rust] CSV Writer causing recursion errors

2019-11-16 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7194:
-

 Summary: [Rust] CSV Writer causing recursion errors
 Key: ARROW-7194
 URL: https://issues.apache.org/jira/browse/ARROW-7194
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Neville Dipale


As reported in [https://github.com/apache/arrow/pull/5805], the CSV writer's 
use of std::io::Write is causing recursion issues.





[jira] [Created] (ARROW-6944) [Rust] Add StringType

2019-10-19 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-6944:
-

 Summary: [Rust] Add StringType
 Key: ARROW-6944
 URL: https://issues.apache.org/jira/browse/ARROW-6944
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Create a separate String type which uses UTF8, and restrict the BinaryArray to 
opaque binary data





[jira] [Created] (ARROW-6928) [Rust] Add FixedSizeList type

2019-10-17 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-6928:
-

 Summary: [Rust] Add FixedSizeList type
 Key: ARROW-6928
 URL: https://issues.apache.org/jira/browse/ARROW-6928
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Support FixedSizeList, which is required for integration testing





[jira] [Created] (ARROW-6650) [Rust] [Integration] Add method to generate JSON from RecordBatch

2019-09-20 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-6650:
-

 Summary: [Rust] [Integration] Add method to generate JSON from 
RecordBatch
 Key: ARROW-6650
 URL: https://issues.apache.org/jira/browse/ARROW-6650
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Integration, Rust
Affects Versions: 0.14.1
Reporter: Neville Dipale


[~emkornfi...@gmail.com] recommended that we use the integration IPC files. To 
be able to compare against the JSON files that are used, we need to be able to 
generate a JSON representation of Arrow data in Rust.

We can already do this for schemas, and this ticket is for supporting 
converting RecordBatch to JSON.





[jira] [Created] (ARROW-5408) [Rust] Create struct array builder that creates null buffers

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5408:
-

 Summary: [Rust] Create struct array builder that creates null 
buffers
 Key: ARROW-5408
 URL: https://issues.apache.org/jira/browse/ARROW-5408
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


We currently have a way of creating a struct array from a list of (field, 
array) tuples. This does not create null buffers for the struct (because no 
index is null). While this works fine for Rust, it often leads to incompatible 
data with IPC data and kernel function outputs.

Having a function that caters for nulls, or expanding the current one, would 
alleviate this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5400) [Rust] Test/ensure that reader and writer support zero-length record batches

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5400:
-

 Summary: [Rust] Test/ensure that reader and writer support 
zero-length record batches
 Key: ARROW-5400
 URL: https://issues.apache.org/jira/browse/ARROW-5400
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale








[jira] [Created] (ARROW-5399) [Rust] [Testing] Add IPC test files to arrow-testing

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5399:
-

 Summary: [Rust] [Testing] Add IPC test files to arrow-testing
 Key: ARROW-5399
 URL: https://issues.apache.org/jira/browse/ARROW-5399
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


We're generating a lot of files for testing, which should ideally live in 
arrow-testing





[jira] [Created] (ARROW-5367) [Rust] Add temporal kernels

2019-05-18 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5367:
-

 Summary: [Rust] Add temporal kernels
 Key: ARROW-5367
 URL: https://issues.apache.org/jira/browse/ARROW-5367
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


When creating temporal arrays, we added a sample function that extracts the 
hour from a temporal array. This ticket is to add support for other common 
temporal functions like minute, second, hour, and might include temporal 
arithmetic such as adding dates and times, calculating durations etc.





[jira] [Created] (ARROW-5366) [Rust] Implement Duration and Interval Types

2019-05-18 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5366:
-

 Summary: [Rust] Implement Duration and Interval Types
 Key: ARROW-5366
 URL: https://issues.apache.org/jira/browse/ARROW-5366
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


This should ideally include covering:
 * data types
 * arrays and builders
 * adding to kernels (e.g. including support in cast)





[jira] [Created] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+

2019-05-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5360:
-

 Summary: [Rust] Builds are broken by rustyline on nightly 
2019-05-16+
 Key: ARROW-5360
 URL: https://issues.apache.org/jira/browse/ARROW-5360
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Neville Dipale


Rust builds are broken on nightly since 2019-05-16. Please see 
[https://github.com/kkawakam/rustyline/issues/217]

The issue might need to be fixed on the rustyline crate.





[jira] [Created] (ARROW-5352) [Rust] BinaryArray filter loses replaces nulls with empty strings

2019-05-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5352:
-

 Summary: [Rust] BinaryArray filter loses replaces nulls with empty 
strings
 Key: ARROW-5352
 URL: https://issues.apache.org/jira/browse/ARROW-5352
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


The filter implementation for BinaryArray discards nullness of data. 
BinaryArrays that are null (seem to) always return an empty string slice when 
getting a value, so the way filter works might be a bug depending on what Arrow 
developers' or users' intentions are.

I think we should either preserve nulls (and their count) or document this as 
intended behaviour.

Below is a test case that reproduces the bug.
{code:java}
#[test]
fn test_filter_binary_array_with_nulls() {
    // Build a binary array with nulls at positions 0 and 2
    let mut a: BinaryBuilder = BinaryBuilder::new(100);
    a.append_null().unwrap();
    a.append_string("a string").unwrap();
    a.append_null().unwrap();
    a.append_string("with nulls").unwrap();
    let array = a.finish();
    // Keep every element, so the output should equal the input
    let b = BooleanArray::from(vec![true, true, true, true]);
    let c = filter(&array, &b).unwrap();
    let d: &BinaryArray = c.as_any().downcast_ref::<BinaryArray>().unwrap();
    // I didn't expect this behaviour
    assert_eq!("", d.get_string(0));
    // fails here
    assert!(d.is_null(0));
    assert_eq!(4, d.len());
    // fails here
    assert_eq!(2, d.null_count());
    assert_eq!("a string", d.get_string(1));
    // fails here
    assert!(d.is_null(2));
    assert_eq!("with nulls", d.get_string(3));
}
{code}
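
For comparison, a null-preserving filter can be sketched over plain `Option<String>` values. This is only an illustration of the behaviour expected by the test above, with a hypothetical function name; it does not use the arrow crate's builders or bitmaps.

```rust
/// Keeps the values whose mask entry is true, carrying nulls through unchanged.
pub fn filter_preserving_nulls(values: &[Option<String>], mask: &[bool]) -> Vec<Option<String>> {
    assert_eq!(values.len(), mask.len(), "mask must match the array length");
    values
        .iter()
        .zip(mask.iter())
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| value.clone())
        .collect()
}
```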





[jira] [Created] (ARROW-5351) [Rust] Add support for take kernel functions

2019-05-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5351:
-

 Summary: [Rust] Add support for take kernel functions
 Key: ARROW-5351
 URL: https://issues.apache.org/jira/browse/ARROW-5351
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


Similar to https://issues.apache.org/jira/browse/ARROW-772, a take function 
would allow us random-access on arrays, which is useful for sorting and 
(potentially) filtering.
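
A minimal sketch of the idea, assuming a hypothetical `take` over optional `i32` values: each index selects one slot from the source, so indices may repeat or appear in any order. Returning `None` for an out-of-bounds index is a design choice made for this example only.

```rust
/// Gathers `values[i]` for each `i` in `indices`.
/// Returns `None` if any index is out of bounds.
pub fn take(values: &[Option<i32>], indices: &[usize]) -> Option<Vec<Option<i32>>> {
    indices.iter().map(|&i| values.get(i).copied()).collect()
}
```

Combined with a function that returns sorted indices, `take` is enough to apply one sort order to several arrays.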

 





[jira] [Created] (ARROW-5350) [Rust] Support filtering on nested array types

2019-05-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5350:
-

 Summary: [Rust] Support filtering on nested array types
 Key: ARROW-5350
 URL: https://issues.apache.org/jira/browse/ARROW-5350
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


We currently only filter on primitive types, but not on lists and structs. Add 
the ability to filter on nested array types





[jira] [Created] (ARROW-5303) [Rust] Add SIMD vectorization of numeric casts

2019-05-12 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5303:
-

 Summary: [Rust] Add SIMD vectorization of numeric casts
 Key: ARROW-5303
 URL: https://issues.apache.org/jira/browse/ARROW-5303
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


To improve the performance of cast kernels, we need SIMD support in numeric 
casts.

An initial exploration shows that we can't trivially add SIMD casts between our 
Arrow T::Simd types, because `packed_simd` only supports a cast between T::Simd 
types that have the same number of lanes.

This means that adding casts from f64 to i64 (same lane length) satisfies the 
bound trait `where TO::Simd : packed_simd::FromCast`, but f64 to 
i32 (different lane length) doesn't.

We would benefit from investigating work-arounds to this limitation. Please see 
[github::nevi_me::arrow/\{branch:simd-cast}/../kernels/cast.rs|[https://github.com/nevi-me/arrow/blob/simd-cast/rust/arrow/src/compute/kernels/cast.rs#L601]]
 for an example implementation that's limited by the differences in lane length.





[jira] [Created] (ARROW-5191) [Rust] Expose schema in readers (CSV, JSON) without reading batches

2019-04-21 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5191:
-

 Summary: [Rust] Expose schema in readers (CSV, JSON) without 
reading batches
 Key: ARROW-5191
 URL: https://issues.apache.org/jira/browse/ARROW-5191
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


It's sometimes convenient to be able to view a datasource's schema without 
reading the first record batch. This is a proposal to create a `pub fn 
schema() -> Arc<Schema>` on the various readers that we support.

I think this would also enable schema inference in datafusion





[jira] [Created] (ARROW-5188) [Rust] Add temporal builders for StructArray

2019-04-19 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5188:
-

 Summary: [Rust] Add temporal builders for StructArray
 Key: ARROW-5188
 URL: https://issues.apache.org/jira/browse/ARROW-5188
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


StructBuilder currently doesn't have builders for temporal arrays.





[jira] [Created] (ARROW-5187) [Rust] Ability to flatten StructArray into a RecordBatch

2019-04-19 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5187:
-

 Summary: [Rust] Ability to flatten StructArray into a RecordBatch
 Key: ARROW-5187
 URL: https://issues.apache.org/jira/browse/ARROW-5187
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


Add the ability to flatten a schema into a record batch.

StructBuilder and StructArray have convenient methods to build multiple arrays. 
Being able to use these convenient methods and then convert the result to a 
record batch reduces the amount of boilerplate when creating Arrow data from 
sources like databases.





[jira] [Created] (ARROW-5182) [Rust] Create Arrow File writer

2019-04-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5182:
-

 Summary: [Rust] Create Arrow File writer
 Key: ARROW-5182
 URL: https://issues.apache.org/jira/browse/ARROW-5182
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale








[jira] [Created] (ARROW-5181) [Rust] Create Arrow File reader

2019-04-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5181:
-

 Summary: [Rust] Create Arrow File reader
 Key: ARROW-5181
 URL: https://issues.apache.org/jira/browse/ARROW-5181
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


Initial support for reading the Arrow File format





[jira] [Created] (ARROW-5180) [Rust] IPC Support

2019-04-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5180:
-

 Summary: [Rust] IPC Support
 Key: ARROW-5180
 URL: https://issues.apache.org/jira/browse/ARROW-5180
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


The overall ticket to keep track of initial IPC support





[jira] [Created] (ARROW-4968) [Rust] StructArray builder and From<> methods should check that field types match schema

2019-03-19 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4968:
-

 Summary: [Rust] StructArray builder and From<> methods should 
check that field types match schema
 Key: ARROW-4968
 URL: https://issues.apache.org/jira/browse/ARROW-4968
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


Similar to how we assert that array data types are equal to their field types, 
we should do the same for StructArray and StructBuilder where necessary





[jira] [Created] (ARROW-4914) [Rust] Array slice returns incorrect bitmask

2019-03-16 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4914:
-

 Summary: [Rust] Array slice returns incorrect bitmask
 Key: ARROW-4914
 URL: https://issues.apache.org/jira/browse/ARROW-4914
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


Slicing arrays changes the offset, length and null count of their array data, 
but the bitmask is not changed.

This results in the correct null count, but the array values might be marked 
incorrectly as valid/invalid based on the old bitmask positions before the 
offset.

To reproduce, create an array with some null values, slice the array, and then 
dbg!() it (after downcasting).
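
The sketch below shows the lookup a sliced array needs: slot `i` of the slice corresponds to bit `offset + i` of the original validity bitmap (least-significant bit first, as in the Arrow format). `SlicedValidity` is a hypothetical helper, not the crate's `ArrayData`.

```rust
/// Hypothetical view over a validity bitmap for a sliced array.
pub struct SlicedValidity<'a> {
    bitmap: &'a [u8],
    offset: usize,
    len: usize,
}

impl<'a> SlicedValidity<'a> {
    pub fn new(bitmap: &'a [u8], offset: usize, len: usize) -> Self {
        assert!(offset + len <= bitmap.len() * 8, "slice exceeds bitmap");
        SlicedValidity { bitmap, offset, len }
    }

    /// Slot `i` of the slice maps to bit `offset + i` of the original bitmap.
    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        let pos = self.offset + i;
        ((self.bitmap[pos / 8] >> (pos % 8)) & 1) == 1
    }

    /// Counts nulls within the sliced range only.
    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }
}
```

Reading bit `i` instead of bit `offset + i` reproduces the symptom in this ticket: the null count can be right while individual slots are flagged incorrectly.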





[jira] [Created] (ARROW-4886) [Rust] Inconsistent behaviour with casting sliced primitive array to list array

2019-03-14 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4886:
-

 Summary: [Rust] Inconsistent behaviour with casting sliced 
primitive array to list array
 Key: ARROW-4886
 URL: https://issues.apache.org/jira/browse/ARROW-4886
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


[~csun] I was going through the C++ cast implementation to see if I've missed 
anything, and I noticed that ListCastKernel 
([https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L665])
 doesn't support casting non-zero-offset arrays. So I investigated what happens 
in Rust ARROW-4865. I found an inconsistency where inheriting the incoming 
array's offset could lead us to read invalid data.

I tried fixing it, but found that a buffer that I expected to be invalid was 
being returned as valid, but returning invalid data.

I've currently disabled casting primitive arrays to list arrays where the offset is not 
zero, and I'd like to wait for ARROW-4853 so I can see how sliced lists behave, 
and fix this inconsistency. That might only happen in 0.14, so I'm fine with 
that.





[jira] [Created] (ARROW-4865) [Rust] Support casting lists and primitives to lists

2019-03-14 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4865:
-

 Summary: [Rust] Support casting lists and primitives to lists
 Key: ARROW-4865
 URL: https://issues.apache.org/jira/browse/ARROW-4865
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


This adds support for casting between list arrays and from primitive arrays to 
single-value list arrays
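
For the primitive-to-list case, the cast amounts to reusing the values as child data and adding offsets 0, 1, ..., n so that each list holds exactly one element. The sketch below assumes a zero-offset input and uses a hypothetical `ListSketch` type rather than the crate's ListArray; nulls are left out for brevity.

```rust
/// Hypothetical list array: `offsets[i]..offsets[i + 1]` delimits list `i`.
#[derive(Debug, Clone, PartialEq)]
pub struct ListSketch {
    pub offsets: Vec<i32>,
    pub values: Vec<i32>,
}

impl ListSketch {
    /// Number of lists in the array.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns the elements of list `i`.
    pub fn value(&self, i: usize) -> &[i32] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Wraps each primitive value in its own single-element list.
pub fn primitive_to_single_value_list(values: &[i32]) -> ListSketch {
    let offsets: Vec<i32> = (0..=values.len() as i32).collect();
    ListSketch {
        offsets,
        values: values.to_vec(),
    }
}
```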





[jira] [Created] (ARROW-4854) [Rust] Use Array Slice for limit kernel

2019-03-13 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4854:
-

 Summary: [Rust] Use Array Slice for limit kernel
 Key: ARROW-4854
 URL: https://issues.apache.org/jira/browse/ARROW-4854
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.13.0
Reporter: Neville Dipale


We currently reconstruct an array when taking a limit from it. We can improve 
performance by using slice from ARROW-3954.





[jira] [Created] (ARROW-4853) [Rust] Array slice doesn't work on ListArray and StructArray

2019-03-13 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4853:
-

 Summary: [Rust] Array slice doesn't work on ListArray and 
StructArray
 Key: ARROW-4853
 URL: https://issues.apache.org/jira/browse/ARROW-4853
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Neville Dipale


ARROW-3954 added the ability to slice arrays. It's been implemented on the 
Array trait, so callers might expect it to also work on ListArray and 
StructArray.

It looks like for ListArray, the offset buffer is sliced, but the child_data 
buffer is not modified. This leads to an assertion failure.





[jira] [Created] (ARROW-4805) [Rust] Write temporal arrays to CSV

2019-03-07 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4805:
-

 Summary: [Rust] Write temporal arrays to CSV
 Key: ARROW-4805
 URL: https://issues.apache.org/jira/browse/ARROW-4805
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


The CSV writer should start supporting writing temporal arrays back to disk.

To be consistent with norms, we should look at what other libraries do for date 
and time where the resolution is greater than seconds, and potentially deal 
with the below:
 * Is there optionality to how dates are written, or should it always be 
DD/MM/.
 * Should / or - be used?
 * Should time types be written as HH:MM:SS.ms, or 12345ms, 12345us, 12345ns?
 * Should timestamps always be written in the ISO8601 JSONlike format?





[jira] [Created] (ARROW-4806) [Rust] Support casting temporal arrays in cast kernels

2019-03-07 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4806:
-

 Summary: [Rust] Support casting temporal arrays in cast kernels
 Key: ARROW-4806
 URL: https://issues.apache.org/jira/browse/ARROW-4806
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


[ARROW-3882] is too far in the review process to add temporal casts.





[jira] [Created] (ARROW-4803) [Rust] Read temporal values from JSON

2019-03-07 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4803:
-

 Summary: [Rust] Read temporal values from JSON
 Key: ARROW-4803
 URL: https://issues.apache.org/jira/browse/ARROW-4803
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


Ability to parse strings that look like timestamps to timestamp type. Need to 
consider whether only timestamp type should be supported as most JSON libraries 
stick to ISO8601. It might also be inefficient to use regex for timestamps, so 
the user should provide a hint of which columns to convert to timestamps





[jira] [Created] (ARROW-4804) [Rust] Read temporal values from CSV

2019-03-07 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4804:
-

 Summary: [Rust] Read temporal values from CSV
 Key: ARROW-4804
 URL: https://issues.apache.org/jira/browse/ARROW-4804
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


CSV reader should support reading temporal values.

Should support timestamp, date and time, with sane defaults provided for schema 
inference.

To keep inference performant, the user should provide a Vec of which 
columns to try to convert to a temporal array.





[jira] [Created] (ARROW-4769) [Rust] Improve array limit function where max records > len

2019-03-04 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4769:
-

 Summary: [Rust] Improve array limit function where max records > 
len
 Key: ARROW-4769
 URL: https://issues.apache.org/jira/browse/ARROW-4769
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


When we have an array of n records, and we want to take a limit that's higher 
or equal to n, we still iterate through the array values and create a new array.

We could improve this by returning a copy of the array as-is.
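As an illustration only, here is the short-circuit on a stand-in array type 
(`Arc<Vec<i32>>` rather than Arrow's own array types; the `limit` signature is 
hypothetical). When the limit covers the whole array, the existing allocation 
is shared instead of being rebuilt.

```rust
use std::sync::Arc;

/// Return at most `num_records` leading values of `array`.
/// Assumption: `Arc<Vec<i32>>` stands in for a reference-counted Arrow array.
fn limit(array: &Arc<Vec<i32>>, num_records: usize) -> Arc<Vec<i32>> {
    if num_records >= array.len() {
        // Nothing to trim: hand back a cheap reference-counted copy.
        return Arc::clone(array);
    }
    // Otherwise build a new, shorter array from the leading values.
    Arc::new(array[..num_records].to_vec())
}
```

The fast path costs one reference-count increment, regardless of array length.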





[jira] [Created] (ARROW-4680) [CI] [Rust] Travis CI builds fail with latest Rust 1.34.0-nightly (2019-02-25)

2019-02-25 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4680:
-

 Summary: [CI] [Rust] Travis CI builds fail with latest Rust 
1.34.0-nightly (2019-02-25)
 Key: ARROW-4680
 URL: https://issues.apache.org/jira/browse/ARROW-4680
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Rust
Reporter: Neville Dipale


There's an unstable feature that's now marked for stabilisation in 1.34, and as 
a result Travis builds are failing. This is affecting all PRs that have been 
created or updated from 26 Feb 2019.

AppVeyor only emits failures.





[jira] [Created] (ARROW-4556) [Rust] Preserve order of JSON inferred schema

2019-02-12 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4556:
-

 Summary: [Rust] Preserve order of JSON inferred schema
 Key: ARROW-4556
 URL: https://issues.apache.org/jira/browse/ARROW-4556
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


serde_json has the ability to preserve order of JSON records read. This feature 
might be necessary to ensure that schema inference returns a consistent order 
of fields each time.

I'd like to add it separately as I'd also need to update JSON tests in 
datatypes.
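In serde_json this is the `preserve_order` Cargo feature. Independently of 
that, the ordering property we want can be sketched with the standard library 
alone, assuming each record's keys arrive in document order. The function name 
`infer_field_order` is hypothetical.

```rust
use std::collections::HashSet;

/// Collect field names across records in first-seen order.
/// Assumption: each inner Vec lists a record's keys in document order.
fn infer_field_order<'a>(records: &[Vec<&'a str>]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut ordered = Vec::new();
    for record in records {
        for &key in record {
            // `insert` returns true only the first time a key is seen.
            if seen.insert(key) {
                ordered.push(key);
            }
        }
    }
    ordered
}
```

Because the result depends only on input order and not on hash iteration 
order, repeated runs over the same file give the same field order.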





[jira] [Created] (ARROW-4544) [Rust] Read nested JSON structs into StructArrays

2019-02-12 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4544:
-

 Summary: [Rust] Read nested JSON structs into StructArrays
 Key: ARROW-4544
 URL: https://issues.apache.org/jira/browse/ARROW-4544
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


_Adding this as a separate task as it's a bit involved._

Add the ability to read in JSON structs that are children of the JSON record 
being read.
The main concern here is deeply nested structures, which will require a 
performant and reusable basic JSON reader before dealing with recursion.
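To make the recursion concrete, here is a small sketch of type inference over 
nested objects. `JsonValue` and `FieldType` are simplified stand-ins invented 
for this example; they are not the serde_json or Arrow types.

```rust
/// Simplified JSON value (arrays omitted to keep the sketch short).
#[derive(Debug, PartialEq)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    Text(String),
    Object(Vec<(String, JsonValue)>),
}

/// Simplified stand-in for an Arrow data type.
#[derive(Debug, PartialEq)]
enum FieldType {
    Null,
    Boolean,
    Float64,
    Utf8,
    Struct(Vec<(String, FieldType)>),
}

/// Infer a type for a value, recursing into child objects so that a nested
/// JSON struct maps to a nested `Struct` type.
fn infer_type(value: &JsonValue) -> FieldType {
    match value {
        JsonValue::Null => FieldType::Null,
        JsonValue::Bool(_) => FieldType::Boolean,
        JsonValue::Number(_) => FieldType::Float64,
        JsonValue::Text(_) => FieldType::Utf8,
        JsonValue::Object(fields) => FieldType::Struct(
            fields
                .iter()
                .map(|(name, child)| (name.clone(), infer_type(child)))
                .collect(),
        ),
    }
}
```

The recursion depth equals the nesting depth of the input, which is why deeply 
nested structures are the main concern.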





[jira] [Created] (ARROW-4540) [Rust] Add basic JSON reader

2019-02-12 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4540:
-

 Summary: [Rust] Add basic JSON reader
 Key: ARROW-4540
 URL: https://issues.apache.org/jira/browse/ARROW-4540
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


This is the first step in getting a JSON reader working in Rust.





[jira] [Created] (ARROW-4534) [Rust] Build JSON reader for reading record batches from line-delimited JSON files

2019-02-11 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4534:
-

 Summary: [Rust] Build JSON reader for reading record batches from 
line-delimited JSON files
 Key: ARROW-4534
 URL: https://issues.apache.org/jira/browse/ARROW-4534
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


Similar to ARROW-694, this is an umbrella issue for supporting reading JSON 
line-delimited files in Arrow.

I have a reference implementation at 
https://github.com/nevi-me/rust-dataframe/blob/io/json/src/io/json.rs, where 
I'm building a Rust-based dataframe library using Arrow.

I'd like us to have feature parity with the C++ implementation at some point.





[jira] [Created] (ARROW-4463) [Rust] Support read:write of Feather files

2019-02-03 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4463:
-

 Summary: [Rust] Support read:write of Feather files
 Key: ARROW-4463
 URL: https://issues.apache.org/jira/browse/ARROW-4463
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale


As an Arrow developer/user, I'd like to be able to read and write Feather files.

The current I/O story in Rust isn't great: we don't yet fully support reading 
and writing Parquet, and we can read CSV but not yet write it. This is 
an inconvenience (at least for me).

I propose supporting the Feather format in Rust, initially with the following 
limitations:
 * No date/time support until ARROW-4386 (and potentially more work) lands
 * Reading categorical data (from other languages) but not writing them
 * Reading and writing from and to single record batches. We don't yet support 
slicing of arrays (ARROW-3954)

If the above are accept(ed|able), we can enhance the Feather support as the 
limitations above are lifted.

We can also refactor the Feather code as we work on more IPC in Rust.





[jira] [Created] (ARROW-4449) [Rust] Convert File to T: Read + Seek for schema inference

2019-02-01 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-4449:
-

 Summary: [Rust] Convert File to T: Read + Seek for schema inference
 Key: ARROW-4449
 URL: https://issues.apache.org/jira/browse/ARROW-4449
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.12.0
Reporter: Neville Dipale
Assignee: Neville Dipale


ARROW-4376 allowed us to read CSV from a record iterator. We still require a 
`File` when inferring schemas.

We propose changing from a `File` to something more generic (any 
`T: Read + Seek`). See discussion: 
https://github.com/apache/arrow/pull/3508#issuecomment-457986171
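A minimal sketch of what the generic bound buys us, assuming inference only 
needs to peek at the input and then rewind. The function name 
`infer_column_names` is hypothetical, and the header handling is deliberately 
naive (comma split, no quoting).

```rust
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};

/// Read the header line from any seekable reader, then rewind so the caller
/// can read the data from the start again.
fn infer_column_names<R: Read + Seek>(reader: &mut R) -> std::io::Result<Vec<String>> {
    let mut header = String::new();
    {
        // Borrow the reader only for the duration of the peek.
        let mut buffered = BufReader::new(&mut *reader);
        buffered.read_line(&mut header)?;
    }
    // The buffered read may have consumed more than one line; seek back.
    reader.seek(SeekFrom::Start(0))?;
    Ok(header.trim_end().split(',').map(|s| s.to_string()).collect())
}
```

The same function accepts a `std::fs::File` or an in-memory 
`std::io::Cursor`, which also makes the inference path easier to unit test.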


