[jira] [Commented] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070666#comment-17070666
 ] 

Renjie Liu commented on ARROW-8258:
---

I think the root cause is here 
[https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L220]

The array reader only converted the data buffer but left the data type 
incorrect. I'll submit a PR to fix it this week.
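To illustrate the failure mode described above, here is a minimal sketch that uses hypothetical, simplified stand-ins (`DataType`, `Array`, `convert_buggy`, `convert_fixed`, `validate`) rather than the real arrow crate types. It assumes the fix is to tag the converted buffer with the data type the schema expects instead of a type derived from the raw buffer. The actual patch may differ.

```rust
// Hypothetical, simplified stand-ins for Arrow's DataType and array types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt64,
    TimestampMicrosecond,
}

#[derive(Debug)]
pub struct Array {
    pub data_type: DataType,
    pub values: Vec<i64>,
}

/// Buggy shape: the buffer is converted, but the array keeps a type derived
/// from the buffer (UInt64) instead of the type recorded in the schema.
pub fn convert_buggy(raw: &[i64]) -> Array {
    Array { data_type: DataType::UInt64, values: raw.to_vec() }
}

/// Fixed shape: the same values, tagged with the type the schema expects.
pub fn convert_fixed(raw: &[i64], expected: &DataType) -> Array {
    Array { data_type: expected.clone(), values: raw.to_vec() }
}

/// Mirrors the type check performed when a record batch is created.
pub fn validate(column: &Array, schema_type: &DataType, index: usize) -> Result<(), String> {
    if &column.data_type != schema_type {
        return Err(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema_type, column.data_type, index
        ));
    }
    Ok(())
}
```

Both conversions copy identical values; only the recorded type differs, and that is exactly what the record batch check compares.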

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
>  When the Parquet arrow reader creates the record batch, the following 
> validation logic fails:
> {code:java}
> for i in 0..columns.len() {
>     if columns[i].len() != len {
>         return Err(ArrowError::InvalidArgumentError(
>             "all columns in a record batch must have the same length".to_string(),
>         ));
>     }
>     if columns[i].data_type() != schema.field(i).data_type() {
>         return Err(ArrowError::InvalidArgumentError(format!(
>             "column types must match schema types, expected {:?} but found {:?} at column index {}",
>             schema.field(i).data_type(),
>             columns[i].data_type(),
>             i)));
>     }
> }
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-3085) [Rust] Add an adapter for parquet.

2019-12-11 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994054#comment-16994054
 ] 

Renjie Liu commented on ARROW-3085:
---

Yes, I think so.

> [Rust] Add an adapter for parquet.
> --
>
> Key: ARROW-3085
> URL: https://issues.apache.org/jira/browse/ARROW-3085
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: parquet
>






[jira] [Commented] (ARROW-4059) [Rust] Parquet/Arrow Integration

2019-12-11 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994052#comment-16994052
 ] 

Renjie Liu commented on ARROW-4059:
---

[~nevi_me] No, we can close this now.

> [Rust] Parquet/Arrow Integration
> 
>
> Key: ARROW-4059
> URL: https://issues.apache.org/jira/browse/ARROW-4059
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Chao Sun
>Priority: Major
>
> This is the umbrella JIRA for implementing Parquet/Arrow integration. 





[jira] [Created] (ARROW-7348) [Rust] Add api to return references of buffer of null bitmap.

2019-12-08 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-7348:
-

 Summary: [Rust] Add api to return references of buffer of null 
bitmap.
 Key: ARROW-7348
 URL: https://issues.apache.org/jira/browse/ARROW-7348
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu








[jira] [Created] (ARROW-7312) [Rust] ArrowError should implement std::error::Error

2019-12-04 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-7312:
-

 Summary: [Rust] ArrowError should implement std::error::Error
 Key: ARROW-7312
 URL: https://issues.apache.org/jira/browse/ARROW-7312
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


ArrowError should implement the std::error::Error trait so that other crates 
can handle errors from this crate more conveniently.
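A minimal sketch of what this could look like, using a hypothetical, trimmed-down `ArrowError` (the real enum has more variants and its own message wording). `std::error::Error` requires `Debug` and `Display`; once those exist, an empty `impl Error` is enough for downstream code to propagate the error with `?` into `Box<dyn Error>`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical, trimmed-down ArrowError; the real enum has more variants.
#[derive(Debug)]
pub enum ArrowError {
    InvalidArgumentError(String),
    ComputeError(String),
}

// Display is required by std::error::Error (together with Debug).
impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::InvalidArgumentError(msg) => write!(f, "Invalid argument error: {}", msg),
            ArrowError::ComputeError(msg) => write!(f, "Compute error: {}", msg),
        }
    }
}

// No methods are required; the defaults (e.g. `source` returning None) suffice.
impl Error for ArrowError {}

/// An operation inside the arrow crate.
pub fn check_len(len: usize) -> Result<usize, ArrowError> {
    if len == 0 {
        Err(ArrowError::InvalidArgumentError("empty column".to_string()))
    } else {
        Ok(len)
    }
}

/// Code in another crate: `?` boxes the ArrowError as Box<dyn Error>.
pub fn downstream(len: usize) -> Result<usize, Box<dyn Error>> {
    let n = check_len(len)?;
    Ok(n * 2)
}
```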





[jira] [Commented] (ARROW-6890) [Rust] [Parquet] ArrowReader fails with seg fault

2019-12-04 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987900#comment-16987900
 ] 

Renjie Liu commented on ARROW-6890:
---

[~andygrove] Are you going to retry with the new version of the arrow reader?

> [Rust] [Parquet] ArrowReader fails with seg fault
> -
>
> Key: ARROW-6890
> URL: https://issues.apache.org/jira/browse/ARROW-6890
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Andy Grove
>Assignee: Renjie Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> ArrowReader fails with a segmentation fault when trying to read an unsupported 
> type, such as Utf8. We should have it return an Err instead of causing a 
> segmentation fault.
>  
> See [https://github.com/apache/arrow/pull/5641] for a reproducible test.





[jira] [Commented] (ARROW-3706) [Rust] Add record batch reader trait.

2019-11-25 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982219#comment-16982219
 ] 

Renjie Liu commented on ARROW-3706:
---

[~nevi_me] Yes, please close it.

> [Rust] Add record batch reader trait.
> -
>
> Key: ARROW-3706
> URL: https://issues.apache.org/jira/browse/ARROW-3706
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>
> Add a RecordBatchReader trait.





[jira] [Created] (ARROW-7113) [Rust] Buffer should accept memory owned by others

2019-11-12 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-7113:
-

 Summary: [Rust] Buffer should accept memory owned by others
 Key: ARROW-7113
 URL: https://issues.apache.org/jira/browse/ARROW-7113
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


Currently the Rust Buffer always assumes that the memory passed to it is owned 
by itself, and frees that memory when the Buffer is dropped. This is 
inconvenient in cross-language environments such as JNI.
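One possible shape for this, sketched with hypothetical names (`Ownership`, `Buffer::from_foreign`, and a release callback) that are not the arrow crate's API: the buffer records whether Rust owns the bytes, and for foreign memory it calls a user-supplied release function on drop instead of freeing the memory itself.

```rust
/// Who is responsible for the bytes behind a buffer (hypothetical design).
enum Ownership {
    /// Allocated by Rust; freed by Vec's own Drop.
    Native(Vec<u8>),
    /// Owned by another runtime (e.g. the JVM via JNI); returned through `release`.
    Foreign {
        ptr: *const u8,
        len: usize,
        release: Option<Box<dyn FnOnce()>>,
    },
}

pub struct Buffer {
    inner: Ownership,
}

impl Buffer {
    pub fn from_vec(data: Vec<u8>) -> Self {
        Buffer { inner: Ownership::Native(data) }
    }

    /// # Safety
    /// `ptr` must point to `len` readable bytes that stay valid and unchanged
    /// until `release` runs.
    pub unsafe fn from_foreign(ptr: *const u8, len: usize, release: Box<dyn FnOnce()>) -> Self {
        Buffer { inner: Ownership::Foreign { ptr, len, release: Some(release) } }
    }

    pub fn as_slice(&self) -> &[u8] {
        match &self.inner {
            Ownership::Native(data) => data.as_slice(),
            // SAFETY: guaranteed by the contract of `from_foreign`.
            Ownership::Foreign { ptr, len, .. } => unsafe { std::slice::from_raw_parts(*ptr, *len) },
        }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Foreign memory is handed back to its owner instead of being freed here.
        if let Ownership::Foreign { release, .. } = &mut self.inner {
            if let Some(release) = release.take() {
                release();
            }
        }
    }
}
```

Native buffers keep today's behaviour, while the foreign variant lets a JNI caller pass, for example, a closure that releases a JVM reference.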





[jira] [Commented] (ARROW-6971) [Rust] Replace "RecordBatchReader" with "BatchIterator"

2019-10-22 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957506#comment-16957506
 ] 

Renjie Liu commented on ARROW-6971:
---

I have discussed this with [~andygrove]. RecordBatchReader can't implement 
Send + Sync because it uses some unsafe techniques.

> [Rust] Replace "RecordBatchReader" with "BatchIterator"
> ---
>
> Key: ARROW-6971
> URL: https://issues.apache.org/jira/browse/ARROW-6971
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
> Fix For: 1.0.0
>
>
> As part of the recent reader work we introduced 
> {code:java}
> // arrow::record_batch::RecordBatchReader{code}
> but in datafusion we have
> {code:java}
> // datafusion::physical_plan::BatchIterator
> {code}
> These two traits are almost identical (BatchIterator implements Send + Sync 
> whereas RecordBatchReader does not).  I propose we replace RecordBatchReader 
> with BatchIterator (i.e. move it to arrow as it's generally useful outside of 
> DataFusion) and update parquet and DataFusion accordingly.
> [~andygrove] [~liurenjie1024] do you see any issues with this? 
>  
>  
>  
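A minimal sketch of the proposal, with hypothetical simplified types (`Batch` instead of `RecordBatch`, `String` errors, and a `VecBatches` demo source): putting `Send + Sync` on the trait itself means any `Box<dyn BatchIterator>` can be moved to a worker thread without repeating the bounds at each use site.

```rust
use std::thread;

/// Hypothetical stand-in for a RecordBatch.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub rows: usize,
}

/// Sketch of a batch iterator trait that is Send + Sync by construction.
pub trait BatchIterator: Send + Sync {
    fn next_batch(&mut self) -> Result<Option<Batch>, String>;
}

/// Hypothetical in-memory source used for demonstration.
pub struct VecBatches {
    pub batches: Vec<Batch>,
}

impl BatchIterator for VecBatches {
    fn next_batch(&mut self) -> Result<Option<Batch>, String> {
        if self.batches.is_empty() {
            Ok(None)
        } else {
            Ok(Some(self.batches.remove(0)))
        }
    }
}

/// Drains the iterator on a worker thread. This compiles only because
/// `dyn BatchIterator` is Send through the trait's supertrait bound.
pub fn count_rows_on_worker(mut batches: Box<dyn BatchIterator>) -> Result<usize, String> {
    thread::spawn(move || -> Result<usize, String> {
        let mut total = 0;
        while let Some(batch) = batches.next_batch()? {
            total += batch.rows;
        }
        Ok(total)
    })
    .join()
    .expect("worker thread panicked")
}
```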





[jira] [Created] (ARROW-6948) [Rust] [Parquet] Fix bool array support in arrow reader.

2019-10-20 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-6948:
-

 Summary: [Rust] [Parquet] Fix bool array support in arrow reader.
 Key: ARROW-6948
 URL: https://issues.apache.org/jira/browse/ARROW-6948
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Renjie Liu
Assignee: Renjie Liu








[jira] [Commented] (ARROW-4392) [Rust] Implement high-level Parquet writer

2019-10-11 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949180#comment-16949180
 ] 

Renjie Liu commented on ARROW-4392:
---

[~andygrove] Yes, it looks interesting to me. I'll take this if [~csun] is not 
available.

> [Rust] Implement high-level Parquet writer
> --
>
> Key: ARROW-4392
> URL: https://issues.apache.org/jira/browse/ARROW-4392
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 1.0.0
>
>
> We only have a low-level Parquet writer at the moment, which requires the user 
> to specify values, definition levels, repetition levels, etc. This is 
> inconvenient. Instead, we should offer a high-level Parquet writer that hides 
> these details.
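To make the inconvenience concrete: for a flat, nullable (OPTIONAL, non-repeated) column the maximum definition level is 1, so the low-level column writer, which takes values plus optional definition- and repetition-level slices, needs only the non-null values and a 0/1 definition level per slot; repetition levels are all zero and can be omitted. The hypothetical helper below sketches the bookkeeping a high-level writer would hide; it is not part of the parquet crate.

```rust
/// Splits a flat, nullable column into the pieces a low-level writer expects:
/// the non-null values, plus one definition level per slot
/// (1 = value present, 0 = null; the max definition level here is 1).
pub fn to_levels(column: &[Option<i32>]) -> (Vec<i32>, Vec<i16>) {
    let mut values = Vec::new();
    let mut def_levels = Vec::with_capacity(column.len());
    for slot in column {
        match slot {
            Some(v) => {
                values.push(*v);
                def_levels.push(1);
            }
            None => def_levels.push(0),
        }
    }
    (values, def_levels)
}
```

Nested and repeated columns need repetition levels as well, which is where a high-level writer saves the most effort.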





[jira] [Created] (ARROW-6845) Setup process to generate random data for integration tests

2019-10-10 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-6845:
-

 Summary: Setup process to generate random data for integration 
tests
 Key: ARROW-6845
 URL: https://issues.apache.org/jira/browse/ARROW-6845
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu








[jira] [Commented] (ARROW-6774) [Rust] Reading parquet file is slow

2019-10-05 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945027#comment-16945027
 ] 

Renjie Liu commented on ARROW-6774:
---

This is part of a reader for reading Parquet files into Arrow arrays. It's 
almost complete; we still have one PR 
([https://github.com/apache/arrow/pull/5523]) waiting for review, which 
contains documentation and examples.

> [Rust] Reading parquet file is slow
> ---
>
> Key: ARROW-6774
> URL: https://issues.apache.org/jira/browse/ARROW-6774
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Adam Lippai
>Priority: Major
>
> Using the example at 
> [https://github.com/apache/arrow/tree/master/rust/parquet] is slow.
> The snippet 
> {code:none}
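> use std::fs::File;
> use std::time::Instant;
> use parquet::file::reader::{FileReader, SerializedFileReader};
>
> let file = File::open("data.parquet").unwrap(); // hypothetical path
> let reader = SerializedFileReader::new(file).unwrap();
> let iter = reader.get_row_iter(None).unwrap();
> let start = Instant::now();
> for _record in iter {} // iterate over every row without using it
> let duration = start.elapsed();
> println!("{:?}", duration);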
> {code}
> It runs for 17 seconds on a ~160 MB Parquet file.
> If there is a more effective way to load a parquet file, it would be nice to 
> add it to the readme.
> P.S.: My goal is to construct an ndarray from it, I'd be happy for any tips.





[jira] [Commented] (ARROW-6700) [Rust] [DataFusion] Use new parquet arrow reader

2019-09-25 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938198#comment-16938198
 ] 

Renjie Liu commented on ARROW-6700:
---

[~andygrove] Could you assign this ticket to me?

> [Rust] [DataFusion] Use new parquet arrow reader
> 
>
> Key: ARROW-6700
> URL: https://issues.apache.org/jira/browse/ARROW-6700
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> Once [https://github.com/apache/arrow/pull/5378] is merged, DataFusion should 
> be updated to use this new array reader support instead of the current 
> parquet reader code in the DataFusion crate.





[jira] [Created] (ARROW-6069) [Rust] [Parquet] Implement Converter to convert record reader to arrow primitive array.

2019-07-30 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-6069:
-

 Summary: [Rust] [Parquet] Implement Converter to convert record 
reader to arrow primitive array.
 Key: ARROW-6069
 URL: https://issues.apache.org/jira/browse/ARROW-6069
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Renjie Liu
Assignee: Renjie Liu






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-5901) [Rust] Implement PartialEq to compare array and json values

2019-07-10 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5901:
-

 Summary: [Rust] Implement PartialEq to compare array and json 
values
 Key: ARROW-5901
 URL: https://issues.apache.org/jira/browse/ARROW-5901
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu


Useful in tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5823) [Rust] Fix build break.

2019-07-02 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5823:
-

 Summary: [Rust] Fix build break.
 Key: ARROW-5823
 URL: https://issues.apache.org/jira/browse/ARROW-5823
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Renjie Liu
Assignee: Renjie Liu


The Rust build breaks because of some changes in the array builder. However, 
this error is not detected by the CI scripts because the cargo build command 
is missing --all-targets.





[jira] [Created] (ARROW-5792) [Rust] [Parquet] A visitor trait for parquet types.

2019-06-29 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5792:
-

 Summary: [Rust] [Parquet] A visitor trait for parquet types.
 Key: ARROW-5792
 URL: https://issues.apache.org/jira/browse/ARROW-5792
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu


Useful in dealing with parquet types.
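A minimal sketch of such a visitor over a hypothetical, simplified schema tree (`Type::Primitive` and `Type::Group` only; the real Parquet `Type` also carries repetition, physical and logical types). The `dispatch` default method routes each node to the right `visit_*` call, and `LeafPaths` is an example visitor that collects dotted leaf column paths. None of these names are claimed to match the parquet crate.

```rust
/// Hypothetical, simplified Parquet schema node.
pub enum Type {
    Primitive { name: String },
    Group { name: String, fields: Vec<Type> },
}

/// Visitor over a schema tree, producing a value of type `R` per node.
pub trait TypeVisitor<R> {
    fn visit_primitive(&mut self, name: &str) -> R;
    fn visit_group(&mut self, name: &str, fields: &[Type]) -> R;

    /// Routes a node to the matching `visit_*` method.
    fn dispatch(&mut self, ty: &Type) -> R {
        match ty {
            Type::Primitive { name } => self.visit_primitive(name),
            Type::Group { name, fields } => self.visit_group(name, fields),
        }
    }
}

/// Example visitor: collects the dotted path of every leaf column.
#[derive(Default)]
pub struct LeafPaths {
    prefix: Vec<String>,
    pub paths: Vec<String>,
}

impl TypeVisitor<()> for LeafPaths {
    fn visit_primitive(&mut self, name: &str) {
        let mut parts = self.prefix.clone();
        parts.push(name.to_string());
        self.paths.push(parts.join("."));
    }

    fn visit_group(&mut self, name: &str, fields: &[Type]) {
        self.prefix.push(name.to_string());
        for field in fields {
            self.dispatch(field);
        }
        self.prefix.pop();
    }
}
```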





[jira] [Created] (ARROW-5755) [Rust] [Parquet] Add derived clone for Type

2019-06-27 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5755:
-

 Summary: [Rust] [Parquet] Add derived clone for Type
 Key: ARROW-5755
 URL: https://issues.apache.org/jira/browse/ARROW-5755
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu


Derive Clone for Type.





[jira] [Created] (ARROW-5463) [Rust] Implement AsRef for Buffer

2019-05-31 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5463:
-

 Summary: [Rust] Implement AsRef for Buffer
 Key: ARROW-5463
 URL: https://issues.apache.org/jira/browse/ARROW-5463
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Implement AsRef ArrowNativeType for Buffer





[jira] [Created] (ARROW-5316) [Rust] Interfaces for gandiva bindings.

2019-05-14 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5316:
-

 Summary: [Rust] Interfaces for gandiva bindings.
 Key: ARROW-5316
 URL: https://issues.apache.org/jira/browse/ARROW-5316
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Renjie Liu
Assignee: Renjie Liu


Create interfaces to demonstrate the high-level design and ideas.





[jira] [Created] (ARROW-5315) [Rust] Gandiva binding.

2019-05-14 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5315:
-

 Summary: [Rust] Gandiva binding.
 Key: ARROW-5315
 URL: https://issues.apache.org/jira/browse/ARROW-5315
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu


Add Gandiva bindings for Rust.





[jira] [Created] (ARROW-5298) [Rust] Add debug implementation for Buffer

2019-05-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5298:
-

 Summary: [Rust] Add debug implementation for Buffer
 Key: ARROW-5298
 URL: https://issues.apache.org/jira/browse/ARROW-5298
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


The default Debug implementation is not good enough for debugging.
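A minimal sketch of a hand-written `Debug` for a hypothetical buffer type that stores its bytes behind a raw pointer. For such a type, `#[derive(Debug)]` would print only the pointer address; the custom impl below prints the length and the bytes instead, using `Formatter::debug_struct` to keep the familiar `Name { field: value }` layout.

```rust
use std::fmt;

/// Hypothetical buffer that owns `len` bytes behind a raw pointer.
pub struct Buffer {
    ptr: *mut u8,
    len: usize,
}

impl Buffer {
    pub fn from_vec(data: Vec<u8>) -> Self {
        let boxed = data.into_boxed_slice();
        let len = boxed.len();
        let ptr = Box::into_raw(boxed) as *mut u8;
        Buffer { ptr, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr/len come from the boxed slice created in `from_vec`.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // SAFETY: rebuild the boxed slice leaked in `from_vec` and free it once.
        unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(self.ptr, self.len))) }
    }
}

impl fmt::Debug for Buffer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print what the bytes are, not where they live.
        f.debug_struct("Buffer")
            .field("len", &self.len)
            .field("data", &self.as_slice())
            .finish()
    }
}
```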





[jira] [Created] (ARROW-5281) [Rust] [Parquet] Move DataPageBuilder to test_common

2019-05-07 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5281:
-

 Summary: [Rust] [Parquet] Move DataPageBuilder to test_common
 Key: ARROW-5281
 URL: https://issues.apache.org/jira/browse/ARROW-5281
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


DataPageBuilder is a helpful tool for mocking test page data; it's worth 
moving it to test_common so that other parts can reuse it.





[jira] [Updated] (ARROW-5281) [Rust] [Parquet] Move DataPageBuilder to test_common

2019-05-07 Thread Renjie Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjie Liu updated ARROW-5281:
--
Component/s: Rust

> [Rust] [Parquet] Move DataPageBuilder to test_common
> 
>
> Key: ARROW-5281
> URL: https://issues.apache.org/jira/browse/ARROW-5281
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>
> DataPageBuilder is a helpful tool for mocking test page data; it's worth 
> moving it to test_common so that other parts can reuse it.





[jira] [Created] (ARROW-5162) [Rust] [Parquet] Rename mod reader to arrow.

2019-04-11 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5162:
-

 Summary: [Rust] [Parquet] Rename mod reader to arrow.
 Key: ARROW-5162
 URL: https://issues.apache.org/jira/browse/ARROW-5162
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


Rename the reader mod to arrow.





[jira] [Created] (ARROW-5127) [Rust] [Parquet] Add page iterator

2019-04-05 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5127:
-

 Summary: [Rust] [Parquet] Add page iterator
 Key: ARROW-5127
 URL: https://issues.apache.org/jira/browse/ARROW-5127
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Renjie Liu
Assignee: Renjie Liu


Add a page iterator for the column reader.





[jira] [Created] (ARROW-5126) [Rust] [Parquet] Convert parquet column desc to arrow data type

2019-04-05 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5126:
-

 Summary: [Rust] [Parquet] Convert parquet column desc to arrow 
data type
 Key: ARROW-5126
 URL: https://issues.apache.org/jira/browse/ARROW-5126
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu
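As an illustration of this kind of mapping, here is a hypothetical sketch that uses simplified enums rather than the parquet crate's column descriptor and arrow's `DataType`. It covers only a few well-known pairs from the Parquet format (for example, INT64 annotated as TIMESTAMP_MICROS maps to a microsecond timestamp, and BYTE_ARRAY annotated as UTF8 maps to a string type) and returns an error for everything else.

```rust
/// Hypothetical subset of Parquet physical types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PhysicalType {
    Boolean,
    Int32,
    Int64,
    Double,
    ByteArray,
}

/// Hypothetical subset of Parquet logical/converted type annotations.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LogicalType {
    None,
    Utf8,
    TimestampMillis,
    TimestampMicros,
}

/// Hypothetical subset of Arrow data types.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
    Binary,
    TimestampMillisecond,
    TimestampMicrosecond,
}

/// Maps a (physical, logical) pair from a column descriptor to an Arrow type.
pub fn to_arrow(physical: PhysicalType, logical: LogicalType) -> Result<ArrowType, String> {
    match (physical, logical) {
        (PhysicalType::Boolean, LogicalType::None) => Ok(ArrowType::Boolean),
        (PhysicalType::Int32, LogicalType::None) => Ok(ArrowType::Int32),
        (PhysicalType::Int64, LogicalType::None) => Ok(ArrowType::Int64),
        // The annotation, not the physical type, decides the Arrow type.
        (PhysicalType::Int64, LogicalType::TimestampMillis) => Ok(ArrowType::TimestampMillisecond),
        (PhysicalType::Int64, LogicalType::TimestampMicros) => Ok(ArrowType::TimestampMicrosecond),
        (PhysicalType::Double, LogicalType::None) => Ok(ArrowType::Float64),
        (PhysicalType::ByteArray, LogicalType::Utf8) => Ok(ArrowType::Utf8),
        (PhysicalType::ByteArray, LogicalType::None) => Ok(ArrowType::Binary),
        (p, l) => Err(format!("unsupported combination: {:?} / {:?}", p, l)),
    }
}
```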








[jira] [Created] (ARROW-4634) [Rust] [Parquet] Reorganize test_common mod to allow more test util codes.

2019-02-19 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4634:
-

 Summary: [Rust] [Parquet] Reorganize test_common mod to allow more 
test util codes.
 Key: ARROW-4634
 URL: https://issues.apache.org/jira/browse/ARROW-4634
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Currently the test_common mod is just one file, and as we add more test utils 
to it, things may get messy, so I propose to make test_common a directory with 
multiple sub-mods.





[jira] [Created] (ARROW-4525) [Rust] [Parquet] Convert ArrowError to ParquetError

2019-02-10 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4525:
-

 Summary: [Rust] [Parquet] Convert ArrowError to ParquetError
 Key: ARROW-4525
 URL: https://issues.apache.org/jira/browse/ARROW-4525
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


We need to enable conversion from ArrowError to ParquetError. This is useful 
when integrating arrow with parquet, e.g. when reading parquet data into arrow.
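A minimal sketch of the conversion with hypothetical, trimmed-down error enums: implementing `From<ArrowError> for ParquetError` is what lets `?` convert an Arrow error inside a function that returns a Parquet result. Here the Arrow error is wrapped through its `Debug` output in a `General` variant; the real crates may choose a different variant or message format.

```rust
/// Hypothetical, trimmed-down Arrow error.
#[derive(Debug)]
pub enum ArrowError {
    InvalidArgumentError(String),
}

/// Hypothetical, trimmed-down Parquet error.
#[derive(Debug)]
pub enum ParquetError {
    General(String),
}

impl From<ArrowError> for ParquetError {
    fn from(e: ArrowError) -> Self {
        // Keep the original error visible in the message.
        ParquetError::General(format!("{:?}", e))
    }
}

/// An Arrow-side operation, e.g. assembling a record batch.
fn build_batch(num_columns: usize) -> Result<usize, ArrowError> {
    if num_columns == 0 {
        return Err(ArrowError::InvalidArgumentError("no columns".to_string()));
    }
    Ok(num_columns)
}

/// A Parquet-side operation; `?` converts ArrowError via the From impl.
pub fn read_into_arrow(num_columns: usize) -> Result<usize, ParquetError> {
    let columns = build_batch(num_columns)?;
    Ok(columns)
}
```

With this shape, a failed batch assembly is reported as `General("InvalidArgumentError(\"...\")")`, which matches the form of the error quoted in ARROW-8258 above.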





[jira] [Commented] (ARROW-4061) [Rust] [Parquet] Implement "spaced" version for non-dictionary encoding/decoding

2019-01-27 Thread Renjie Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753720#comment-16753720
 ] 

Renjie Liu commented on ARROW-4061:
---

[~csun] Thanks.

> [Rust] [Parquet] Implement "spaced" version for non-dictionary 
> encoding/decoding
> 
>
> Key: ARROW-4061
> URL: https://issues.apache.org/jira/browse/ARROW-4061
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> To support Parquet/Arrow encoding/decoding, we need to implement a "spaced" 
> version where slots for null values should be filled with undefined bytes. 
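A minimal sketch of the "spaced" idea, assuming a `bool` slice for validity (a real implementation would read a null bitmap) and using `T::default()` as the filler for null slots, since their contents are undefined anyway. The function name `spaced` is hypothetical.

```rust
/// Expands densely decoded values into a "spaced" buffer with one slot per row.
/// `valid[i] == false` marks a null; its slot is filled with `T::default()`.
pub fn spaced<T: Copy + Default>(dense: &[T], valid: &[bool]) -> Vec<T> {
    let mut out = Vec::with_capacity(valid.len());
    let mut values = dense.iter();
    for &is_valid in valid {
        if is_valid {
            // Each valid slot consumes the next decoded value.
            out.push(*values.next().expect("fewer values than valid slots"));
        } else {
            out.push(T::default());
        }
    }
    out
}
```

Because every row gets a slot, the output can be used directly as an Arrow values buffer alongside the validity bitmap.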





[jira] [Updated] (ARROW-4365) [Rust] [Parquet] Implement RecordReader

2019-01-27 Thread Renjie Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjie Liu updated ARROW-4365:
--
Issue Type: Sub-task  (was: Bug)
Parent: ARROW-4059

> [Rust] [Parquet] Implement RecordReader
> ---
>
> Key: ARROW-4365
> URL: https://issues.apache.org/jira/browse/ARROW-4365
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Minor
>
> RecordReader reads logical records into memory; this is the prerequisite 
> for ColumnReader.





[jira] [Commented] (ARROW-4061) [Rust] [Parquet] Implement "spaced" version for non-dictionary encoding/decoding

2019-01-27 Thread Renjie Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753662#comment-16753662
 ] 

Renjie Liu commented on ARROW-4061:
---

Hi [~csun], are you working on this? This is a blocker for other parts of the 
arrow reader. I can take this if you are not available.

> [Rust] [Parquet] Implement "spaced" version for non-dictionary 
> encoding/decoding
> 
>
> Key: ARROW-4061
> URL: https://issues.apache.org/jira/browse/ARROW-4061
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> To support Parquet/Arrow encoding/decoding, we need to implement a "spaced" 
> version where slots for null values should be filled with undefined bytes. 





[jira] [Created] (ARROW-4365) [Rust] [Parquet] Implement RecordReader

2019-01-24 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4365:
-

 Summary: [Rust] [Parquet] Implement RecordReader
 Key: ARROW-4365
 URL: https://issues.apache.org/jira/browse/ARROW-4365
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


RecordReader reads logical records into memory; this is the prerequisite for 
ColumnReader.
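Record boundaries in a Parquet column follow from the repetition levels: a new record begins at every position whose repetition level is 0. The hypothetical helpers below sketch how a record reader could locate those boundaries, so that a batch never ends in the middle of a record. They are illustrations, not the parquet crate's API.

```rust
/// Returns the index (into the level buffers) where each record starts.
/// A repetition level of 0 marks the first value of a new record.
pub fn record_starts(rep_levels: &[i16]) -> Vec<usize> {
    rep_levels
        .iter()
        .enumerate()
        .filter_map(|(i, &level)| if level == 0 { Some(i) } else { None })
        .collect()
}

/// Number of level entries to consume so that exactly `n` whole records are
/// read, or None if fewer than `n` complete records are buffered (the last
/// record might still continue in the next page).
pub fn levels_for_records(rep_levels: &[i16], n: usize) -> Option<usize> {
    // Record n-1 is known to be complete once record n has started.
    record_starts(rep_levels).get(n).copied()
}
```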





[jira] [Created] (ARROW-4219) [Rust] [Parquet] Implement ArrowReader

2019-01-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4219:
-

 Summary: [Rust] [Parquet] Implement ArrowReader
 Key: ARROW-4219
 URL: https://issues.apache.org/jira/browse/ARROW-4219
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


ArrowReader reads Parquet into Arrow. In this ticket, our goal is to implement 
get_schema and to read row groups into a record batch iterator.





[jira] [Updated] (ARROW-4218) [Rust] [Parquet] Implement ColumnReader

2019-01-09 Thread Renjie Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjie Liu updated ARROW-4218:
--
Summary: [Rust] [Parquet] Implement ColumnReader  (was: 
[Rust][Parquet]Implement ColumnReader)

> [Rust] [Parquet] Implement ColumnReader
> ---
>
> Key: ARROW-4218
> URL: https://issues.apache.org/jira/browse/ARROW-4218
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>
> ColumnReader reads columns in a Parquet file into Arrow arrays; this is the 
> first step for reading Parquet into Arrow.





[jira] [Created] (ARROW-4218) [Rust][Parquet]Implement ColumnReader

2019-01-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4218:
-

 Summary: [Rust][Parquet]Implement ColumnReader
 Key: ARROW-4218
 URL: https://issues.apache.org/jira/browse/ARROW-4218
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


ColumnReader reads columns in a Parquet file into Arrow arrays; this is the 
first step for reading Parquet into Arrow.





[jira] [Assigned] (ARROW-4060) [Rust] Add Parquet/Arrow schema converter

2018-12-27 Thread Renjie Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjie Liu reassigned ARROW-4060:
-

Assignee: Renjie Liu

> [Rust] Add Parquet/Arrow schema converter
> -
>
> Key: ARROW-4060
> URL: https://issues.apache.org/jira/browse/ARROW-4060
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Renjie Liu
>Priority: Major
>
> We should support conversion from Parquet to Arrow schema.





[jira] [Created] (ARROW-3706) [Rust] Add record batch reader trait.

2018-11-06 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-3706:
-

 Summary: [Rust] Add record batch reader trait.
 Key: ARROW-3706
 URL: https://issues.apache.org/jira/browse/ARROW-3706
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu
 Fix For: 0.12.0


Add a RecordBatchReader trait.





[jira] [Created] (ARROW-3085) [Rust] Add an adapter for parquet.

2018-08-19 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-3085:
-

 Summary: [Rust] Add an adapter for parquet.
 Key: ARROW-3085
 URL: https://issues.apache.org/jira/browse/ARROW-3085
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu
 Fix For: 0.11.0








[jira] [Created] (ARROW-2852) [Rust] Mark Array as Sync and Send

2018-07-15 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-2852:
-

 Summary: [Rust] Mark Array as Sync and Send
 Key: ARROW-2852
 URL: https://issues.apache.org/jira/browse/ARROW-2852
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.9.0
Reporter: Renjie Liu
Assignee: Renjie Liu


Since arrays are immutable containers, it would be safe to mark them as Sync 
and Send. This is useful for processing in multithreaded environments.
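A minimal sketch of why the marker traits have to be asserted by hand, using a hypothetical `Int64Array` that owns its values through a raw pointer: raw pointers make a type neither `Send` nor `Sync` automatically, so an `unsafe impl` records the promise that the data is never mutated after construction. With that in place, the array can be shared across threads through an `Arc`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical immutable array that owns its values through a raw pointer.
/// Because of the raw pointer, the compiler does not derive Send or Sync.
pub struct Int64Array {
    data: *mut [i64],
}

impl Int64Array {
    pub fn from_vec(values: Vec<i64>) -> Self {
        Int64Array { data: Box::into_raw(values.into_boxed_slice()) }
    }

    pub fn values(&self) -> &[i64] {
        // SAFETY: `data` comes from a Box we own until Drop and never mutate.
        unsafe { &*self.data }
    }
}

impl Drop for Int64Array {
    fn drop(&mut self) {
        // SAFETY: reclaim the Box leaked in `from_vec`, exactly once.
        unsafe { drop(Box::from_raw(self.data)) };
    }
}

// SAFETY: the values are never mutated after construction, so moving the
// array to another thread (Send) or sharing &Int64Array (Sync) is sound.
unsafe impl Send for Int64Array {}
unsafe impl Sync for Int64Array {}

/// Sums the array on `threads` worker threads that share it via Arc.
pub fn parallel_sum(array: Arc<Int64Array>, threads: usize) -> i64 {
    let threads = threads.max(1);
    let len = array.values().len();
    let chunk = ((len + threads - 1) / threads).max(1);
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let array = Arc::clone(&array);
            thread::spawn(move || {
                let values = array.values();
                let start = (i * chunk).min(values.len());
                let end = (start + chunk).min(values.len());
                values[start..end].iter().sum::<i64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().expect("worker panicked")).sum()
}
```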





[jira] [Assigned] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-10 Thread Renjie Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renjie Liu reassigned ARROW-2435:
-

Assignee: Renjie Liu

> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction like the one in the C++ API.





[jira] [Created] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-2435:
-

 Summary: [Rust] Add memory pool abstraction.
 Key: ARROW-2435
 URL: https://issues.apache.org/jira/browse/ARROW-2435
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.9.0
Reporter: Renjie Liu


Add a memory pool abstraction like the one in the C++ API.
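A minimal sketch of such an abstraction, loosely modelled on the C++ `arrow::MemoryPool` (which exposes allocation, freeing and a `bytes_allocated` counter). The trait, the `TrackingPool` type and the alignment constant are assumptions for illustration, not the Rust crate's API. The Arrow format recommends 8- or 64-byte alignment; 64 is used here.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Alignment used for every allocation in this sketch.
pub const ALIGNMENT: usize = 64;

/// Hypothetical pool trait: allocate, free, and report usage.
pub trait MemoryPool {
    /// Returns a pointer to `size` bytes, or null if `size` is 0 or allocation fails.
    fn allocate(&self, size: usize) -> *mut u8;

    /// # Safety
    /// `ptr` must have been returned by `allocate(size)` on this pool and not freed yet.
    unsafe fn free(&self, ptr: *mut u8, size: usize);

    /// Total bytes currently allocated through this pool.
    fn bytes_allocated(&self) -> usize;
}

/// A pool backed by the global allocator that tracks outstanding bytes.
#[derive(Default)]
pub struct TrackingPool {
    allocated: AtomicUsize,
}

impl MemoryPool for TrackingPool {
    fn allocate(&self, size: usize) -> *mut u8 {
        if size == 0 {
            return std::ptr::null_mut();
        }
        let layout = match Layout::from_size_align(size, ALIGNMENT) {
            Ok(layout) => layout,
            Err(_) => return std::ptr::null_mut(),
        };
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc(layout) };
        if !ptr.is_null() {
            self.allocated.fetch_add(size, Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn free(&self, ptr: *mut u8, size: usize) {
        // SAFETY: the caller guarantees ptr came from `allocate(size)`,
        // so this is the same layout that was used to allocate it.
        unsafe { dealloc(ptr, Layout::from_size_align_unchecked(size, ALIGNMENT)) };
        self.allocated.fetch_sub(size, Ordering::SeqCst);
    }

    fn bytes_allocated(&self) -> usize {
        self.allocated.load(Ordering::SeqCst)
    }
}
```

Keeping the counter inside the pool mirrors the C++ design, where callers can check memory usage without instrumenting every buffer.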


