[jira] [Created] (ARROW-14718) [Java] loadValidityBuffer should avoid allocating memory when input is not null and there are only null or non-null values

2021-11-15 Thread Chao Sun (Jira)
Chao Sun created ARROW-14718:


 Summary: [Java] loadValidityBuffer should avoid allocating memory 
when input is not null and there are only null or non-null values
 Key: ARROW-14718
 URL: https://issues.apache.org/jira/browse/ARROW-14718
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Chao Sun


Currently, {{BitVectorHelper.loadValidityBuffer}} always allocates memory 
when the source vector contains only null or only non-null values. However, 
because the format also allows a validity buffer to be present even when all 
values are null or all are non-null, the method should also check whether the 
input validity buffer is null, and avoid allocating a new buffer when it is not.
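
The intended behavior can be sketched as follows. This is a minimal illustration written in Rust rather than Java, and every name in it ({{load_validity}} and its parameters) is hypothetical; it is not the {{BitVectorHelper}} API. It assumes that a set bit marks a valid slot, reuses the input bitmap whenever one is supplied, and builds a new one only when the input is absent and the values are all null or all non-null.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the validity-loading step (not the Arrow Java API).
/// Returns a borrowed bitmap when the caller supplied one, and allocates only
/// when the bitmap was omitted and every value is null or every value is valid.
fn load_validity<'a>(
    value_count: usize,
    null_count: usize,
    source: Option<&'a [u8]>,
) -> Option<Cow<'a, [u8]>> {
    let byte_len = (value_count + 7) / 8;
    match source {
        // Input bitmap present: reuse it, no new allocation.
        Some(bits) if !bits.is_empty() => Some(Cow::Borrowed(bits)),
        // Bitmap omitted and no nulls: every bit set (padding bits included).
        _ if null_count == 0 => Some(Cow::Owned(vec![0xFF; byte_len])),
        // Bitmap omitted and all nulls: every bit cleared.
        _ if null_count == value_count => Some(Cow::Owned(vec![0x00; byte_len])),
        // Mixed values without a bitmap cannot be reconstructed.
        _ => None,
    }
}
```

Using {{Cow}} makes the "reuse versus allocate" decision visible in the return type: the borrowed variant corresponds to the case where no new buffer is needed.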



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-14666) Publish Maven artifacts for arrow-c-data

2021-11-10 Thread Chao Sun (Jira)
Chao Sun created ARROW-14666:


 Summary: Publish Maven artifacts for arrow-c-data
 Key: ARROW-14666
 URL: https://issues.apache.org/jira/browse/ARROW-14666
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Chao Sun


It doesn't seem like we are publishing {{arrow-c-data}} in the 6.0.0 release. It 
can't be found in Maven (see 
[here|https://mvnrepository.com/artifact/org.apache.arrow]).

I think we should add it in {{dev/release/post-11-java.sh}}.





[jira] [Created] (ARROW-13941) [C++] Parquet writes incorrect file_offset

2021-09-08 Thread Chao Sun (Jira)
Chao Sun created ARROW-13941:


 Summary: [C++] Parquet writes incorrect file_offset 
 Key: ARROW-13941
 URL: https://issues.apache.org/jira/browse/ARROW-13941
 Project: Apache Arrow
  Issue Type: Bug
  Components: Parquet
Reporter: Chao Sun


Currently the Parquet writer sets {{file_offset}} in the following way:
{code:cpp}
if (dictionary_page_offset > 0) {
  column_chunk_->meta_data.__set_dictionary_page_offset(dictionary_page_offset);
  column_chunk_->__set_file_offset(dictionary_page_offset + compressed_size);
} else {
  column_chunk_->__set_file_offset(data_page_offset + compressed_size);
}
{code}
This doesn't look correct: it seems the offset should not take 
{{compressed_size}} into account.

The {{file_offset}} is used when filtering row groups, so the issue could 
cause correctness issues. See SPARK-36696.
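
Under the reading proposed in this report, the offset would point at the first page of the column chunk rather than past its end. A minimal Rust sketch of that rule follows; the function name and signature are hypothetical, it is not the Arrow C++ writer code, and whether this is the right value for {{file_offset}} is exactly what the report asks to be confirmed.

```rust
/// Hypothetical helper illustrating the rule suggested in the report:
/// the offset is that of the first page in the chunk (the dictionary page
/// when present, otherwise the first data page), with no size added.
fn proposed_file_offset(dictionary_page_offset: i64, data_page_offset: i64) -> i64 {
    if dictionary_page_offset > 0 {
        dictionary_page_offset
    } else {
        data_page_offset
    }
}
```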





[jira] [Closed] (ARROW-8455) [Rust] [Parquet] Arrow column read on partially compatible files

2020-05-28 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun closed ARROW-8455.
---
Resolution: Fixed

> [Rust] [Parquet] Arrow column read on partially compatible files
> 
>
> Key: ARROW-8455
> URL: https://issues.apache.org/jira/browse/ARROW-8455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.16.0
>Reporter: Remi Dettai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Seen behavior: When reading a Parquet file into Arrow with 
> `get_record_reader_by_columns`, it will fail if one of the columns of the file 
> is a list (or any other unsupported type).
> Expected behavior: it should only fail if you are actually reading the column 
> with the unsupported type.





[jira] [Resolved] (ARROW-8455) [Rust] [Parquet] Arrow column read on partially compatible files

2020-05-22 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-8455.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6935
[https://github.com/apache/arrow/pull/6935]

> [Rust] [Parquet] Arrow column read on partially compatible files
> 
>
> Key: ARROW-8455
> URL: https://issues.apache.org/jira/browse/ARROW-8455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.16.0
>Reporter: Remi Dettai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Seen behavior: When reading a Parquet file into Arrow with 
> `get_record_reader_by_columns`, it will fail if one of the columns of the file 
> is a list (or any other unsupported type).
> Expected behavior: it should only fail if you are actually reading the column 
> with the unsupported type.





[jira] [Resolved] (ARROW-8752) [Rust] Remove unused hashmap

2020-05-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-8752.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7139
[https://github.com/apache/arrow/pull/7139]

> [Rust] Remove unused hashmap 
> -
>
> Key: ARROW-8752
> URL: https://issues.apache.org/jira/browse/ARROW-8752
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Neither base_nodes nor base_nodes_set seems to be used at all in 
> build_array_reader.





[jira] [Resolved] (ARROW-8680) [Rust] ComplexObjectArrayReader incorrect null value shuffling

2020-05-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-8680.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7091
[https://github.com/apache/arrow/pull/7091]

> [Rust] ComplexObjectArrayReader incorrect null value shuffling
> --
>
> Key: ARROW-8680
> URL: https://issues.apache.org/jira/browse/ARROW-8680
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Raphael Taylor-Davies
>Assignee: Raphael Taylor-Davies
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The null shifting logic within ComplexObjectArrayReader is incorrect as it 
> doesn't take into account the num_readers offset within the def_levels buffer





[jira] [Updated] (ARROW-8659) [Rust] ListBuilder and FixedSizeListBuilder capacity

2020-05-01 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-8659:

Component/s: Rust

> [Rust] ListBuilder and FixedSizeListBuilder capacity
> 
>
> Key: ARROW-8659
> URL: https://issues.apache.org/jira/browse/ARROW-8659
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Raphael Taylor-Davies
>Assignee: Raphael Taylor-Davies
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Both ListBuilder and FixedSizeListBuilder accept a values_builder as a 
> constructor argument and then set the capacity of their internal builders 
> based off the length of this values_builder. Unfortunately at construction 
> time this values_builder is normally empty, and consequently programs spend 
> an unnecessary amount of time reallocating memory.
>  
> This should be addressed by adding new constructor methods that allow 
> specifying the desired capacity upfront.
>  
>  
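
The proposal in the description above can be sketched roughly as follows. The type below is a hypothetical, simplified builder and not the arrow crate's `ListBuilder`; it only illustrates the difference between deriving capacity from a (usually empty) values container and letting the caller state it upfront.

```rust
/// Hypothetical list builder (not the arrow crate's `ListBuilder`).
struct SketchListBuilder {
    offsets: Vec<i32>,
    values: Vec<i64>,
}

impl SketchListBuilder {
    /// Mirrors the behavior described in the issue: capacity is derived from
    /// the values container, which is normally empty at construction time.
    fn new(values: Vec<i64>) -> Self {
        let capacity = values.len();
        Self::with_capacity(values, capacity)
    }

    /// Proposed alternative: the caller states the expected number of lists.
    fn with_capacity(values: Vec<i64>, capacity: usize) -> Self {
        let mut offsets = Vec::with_capacity(capacity + 1);
        offsets.push(0);
        Self { offsets, values }
    }

    /// Appends one list and records its end offset.
    fn append(&mut self, items: &[i64]) {
        self.values.extend_from_slice(items);
        self.offsets.push(self.values.len() as i32);
    }

    /// Allocated slots in the offsets buffer.
    fn offsets_capacity(&self) -> usize {
        self.offsets.capacity()
    }
}
```

With `with_capacity(Vec::new(), 1024)`, the offsets buffer is sized once; with `new(Vec::new())`, it starts near-empty and has to grow as lists are appended.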





[jira] [Resolved] (ARROW-8659) [Rust] ListBuilder and FixedSizeListBuilder capacity

2020-05-01 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-8659.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7076
[https://github.com/apache/arrow/pull/7076]

> [Rust] ListBuilder and FixedSizeListBuilder capacity
> 
>
> Key: ARROW-8659
> URL: https://issues.apache.org/jira/browse/ARROW-8659
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Raphael Taylor-Davies
>Assignee: Raphael Taylor-Davies
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Both ListBuilder and FixedSizeListBuilder accept a values_builder as a 
> constructor argument and then set the capacity of their internal builders 
> based off the length of this values_builder. Unfortunately at construction 
> time this values_builder is normally empty, and consequently programs spend 
> an unnecessary amount of time reallocating memory.
>  
> This should be addressed by adding new constructor methods that allow 
> specifying the desired capacity upfront.
>  
>  





[jira] [Resolved] (ARROW-7681) [Rust] Explicitly seeking a BufReader will discard the internal buffer

2020-04-27 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-7681.
-
Resolution: Fixed

Issue resolved by pull request 6949
[https://github.com/apache/arrow/pull/6949]

> [Rust] Explicitly seeking a BufReader will discard the internal buffer
> --
>
> Key: ARROW-7681
> URL: https://issues.apache.org/jira/browse/ARROW-7681
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Max Burke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This behavior was observed in the Parquet Rust file reader 
> (parquet/src/util/io.rs).
>  
> Pull request: [https://github.com/apache/arrow/pull/6280]
>  
> From the Rust documentation for BufReader:
>  
> "Seeking always discards the internal buffer, even if the seek position would 
> otherwise fall within it. This guarantees that calling {{.into_inner()}} 
> immediately after a seek yields the underlying reader at the same position."
>  
> [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek]
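
The documented behavior quoted above can be observed with a small standalone sketch. It uses only the standard library and an in-memory `Cursor` in place of a file; the function name is made up for illustration, and the code is not taken from the Parquet reader. `BufReader::seek_relative`, available in recent standard library versions, is shown for contrast because it keeps the buffer when the target position lies inside it.

```rust
use std::io::{BufRead, BufReader, Cursor, Seek, SeekFrom};

/// Returns the number of buffered bytes left after `seek` and after
/// `seek_relative`, starting from the same state each time.
fn buffered_after_seeks() -> (usize, usize) {
    let data: Vec<u8> = (0u8..64).collect();

    // Case 1: an explicit seek discards the internal buffer.
    let mut reader = BufReader::with_capacity(32, Cursor::new(data.clone()));
    reader.fill_buf().unwrap(); // load the first chunk into the buffer
    reader.consume(1);
    reader.seek(SeekFrom::Current(1)).unwrap();
    let after_seek = reader.buffer().len();

    // Case 2: seek_relative keeps the buffer when the target lies inside it.
    let mut reader = BufReader::with_capacity(32, Cursor::new(data));
    reader.fill_buf().unwrap();
    reader.consume(1);
    reader.seek_relative(1).unwrap();
    let after_relative = reader.buffer().len();

    (after_seek, after_relative)
}
```

The first value is zero because the buffer was dropped; the second is non-zero because the small forward move was served from the buffer.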





[jira] [Resolved] (ARROW-8473) [Rust] "Statistics support" in rust/parquet readme is incorrect

2020-04-23 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-8473.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6951
[https://github.com/apache/arrow/pull/6951]

> [Rust] "Statistics support" in rust/parquet readme is incorrect
> ---
>
> Key: ARROW-8473
> URL: https://issues.apache.org/jira/browse/ARROW-8473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Krzysztof Stanisławek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Statistics are not actually supported in the Rust implementation of Parquet. See 
> [https://github.com/apache/arrow/blob/3e3712a14a3242d70145fb9d3d6f0f4b8c374e68/rust/parquet/src/column/writer.rs#L522]
>  or similar lines in this file, or writer.rs.
> https://github.com/apache/arrow/pull/6951





[jira] [Resolved] (ARROW-7775) Don't let safe code arbitrarily transmute readers and writers

2020-02-15 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-7775.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6256
[https://github.com/apache/arrow/pull/6256]

> Don't let safe code arbitrarily transmute readers and writers
> -
>
> Key: ARROW-7775
> URL: https://issues.apache.org/jira/browse/ARROW-7775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Markus Westerlind
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/6256





[jira] [Assigned] (ARROW-7768) [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs

2020-02-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-7768:
---

Assignee: (was: Chao Sun)

> [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs
> 
>
> Key: ARROW-7768
> URL: https://issues.apache.org/jira/browse/ARROW-7768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: David Kegley
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Length and TryClone are implemented for Cursor<&'a [u8]> in 
> src/file/reader.rs
> Attempting to create a cursor from a Vec<u8>...
> {code:java}
> fn test_cursor_and_file_has_the_same_behaviour() {
>   let mut buf: Vec<u8> = Vec::new();
>   get_test_file("alltypes_plain.parquet")
>     .read_to_end(&mut buf)
>     .unwrap();
>   let cursor = Cursor::new(buf.as_slice());
> ...
> {code}
>  
> results in:
> {code:java}
> `buf` does not live long enough
> borrowed value does not live long enough 
> rustc(E0597)
> reader.rs(681, 34): borrowed value does not live long enough
> reader.rs(681, 34): argument requires that `buf` is borrowed for `'static`
> reader.rs(691, 5): `buf` dropped here while still borrowed
> {code}
>  
> Implementing Length and TryClone for Cursor<Vec<u8>> would allow for:
> {code:java}
> fn test_cursor_and_file_has_the_same_behaviour() {
>   let mut buf: Vec<u8> = Vec::new();
>   get_test_file("alltypes_plain.parquet")
>     .read_to_end(&mut buf)
>     .unwrap();
>   let cursor = Cursor::new(buf);
>   let read_from_cursor = SerializedFileReader::new(cursor).unwrap();
> ...
> {code}
> Otherwise, buf: Vec<u8> must be declared static in order to initialize a 
> SerializedFileReader from a Cursor.
> I'm new to rust so perhaps this is the intended behavior, but if not I'm 
> happy to submit a PR for this
>  
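
A rough sketch of what the request amounts to is given below. The two traits are local stand-ins declared only for this example, so their exact shape is an assumption and the real definitions in the parquet crate may differ; `read_all` is likewise a hypothetical consumer that takes ownership of the reader the way a file reader constructor would.

```rust
use std::io::{self, Cursor, Read};

/// Local stand-ins for the traits named in the issue; the real definitions
/// in the parquet crate may differ.
trait Length {
    fn len(&self) -> u64;
}

trait TryClone: Sized {
    fn try_clone(&self) -> io::Result<Self>;
}

impl Length for Cursor<Vec<u8>> {
    fn len(&self) -> u64 {
        self.get_ref().len() as u64
    }
}

impl TryClone for Cursor<Vec<u8>> {
    fn try_clone(&self) -> io::Result<Self> {
        // Cloning copies the bytes and keeps the current position.
        Ok(self.clone())
    }
}

/// Hypothetical consumer that owns its reader, so no borrow of `buf`
/// has to outlive the caller's stack frame.
fn read_all<R: Read + Length + TryClone>(reader: R) -> io::Result<Vec<u8>> {
    let mut copy = reader.try_clone()?;
    let mut out = Vec::with_capacity(reader.len() as usize);
    copy.read_to_end(&mut out)?;
    Ok(out)
}
```

Because the cursor owns the `Vec<u8>`, passing it by value avoids the borrow that triggers the lifetime error quoted in the issue.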





[jira] [Resolved] (ARROW-7768) [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs

2020-02-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-7768.
-
Fix Version/s: 0.16.0
   Resolution: Fixed

Issue resolved by pull request 6376
[https://github.com/apache/arrow/pull/6376]

> [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs
> 
>
> Key: ARROW-7768
> URL: https://issues.apache.org/jira/browse/ARROW-7768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: David Kegley
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Length and TryClone are implemented for Cursor<&'a [u8]> in 
> src/file/reader.rs
> Attempting to create a cursor from a Vec<u8>...
> {code:java}
> fn test_cursor_and_file_has_the_same_behaviour() {
>   let mut buf: Vec<u8> = Vec::new();
>   get_test_file("alltypes_plain.parquet")
>     .read_to_end(&mut buf)
>     .unwrap();
>   let cursor = Cursor::new(buf.as_slice());
> ...
> {code}
>  
> results in:
> {code:java}
> `buf` does not live long enough
> borrowed value does not live long enough 
> rustc(E0597)
> reader.rs(681, 34): borrowed value does not live long enough
> reader.rs(681, 34): argument requires that `buf` is borrowed for `'static`
> reader.rs(691, 5): `buf` dropped here while still borrowed
> {code}
>  
> Implementing Length and TryClone for Cursor<Vec<u8>> would allow for:
> {code:java}
> fn test_cursor_and_file_has_the_same_behaviour() {
>   let mut buf: Vec<u8> = Vec::new();
>   get_test_file("alltypes_plain.parquet")
>     .read_to_end(&mut buf)
>     .unwrap();
>   let cursor = Cursor::new(buf);
>   let read_from_cursor = SerializedFileReader::new(cursor).unwrap();
> ...
> {code}
> Otherwise, buf: Vec<u8> must be declared static in order to initialize a 
> SerializedFileReader from a Cursor.
> I'm new to rust so perhaps this is the intended behavior, but if not I'm 
> happy to submit a PR for this
>  





[jira] [Assigned] (ARROW-7768) [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs

2020-02-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-7768:
---

Assignee: Chao Sun

> [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs
> 
>
> Key: ARROW-7768
> URL: https://issues.apache.org/jira/browse/ARROW-7768
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: David Kegley
>Assignee: Chao Sun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Length and TryClone are implemented for Cursor<&'a [u8]> in 
> src/file/reader.rs
> Attempting to create a cursor from a Vec<u8>...
> {code:java}
> fn test_cursor_and_file_has_the_same_behaviour() {
>   let mut buf: Vec<u8> = Vec::new();
>   get_test_file("alltypes_plain.parquet")
>     .read_to_end(&mut buf)
>     .unwrap();
>   let cursor = Cursor::new(buf.as_slice());
> ...
> {code}
>  
> results in:
> {code:java}
> `buf` does not live long enough
> borrowed value does not live long enough 
> rustc(E0597)
> reader.rs(681, 34): borrowed value does not live long enough
> reader.rs(681, 34): argument requires that `buf` is borrowed for `'static`
> reader.rs(691, 5): `buf` dropped here while still borrowed
> {code}
>  
> Implementing Length and TryClone for Cursor<Vec<u8>> would allow for:
> {code:java}
> fn test_cursor_and_file_has_the_same_behaviour() {
>   let mut buf: Vec<u8> = Vec::new();
>   get_test_file("alltypes_plain.parquet")
>     .read_to_end(&mut buf)
>     .unwrap();
>   let cursor = Cursor::new(buf);
>   let read_from_cursor = SerializedFileReader::new(cursor).unwrap();
> ...
> {code}
> Otherwise, buf: Vec<u8> must be declared static in order to initialize a 
> SerializedFileReader from a Cursor.
> I'm new to rust so perhaps this is the intended behavior, but if not I'm 
> happy to submit a PR for this
>  





[jira] [Resolved] (ARROW-7743) [Rust] [Parquet] Support reading timestamp micros

2020-02-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-7743.
-
Fix Version/s: 0.16.0
   Resolution: Fixed

Issue resolved by pull request 6338
[https://github.com/apache/arrow/pull/6338]

> [Rust] [Parquet] Support reading timestamp micros
> -
>
> Key: ARROW-7743
> URL: https://issues.apache.org/jira/browse/ARROW-7743
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Onur Satici
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Parquet in Rust doesn't seem to have an easy way to read microsecond 
> precision timestamp columns.





[jira] [Updated] (ARROW-7312) [Rust] ArrowError should implement std::error::Error

2019-12-05 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-7312:

Component/s: Rust

> [Rust] ArrowError should implement std::error::Error
> ---
>
> Key: ARROW-7312
> URL: https://issues.apache.org/jira/browse/ARROW-7312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> ArrowError should implement this trait so that other crates can handle errors 
> from this crate more easily.
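
As an illustration of what implementing the trait buys downstream code, here is a minimal sketch. `SketchError` and its variants are hypothetical stand-ins and do not reflect the actual `ArrowError` definition; the point is only that once `Debug`, `Display` and `std::error::Error` are implemented, callers can use `?` to convert into `Box<dyn Error>`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type standing in for `ArrowError`; the variants are
/// illustrative only.
#[derive(Debug)]
enum SketchError {
    Parse(String),
    Compute(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Parse(msg) => write!(f, "parse error: {}", msg),
            SketchError::Compute(msg) => write!(f, "compute error: {}", msg),
        }
    }
}

// With Debug and Display in place, the trait needs no extra methods.
impl Error for SketchError {}

/// Library-side function returning the crate's own error type.
fn parse_and_invert(input: &str) -> Result<f64, SketchError> {
    let n: i64 = input
        .trim()
        .parse()
        .map_err(|e: std::num::ParseIntError| SketchError::Parse(e.to_string()))?;
    if n == 0 {
        return Err(SketchError::Compute("division by zero".to_string()));
    }
    Ok(1.0 / n as f64)
}

/// Downstream code can mix this error with others behind `Box<dyn Error>`.
fn downstream(input: &str) -> Result<f64, Box<dyn Error>> {
    Ok(parse_and_invert(input)?)
}
```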





[jira] [Resolved] (ARROW-7113) [Rust] Buffer should accept memory owned by others

2019-11-24 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-7113.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5806
[https://github.com/apache/arrow/pull/5806]

> [Rust] Buffer should accept memory owned by others
> --
>
> Key: ARROW-7113
> URL: https://issues.apache.org/jira/browse/ARROW-7113
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently the Rust Buffer always assumes that the memory passed to it is owned 
> by itself, and frees the memory when the Buffer is dropped. This is 
> inconvenient when used in cross-language environments such as JNI. 





[jira] [Updated] (ARROW-7113) [Rust] Buffer should accept memory owned by others

2019-11-24 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-7113:

Component/s: Rust

> [Rust] Buffer should accept memory owned by others
> --
>
> Key: ARROW-7113
> URL: https://issues.apache.org/jira/browse/ARROW-7113
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently the Rust Buffer always assumes that the memory passed to it is owned 
> by itself, and frees the memory when the Buffer is dropped. This is 
> inconvenient when used in cross-language environments such as JNI. 





[jira] [Commented] (ARROW-6154) [Rust] Too many open files (os error 24)

2019-08-07 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901748#comment-16901748
 ] 

Chao Sun commented on ARROW-6154:
-

Thanks for reporting. Do you have a rough idea how deep the nested data type is? 
Is there any error message? It would be great if we could reproduce this.

> [Rust] Too many open files (os error 24)
> 
>
> Key: ARROW-6154
> URL: https://issues.apache.org/jira/browse/ARROW-6154
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Yesh
>Priority: Major
>
> Used the [rust] parquet-read binary to read a deeply nested parquet file and saw 
> the stack trace below. Unfortunately I won't be able to upload the file.
> {code:java}
> stack backtrace:
>    0: std::panicking::default_hook::{{closure}}
>    1: std::panicking::default_hook
>    2: std::panicking::rust_panic_with_hook
>    3: std::panicking::continue_panic_fmt
>    4: rust_begin_unwind
>    5: core::panicking::panic_fmt
>    6: core::result::unwrap_failed
>    7: parquet::util::io::FileSource::new
>    8:  as 
> parquet::file::reader::RowGroupReader>::get_column_page_reader
>    9:  as 
> parquet::file::reader::RowGroupReader>::get_column_reader
>   10: parquet::record::reader::TreeBuilder::reader_tree
>   11: parquet::record::reader::TreeBuilder::reader_tree
>   12: parquet::record::reader::TreeBuilder::reader_tree
>   13: parquet::record::reader::TreeBuilder::reader_tree
>   14: parquet::record::reader::TreeBuilder::reader_tree
>   15: parquet::record::reader::TreeBuilder::build
>   16:  core::iter::traits::iterator::Iterator>::next
>   17: parquet_read::main
>   18: std::rt::lang_start::{{closure}}
>   19: std::panicking::try::do_call
>   20: __rust_maybe_catch_panic
>   21: std::rt::lang_start_internal
>   22: main{code}





[jira] [Resolved] (ARROW-4365) [Rust] [Parquet] Implement RecordReader

2019-07-30 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4365.
-
   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 4292
[https://github.com/apache/arrow/pull/4292]

> [Rust] [Parquet] Implement RecordReader
> ---
>
> Key: ARROW-4365
> URL: https://issues.apache.org/jira/browse/ARROW-4365
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> RecordReader reads logical records into memory; this is a prerequisite for 
> ColumnReader.





[jira] [Commented] (ARROW-5357) [Rust] Add capacity field in Buffer

2019-07-28 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894819#comment-16894819
 ] 

Chao Sun commented on ARROW-5357:
-

Re-purposing this Jira to add capacity info in {{Buffer}} struct.

> [Rust] Add capacity field in Buffer
> ---
>
> Key: ARROW-5357
> URL: https://issues.apache.org/jira/browse/ARROW-5357
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently {{Buffer}} only has {{len}}, but no {{capacity}}. We should add 
> both.
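
The distinction between the two quantities can be shown with a small sketch. `SketchBuffer` is a hypothetical type backed by a `Vec<u8>`, not the arrow crate's `Buffer`; it only illustrates that the number of bytes in use and the number of bytes allocated are separate values.

```rust
/// Hypothetical buffer tracking both used and allocated bytes
/// (not the arrow crate's `Buffer`).
struct SketchBuffer {
    data: Vec<u8>,
}

impl SketchBuffer {
    fn with_capacity(capacity: usize) -> Self {
        Self { data: Vec::with_capacity(capacity) }
    }

    fn extend(&mut self, bytes: &[u8]) {
        self.data.extend_from_slice(bytes);
    }

    /// Number of bytes in use.
    fn len(&self) -> usize {
        self.data.len()
    }

    /// Number of bytes allocated; always at least `len()`.
    fn capacity(&self) -> usize {
        self.data.capacity()
    }
}
```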





[jira] [Updated] (ARROW-5357) [Rust] Add capacity field in Buffer

2019-07-28 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5357:

Summary: [Rust] Add capacity field in Buffer  (was: [Rust] Change 
Buffer::len to represent total bytes instead of used bytes)

> [Rust] Add capacity field in Buffer
> ---
>
> Key: ARROW-5357
> URL: https://issues.apache.org/jira/browse/ARROW-5357
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently {{Buffer::len}} records the number of used bytes, as opposed to the 
> number of total bytes. This poses a problem when converting from buffers 
> defined in flatbuffer, where the length is actually the number of allocated 
> bytes for the buffer. 





[jira] [Updated] (ARROW-5357) [Rust] Add capacity field in Buffer

2019-07-28 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5357:

Description: Currently {{Buffer}} only has {{len}}, but no {{capacity}}. We 
should add both.  (was: Currently {{Buffer::len}} records the number of used 
bytes, as opposed to the number of total bytes. This poses a problem when 
converting from buffers defined in flatbuffer, where the length is actually the 
number of allocated bytes for the buffer. )

> [Rust] Add capacity field in Buffer
> ---
>
> Key: ARROW-5357
> URL: https://issues.apache.org/jira/browse/ARROW-5357
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently {{Buffer}} only has {{len}}, but no {{capacity}}. We should add 
> both.





[jira] [Commented] (ARROW-6047) [Rust] Rust nightly 1.38.0 builds failing

2019-07-26 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894120#comment-16894120
 ] 

Chao Sun commented on ARROW-6047:
-

{quote}
Uh, that's not good. I'm concerned about having this crate managed by some 
extra-Apache process
{quote}
I'm fine with putting this under the governance of the ASF, just not sure what 
the process looks like.

> [Rust] Rust nightly 1.38.0 builds failing
> -
>
> Key: ARROW-6047
> URL: https://issues.apache.org/jira/browse/ARROW-6047
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> see
> * https://travis-ci.org/apache/arrow/jobs/563893205
> * https://ci.ursalabs.org/#/builders/93/builds/669/steps/2/logs/stdio





[jira] [Commented] (ARROW-6047) [Rust] Rust nightly 1.38.0 builds failing

2019-07-26 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894082#comment-16894082
 ] 

Chao Sun commented on ARROW-6047:
-

Hmm, I didn't know that it would use 2.6.0 even though we specified:
{code:java}
parquet-format = "2.5.0"
{code}
Let me change it to use semantic versioning.
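
For context, Cargo treats a bare version such as "2.5.0" as a caret requirement, so any release from 2.5.0 up to, but not including, 3.0.0 can be selected, which is why 2.6.0 was picked up. A minimal sketch of that matching rule for requirements with a non-zero major number follows; the function is illustrative only, ignores pre-release versions, and is not Cargo's implementation.

```rust
/// Illustrative caret matching for requirements whose major version is
/// non-zero: same major, and (minor, patch) not older than required.
fn caret_matches(required: (u64, u64, u64), candidate: (u64, u64, u64)) -> bool {
    assert!(required.0 > 0, "sketch only covers major >= 1");
    candidate.0 == required.0 && (candidate.1, candidate.2) >= (required.1, required.2)
}
```

An exact pin is written "=2.5.0", and a tilde requirement "~2.5.0" restricts updates to patch releases.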

> [Rust] Rust nightly 1.38.0 builds failing
> -
>
> Key: ARROW-6047
> URL: https://issues.apache.org/jira/browse/ARROW-6047
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> see
> * https://travis-ci.org/apache/arrow/jobs/563893205
> * https://ci.ursalabs.org/#/builders/93/builds/669/steps/2/logs/stdio





[jira] [Updated] (ARROW-5785) Rust datafusion implementation should not depend on rustyline

2019-07-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5785:

Component/s: Rust - DataFusion

> Rust datafusion implementation should not depend on rustyline
> -
>
> Key: ARROW-5785
> URL: https://issues.apache.org/jira/browse/ARROW-5785
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Marius Seritan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The rust implementation of datafusion produces both a library and a cli. The 
> cli is not necessarily useful for downstream consumers so its dependencies 
> should not show up in the library.





[jira] [Resolved] (ARROW-5358) [Rust] Implement equality check for ArrayData and Array

2019-07-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5358.
-
   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 4643
[https://github.com/apache/arrow/pull/4643]

> [Rust] Implement equality check for ArrayData and Array
> ---
>
> Key: ARROW-5358
> URL: https://issues.apache.org/jira/browse/ARROW-5358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently {{Array}} doesn't implement the {{Eq}} trait. Although 
> {{ArrayData}} derives the {{PartialEq}} trait, the derived 
> implementation is not suitable here. Instead, we should implement a customized 
> equality comparison.
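
To illustrate why a derived {{PartialEq}} can be unsuitable, here is a minimal sketch. It assumes a simplified, hypothetical stand-in for {{ArrayData}} (the type {{SimpleArrayData}} and its fields are not Arrow's real layout): a derived comparison would also compare whatever bytes sit in null slots, whereas a hand-written one can ignore them.

```rust
/// Hypothetical, simplified stand-in for Arrow's `ArrayData` (illustration only).
#[derive(Debug, Clone)]
pub struct SimpleArrayData {
    /// One flag per slot: `true` means the slot holds a value.
    pub validity: Vec<bool>,
    /// Raw values; the contents of null slots are unspecified.
    pub values: Vec<i32>,
}

impl PartialEq for SimpleArrayData {
    fn eq(&self, other: &Self) -> bool {
        // Arrays of different lengths are never equal.
        if self.validity.len() != other.validity.len()
            || self.values.len() != other.values.len()
        {
            return false;
        }
        // Compare slot by slot, ignoring the stored value when both slots are null.
        self.validity
            .iter()
            .zip(&self.values)
            .zip(other.validity.iter().zip(&other.values))
            .all(|((lv, lx), (rv, rx))| match (*lv, *rv) {
                (false, false) => true, // both null: equal regardless of raw value
                (true, true) => lx == rx, // both valid: values must match
                _ => false,             // one null, one valid: not equal
            })
    }
}
```

With this sketch, two arrays that differ only in the raw contents of a null slot compare as equal, which a derived implementation would not guarantee.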





[jira] [Updated] (ARROW-5792) [Rust] [Parquet] A visitor trait for parquet types.

2019-07-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5792:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Rust] [Parquet] A visitor trait for parquet types.
> ---
>
> Key: ARROW-5792
> URL: https://issues.apache.org/jira/browse/ARROW-5792
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Useful in dealing with parquet types.





[jira] [Updated] (ARROW-5792) [Rust] [Parquet] A visitor trait for parquet types.

2019-07-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5792:

Component/s: Rust

> [Rust] [Parquet] A visitor trait for parquet types.
> ---
>
> Key: ARROW-5792
> URL: https://issues.apache.org/jira/browse/ARROW-5792
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Useful in dealing with parquet types.





[jira] [Resolved] (ARROW-5792) [Rust] [Parquet] A visitor trait for parquet types.

2019-07-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5792.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4766
[https://github.com/apache/arrow/pull/4766]

> [Rust] [Parquet] A visitor trait for parquet types.
> ---
>
> Key: ARROW-5792
> URL: https://issues.apache.org/jira/browse/ARROW-5792
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Useful in dealing with parquet types.





[jira] [Updated] (ARROW-5753) [Rust] Fix test failure in CI code coverage

2019-06-29 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5753:

Summary: [Rust] Fix test failure in CI code coverage  (was: [Rust] Fix code 
coverage in CI)

> [Rust] Fix test failure in CI code coverage
> ---
>
> Key: ARROW-5753
> URL: https://issues.apache.org/jira/browse/ARROW-5753
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> Rust code coverage in CI has been broken for a while now. We should fix it.





[jira] [Resolved] (ARROW-5785) Rust datafusion implementation should not depend on rustyline

2019-06-28 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5785.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4742
[https://github.com/apache/arrow/pull/4742]

> Rust datafusion implementation should not depend on rustyline
> -
>
> Key: ARROW-5785
> URL: https://issues.apache.org/jira/browse/ARROW-5785
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Marius Seritan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Rust implementation of DataFusion produces both a library and a CLI. The 
> CLI is not necessarily useful for downstream consumers, so its dependencies 
> should not show up in the library.





[jira] [Resolved] (ARROW-5755) [Rust] [Parquet] Add derived clone for Type

2019-06-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5755.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4719
[https://github.com/apache/arrow/pull/4719]

> [Rust] [Parquet] Add derived clone for Type
> ---
>
> Key: ARROW-5755
> URL: https://issues.apache.org/jira/browse/ARROW-5755
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add clone for Type





[jira] [Updated] (ARROW-5755) [Rust] [Parquet] Add derived clone for Type

2019-06-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5755:

Component/s: Rust

> [Rust] [Parquet] Add derived clone for Type
> ---
>
> Key: ARROW-5755
> URL: https://issues.apache.org/jira/browse/ARROW-5755
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add clone for Type





[jira] [Created] (ARROW-5753) [Rust] Fix code coverage in CI

2019-06-26 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5753:
---

 Summary: [Rust] Fix code coverage in CI
 Key: ARROW-5753
 URL: https://issues.apache.org/jira/browse/ARROW-5753
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun


Rust code coverage in CI has been broken for a while now. We should fix it.





[jira] [Assigned] (ARROW-5045) [Rust] Code coverage silently failing in CI

2019-06-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-5045:
---

Assignee: Chao Sun

> [Rust] Code coverage silently failing in CI
> ---
>
> Key: ARROW-5045
> URL: https://issues.apache.org/jira/browse/ARROW-5045
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Andy Grove
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
> {code:java}
> error: could not execute process `target/kcov-master/build/src/kcov --verify 
> --include-path=/home/travis/build/apache/arrow/rust 
> /home/travis/build/apache/arrow/rust/target/kcov-arrow-f04240306dd653e9 
> /home/travis/build/apache/arrow/rust/target/debug/deps/arrow-f04240306dd653e9`
>  (never executed){code}





[jira] [Created] (ARROW-5722) [Rust] Implement std::fmt::Debug for ListArray, BinaryArray and StructArray

2019-06-24 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5722:
---

 Summary: [Rust] Implement std::fmt::Debug for ListArray, 
BinaryArray and StructArray
 Key: ARROW-5722
 URL: https://issues.apache.org/jira/browse/ARROW-5722
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Chao Sun








[jira] [Created] (ARROW-5721) [Rust] Move array related code into a separate module

2019-06-24 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5721:
---

 Summary: [Rust] Move array related code into a separate module
 Key: ARROW-5721
 URL: https://issues.apache.org/jira/browse/ARROW-5721
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun


We should move all array related code into a separate module {{array}}, and 
re-export public interfaces. 
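
A minimal sketch of the proposed layout, assuming hypothetical contents for the {{array}} module (the trait and type below are simplified placeholders, not the crate's actual definitions). The module is written inline here; in the crate it would presumably live in its own file.

```rust
// Contents that would live in the separate `array` module.
mod array {
    /// Simplified placeholder for the public array trait.
    pub trait Array {
        fn len(&self) -> usize;

        fn is_empty(&self) -> bool {
            self.len() == 0
        }
    }

    /// Simplified placeholder for a concrete array type.
    pub struct Int32Array {
        values: Vec<i32>,
    }

    impl Int32Array {
        pub fn from_vec(values: Vec<i32>) -> Self {
            Self { values }
        }
    }

    impl Array for Int32Array {
        fn len(&self) -> usize {
            self.values.len()
        }
    }
}

// Re-export the public interfaces so existing callers can keep
// referring to them from the parent module.
pub use array::{Array, Int32Array};
```

Because of the re-export, code that previously referred to these names at the parent level should not need to change its imports.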





[jira] [Commented] (ARROW-5680) [Rust] datafusion group-by tests depends on result set order

2019-06-21 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869942#comment-16869942
 ] 

Chao Sun commented on ARROW-5680:
-

[~fsaintjacques] I think ARROW-5217 is related. Basically, it's caused by some 
changes to HashMap in Rust nightly. But you are right, ideally we should make 
this deterministic.
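
One way to make such comparisons deterministic is to sort the result rows before asserting on them. The sketch below assumes the test output is a newline-separated string of tab-separated rows, as in the failure log quoted in this issue; the helper names {{sorted_lines}} and {{same_rows}} are hypothetical and not part of the DataFusion test suite.

```rust
/// Splits query output into rows and sorts them, so that the
/// comparison no longer depends on hash table iteration order.
fn sorted_lines(output: &str) -> Vec<&str> {
    let mut lines: Vec<&str> = output.lines().collect();
    lines.sort_unstable();
    lines
}

/// Returns true when both outputs contain the same rows, in any order.
fn same_rows(actual: &str, expected: &str) -> bool {
    sorted_lines(actual) == sorted_lines(expected)
}
```

Alternatives would be to add an explicit ordering to the test queries once that is supported, or to collect groups in an ordered map; sorting in the test is just the least invasive option.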

> [Rust] datafusion group-by tests depends on result set order
> 
>
> Key: ARROW-5680
> URL: https://issues.apache.org/jira/browse/ARROW-5680
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Francois Saint-Jacques
>Priority: Minor
>
> See 
> https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
> once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further 
> failures, e.g.
> {code:bash}
> running 18 tests
> test csv_query_group_by_int_min_max ... FAILED
> test csv_query_external_table_count ... ok
> test csv_query_count ... ok
> test csv_count_star ... ok
> test csv_query_avg ... ok
> test csv_query_avg_multi_batch ... ok
> test csv_query_cast ... ok
> test csv_query_group_by_avg ... FAILED
> test csv_query_group_by_string_min_max ... FAILED
> test csv_query_group_by_int_count ... FAILED
> test csv_query_limit ... ok
> test csv_query_limit_bigger_than_nbr_of_rows ... ok
> test csv_query_limit_with_same_nbr_of_rows ... ok
> test csv_query_cast_literal ... ok
> test csv_query_limit_zero ... ok
> test csv_query_create_external_table ... ok
> test csv_query_with_predicate ... ok
> test parquet_query ... ok
> failures:
>  csv_query_group_by_int_min_max stdout 
> thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left 
> == right)`
>   left: 
> `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`,
>  right: 
> `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:77:5
> note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
>  csv_query_group_by_avg stdout 
> thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == 
> right)`
>   left: 
> `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`,
>  right: 
> `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`',
>  datafusion/tests/sql.rs:99:5
>  csv_query_group_by_string_min_max stdout 
> thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: 
> `(left == right)`
>   left: 
> `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`,
>  right: 
> `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:187:5
>  csv_query_group_by_int_count stdout 
> thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left 
> == right)`
>   left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`,
>  right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', 
> datafusion/tests/sql.rs:175:5
> {code}
> I suspect that the tests are expecting the group-by results in a fixed order. 
> That would be highly dependent on the iterator of the hash table. Note that 
> once I did a rustup update (and docker rmi rustlangrust/nightly), the 
> failures have gone away.





[jira] [Commented] (ARROW-5680) [Rust] datafusion group-by tests depends on result set order

2019-06-21 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869641#comment-16869641
 ] 

Chao Sun commented on ARROW-5680:
-

cc [~andygrove]

> [Rust] datafusion group-by tests depends on result set order
> 
>
> Key: ARROW-5680
> URL: https://issues.apache.org/jira/browse/ARROW-5680
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> See 
> https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
> once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further 
> failures, e.g.
> {code:bash}
> running 18 tests
> test csv_query_group_by_int_min_max ... FAILED
> test csv_query_external_table_count ... ok
> test csv_query_count ... ok
> test csv_count_star ... ok
> test csv_query_avg ... ok
> test csv_query_avg_multi_batch ... ok
> test csv_query_cast ... ok
> test csv_query_group_by_avg ... FAILED
> test csv_query_group_by_string_min_max ... FAILED
> test csv_query_group_by_int_count ... FAILED
> test csv_query_limit ... ok
> test csv_query_limit_bigger_than_nbr_of_rows ... ok
> test csv_query_limit_with_same_nbr_of_rows ... ok
> test csv_query_cast_literal ... ok
> test csv_query_limit_zero ... ok
> test csv_query_create_external_table ... ok
> test csv_query_with_predicate ... ok
> test parquet_query ... ok
> failures:
>  csv_query_group_by_int_min_max stdout 
> thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left 
> == right)`
>   left: 
> `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`,
>  right: 
> `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:77:5
> note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
>  csv_query_group_by_avg stdout 
> thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == 
> right)`
>   left: 
> `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`,
>  right: 
> `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`',
>  datafusion/tests/sql.rs:99:5
>  csv_query_group_by_string_min_max stdout 
> thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: 
> `(left == right)`
>   left: 
> `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`,
>  right: 
> `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`',
>  datafusion/tests/sql.rs:187:5
>  csv_query_group_by_int_count stdout 
> thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left 
> == right)`
>   left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`,
>  right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', 
> datafusion/tests/sql.rs:175:5
> {code}
> I suspect that the tests are expecting the group-by results in a fixed order. 
> That would be highly dependent on the iterator of the hash table. Note that 
> once I did a rustup update (and docker rmi rustlangrust/nightly), the 
> failures have gone away.





[jira] [Commented] (ARROW-5613) [Rust] Fail to compile with unrecognized platform-specific intrinsic function

2019-06-18 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867207#comment-16867207
 ] 

Chao Sun commented on ARROW-5613:
-

I was running on my Mac. Haven't tested on other platforms.

> [Rust] Fail to compile with unrecognized platform-specific intrinsic function
> -
>
> Key: ARROW-5613
> URL: https://issues.apache.org/jira/browse/ARROW-5613
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Chao Sun
>Priority: Major
>
> I'm testing a project which depends on the Arrow crate. It failed with the 
> following error:
> {code}
> error[E0441]: unrecognized platform-specific intrinsic function: 
> `simd_bitmask`
>--> 
> /Users/sunchao/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/codegen/llvm.rs:100:5
> |
> 100 | crate fn simd_bitmask(value: T) -> U;
> | ^^^
> error: aborting due to previous error
> For more information about this error, try `rustc --explain E0441`.
> error: Could not compile `packed_simd`.
> {code}





[jira] [Commented] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-17 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865808#comment-16865808
 ] 

Chao Sun commented on ARROW-5440:
-

Actually, this is already removed in the latest version 0.13.0 of parquet-rs. 
Could you try that and see if you still get the issue?

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz, serde_json_test.tar.gz
>
>
> Hello,
> In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on centos, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues since it's a shared library found in the 
> rustup directory. This `libstd-hash.so` dependency isn't there on any other 
> rust binaries I've made before. This dependency means that I can't run this 
> binary anywhere where rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}





[jira] [Commented] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-17 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865800#comment-16865800
 ] 

Chao Sun commented on ARROW-5440:
-

It seems this is caused by the use of {{[feature(rustc_private)]}} in the 
crate, which really isn't necessary. I'll file a PR to remove it.

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz, serde_json_test.tar.gz
>
>
> Hello,
> In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on centos, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues since it's a shared library found in the 
> rustup directory. This `libstd-hash.so` dependency isn't there on any other 
> rust binaries I've made before. This dependency means that I can't run this 
> binary anywhere where rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}





[jira] [Commented] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864373#comment-16864373
 ] 

Chao Sun commented on ARROW-5440:
-

BTW, you mentioned this is not an issue on Mac. Does that mean that on Mac it 
statically links the `libstd-XXX.so`?

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz, serde_json_test.tar.gz
>
>
> Hello,
> In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on centos, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues since it's a shared library found in the 
> rustup directory. This `libstd-hash.so` dependency isn't there on any other 
> rust binaries I've made before. This dependency means that I can't run this 
> binary anywhere where rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}





[jira] [Commented] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864370#comment-16864370
 ] 

Chao Sun commented on ARROW-5440:
-

Hmm, I see. Interestingly, if I just change the `parquet-test` repo to use any 
of the crates used by `parquet-rs`, the `libstd-XXX` isn't dynamically 
linked, but somehow it is linked when using `parquet-rs` itself. I'm not sure 
what causes this.

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz, serde_json_test.tar.gz
>
>
> Hello,
> In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on centos, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues since it's a shared library found in the 
> rustup directory. This `libstd-hash.so` dependency isn't there on any other 
> rust binaries I've made before. This dependency means that I can't run this 
> binary anywhere where rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5613) [Rust] Fail to compile with unrecognized platform-specific intrinsic function

2019-06-14 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5613:
---

 Summary: [Rust] Fail to compile with unrecognized 
platform-specific intrinsic function
 Key: ARROW-5613
 URL: https://issues.apache.org/jira/browse/ARROW-5613
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Chao Sun


I'm testing a project which depends on the Arrow crate. It failed with the 
following error:
{code}
error[E0441]: unrecognized platform-specific intrinsic function: `simd_bitmask`
   --> 
/Users/sunchao/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/codegen/llvm.rs:100:5
|
100 | crate fn simd_bitmask(value: T) -> U;
| ^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0441`.
error: Could not compile `packed_simd`.
{code}





[jira] [Commented] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-13 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863692#comment-16863692
 ] 

Chao Sun commented on ARROW-5440:
-

Are you asking how to compile the project without `rustup`? What exactly is the 
error you got? `rustup` is the most convenient method, but you can also install 
the `libstd-XXX` library some other way and point the dynamic linker at it when 
running the binary (through the `LD_LIBRARY_PATH` environment variable).

I'm not really sure this is a parquet-rs issue.

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz
>
>
> Hello,
> In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on centos, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues since it's a shared library found in the 
> rustup directory. This `libstd-hash.so` dependency isn't there on any other 
> rust binaries I've made before. This dependency means that I can't run this 
> binary anywhere where rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}





[jira] [Assigned] (ARROW-5358) [Rust] Implement equality check for ArrayData and Array

2019-06-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-5358:
---

Assignee: Chao Sun

> [Rust] Implement equality check for ArrayData and Array
> ---
>
> Key: ARROW-5358
> URL: https://issues.apache.org/jira/browse/ARROW-5358
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> Currently {{Array}} doesn't implement the {{Eq}} trait. Although 
> {{ArrayData}} derives from the {{PartialEq}} trait, the default 
> implementation is not suitable here. Instead, we should implement customized 
> equality comparison.





[jira] [Commented] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-13 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862782#comment-16862782
 ] 

Chao Sun commented on ARROW-5440:
-

Sorry [~jstoneham] [~rigden33], I didn't see this JIRA earlier. This probably 
comes from one of the dependencies. I'm not too familiar with shared library 
dependencies, but let me take a look and get back to you soon.

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz
>
>
> Hello,
> In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on centos, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues since it's a shared library found in the 
> rustup directory. This `libstd-hash.so` dependency isn't there on any other 
> rust binaries I've made before. This dependency means that I can't run this 
> binary anywhere where rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}





[jira] [Commented] (ARROW-4304) [Rust] Enhance documentation for arrow

2019-06-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861536#comment-16861536
 ] 

Chao Sun commented on ARROW-4304:
-

[~npr] Yes, it would be great if we could publish the Rust docs on 
arrow.apache.org. Let me file a JIRA for that, and perhaps you can give us some 
pointers on how to do that. 

> [Rust] Enhance documentation for arrow
> --
>
> Key: ARROW-4304
> URL: https://issues.apache.org/jira/browse/ARROW-4304
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.14.0
>
>
> The documentation for arrow crate (https://docs.rs/arrow/0.12.0/arrow/) is 
> not complete. We should add more content to it to help people who want to use 
> the crate.





[jira] [Created] (ARROW-5561) [Release] Build and publish Rust docs

2019-06-11 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5561:
---

 Summary: [Release] Build and publish Rust docs
 Key: ARROW-5561
 URL: https://issues.apache.org/jira/browse/ARROW-5561
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Chao Sun


Besides docs.rs, we can perhaps host the Rust documentation on 
arrow.apache.org, along with other languages. 





[jira] [Commented] (ARROW-4304) [Rust] Enhance documentation for arrow

2019-06-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861514#comment-16861514
 ] 

Chao Sun commented on ARROW-4304:
-

Sorry for the delay. I think the scope is not that big. Let me try to push this 
in 0.14.0.

> [Rust] Enhance documentation for arrow
> --
>
> Key: ARROW-4304
> URL: https://issues.apache.org/jira/browse/ARROW-4304
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.14.0
>
>
> The documentation for arrow crate (https://docs.rs/arrow/0.12.0/arrow/) is 
> not complete. We should add more content to it to help people who want to use 
> the crate.





[jira] [Resolved] (ARROW-5463) [Rust] Implement AsRef for Buffer

2019-06-04 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5463.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4450
[https://github.com/apache/arrow/pull/4450]

> [Rust] Implement AsRef for Buffer
> -
>
> Key: ARROW-5463
> URL: https://issues.apache.org/jira/browse/ARROW-5463
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement AsRef ArrowNativeType for Buffer





[jira] [Created] (ARROW-5479) [Rust] [DataFusion] Use ARROW_TEST_DATA instead of relative path for testing

2019-06-02 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5479:
---

 Summary: [Rust] [DataFusion] Use ARROW_TEST_DATA instead of 
relative path for testing
 Key: ARROW-5479
 URL: https://issues.apache.org/jira/browse/ARROW-5479
 Project: Apache Arrow
  Issue Type: Test
  Components: Rust - DataFusion
Reporter: Chao Sun
Assignee: Chao Sun








[jira] [Assigned] (ARROW-5455) [Rust] Build broken by 2019-05-30 Rust nightly

2019-05-31 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-5455:
---

Assignee: Chao Sun

> [Rust] Build broken by 2019-05-30 Rust nightly
> --
>
> Key: ARROW-5455
> URL: https://issues.apache.org/jira/browse/ARROW-5455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See example failed build
> https://travis-ci.org/apache/arrow/jobs/539477452





[jira] [Resolved] (ARROW-5317) [Rust] [Parquet] impl IntoIterator for SerializedFileReader

2019-05-19 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5317.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4323
[https://github.com/apache/arrow/pull/4323]

> [Rust] [Parquet] impl IntoIterator for SerializedFileReader
> ---
>
> Key: ARROW-5317
> URL: https://issues.apache.org/jira/browse/ARROW-5317
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Fabio Silva
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a follow up to [https://github.com/apache/arrow/issues/4301].
> The current implementation of the row iterator *RowIter* borrows the 
> *FileReader*, so the user has to keep the file reader alive for as long as 
> the iterator is alive.
> This makes it hard to iterate over multiple *FileReader* / *RowIter* pairs.
> {code:java}
> fn main() {
>     let path1 = Path::new("path-to/1.snappy.parquet");
>     let path2 = Path::new("path-to/2.snappy.parquet");
>     let vec = vec![path1, path2];
>     let it = vec.iter()
>         .map(|p| {
>             File::open(p).unwrap()
>         })
>         .map(|f| {
>             SerializedFileReader::new(f).unwrap()
>         })
>         .flat_map(|reader| -> RowIter {
>             RowIter::from_file(None, &reader).unwrap()
>             // error: `reader` is borrowed here, and the closure
>             // returns a value referencing data owned by the current function
>         })
>     ;
>     for r in it {
>         println!("{}", r);
>     }
> }
> {code}
> One solution could be to implement a row iterator that takes ownership of the 
> reader, perhaps by implementing *std::iter::IntoIterator* for 
> *SerializedFileReader*:
> {code:java}
> 
> .map(|p| {
>     File::open(p).unwrap()
> })
> .map(|f| {
>     SerializedFileReader::new(f).unwrap()
> })
> .flat_map(|r| r.into_iter())
> 
> {code}
>  
> Happy to put a PR out with this.
> Please let me know if this makes sense and whether you already have some way 
> of doing this.
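The ownership idea in the description can be sketched with toy types. This is only an illustration under stated assumptions: `ToyReader`, `OwnedRowIter`, and `all_rows` are hypothetical names, not the parquet crate's API, and rows are plain strings rather than parquet records.

```rust
/// Hypothetical stand-in for a file reader that owns its rows.
pub struct ToyReader {
    rows: Vec<String>,
}

impl ToyReader {
    pub fn new(rows: Vec<String>) -> Self {
        ToyReader { rows }
    }
}

/// Iterator that owns the reader instead of borrowing it, so it can
/// outlive the scope in which the reader was created.
pub struct OwnedRowIter {
    reader: ToyReader,
    pos: usize,
}

impl Iterator for OwnedRowIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let row = self.reader.rows.get(self.pos).cloned();
        if row.is_some() {
            self.pos += 1;
        }
        row
    }
}

impl IntoIterator for ToyReader {
    type Item = String;
    type IntoIter = OwnedRowIter;

    /// Consumes the reader; the returned iterator keeps it alive.
    fn into_iter(self) -> OwnedRowIter {
        OwnedRowIter { reader: self, pos: 0 }
    }
}

/// Chains the rows of several readers, mirroring the `flat_map` call
/// in the description.
pub fn all_rows(readers: Vec<ToyReader>) -> Vec<String> {
    readers.into_iter().flat_map(|r| r.into_iter()).collect()
}
```

Because each iterator owns its reader, the closure passed to `flat_map` no longer returns a value that references a local variable.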





[jira] [Assigned] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+

2019-05-17 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-5360:
---

Assignee: Neville Dipale

> [Rust] Builds are broken by rustyline on nightly 2019-05-16+
> 
>
> Key: ARROW-5360
> URL: https://issues.apache.org/jira/browse/ARROW-5360
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Rust builds are broken on nightly since 2019-05-16. Please see 
> [https://github.com/kkawakam/rustyline/issues/217]
> The issue might need to be fixed on the rustyline crate.





[jira] [Resolved] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+

2019-05-17 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5360.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4337
[https://github.com/apache/arrow/pull/4337]

> [Rust] Builds are broken by rustyline on nightly 2019-05-16+
> 
>
> Key: ARROW-5360
> URL: https://issues.apache.org/jira/browse/ARROW-5360
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Neville Dipale
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Rust builds are broken on nightly since 2019-05-16. Please see 
> [https://github.com/kkawakam/rustyline/issues/217]
> The issue might need to be fixed on the rustyline crate.





[jira] [Commented] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+

2019-05-17 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842310#comment-16842310
 ] 

Chao Sun commented on ARROW-5360:
-

Can we temporarily use an old nightly version (e.g., nightly-2019-05-15) to 
unblock the CI? We can always switch back to the latest nightly later after the 
issue is fixed.

> [Rust] Builds are broken by rustyline on nightly 2019-05-16+
> 
>
> Key: ARROW-5360
> URL: https://issues.apache.org/jira/browse/ARROW-5360
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Neville Dipale
>Priority: Critical
>
> Rust builds are broken on nightly since 2019-05-16. Please see 
> [https://github.com/kkawakam/rustyline/issues/217]
> The issue might need to be fixed on the rustyline crate.





[jira] [Commented] (ARROW-5357) [Rust] Change Buffer::len to represent total bytes instead of used bytes

2019-05-16 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841825#comment-16841825
 ] 

Chao Sun commented on ARROW-5357:
-

Yes, we can perhaps do that - this is how C++ does it anyway.

Originally the problem was: when I compare a buffer created via 
`Buffer::from(&[1, 2, 3])` against a buffer created from flatbuffer with the 
same data, the comparison fails because on the left-hand side `len` is less 
than `capacity`, but on the right-hand side `len` = `capacity`.

However, thinking about it more, maybe we should not compare buffers directly 
without other context, such as how many valid elements are stored in the 
buffer. Also, unless we always zero out the padded bytes in a buffer, 
comparison on buffers is likely to fail.
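The padding concern can be illustrated with a toy buffer. This is a minimal sketch under stated assumptions: `PaddedBuffer` and its methods are hypothetical and much simpler than the crate's `Buffer`, and the 8-byte rounding is chosen only for the example.

```rust
/// Hypothetical, simplified buffer: `data` holds `capacity` bytes, of which
/// only the first `len` are meaningful. Not the crate's actual `Buffer`.
pub struct PaddedBuffer {
    data: Vec<u8>,
    len: usize,
}

impl PaddedBuffer {
    /// Copies `bytes` into storage rounded up to a multiple of 8 bytes,
    /// filling the padding with `pad` to mimic bytes that were never zeroed.
    pub fn from_bytes(bytes: &[u8], pad: u8) -> Self {
        let capacity = (bytes.len() + 7) / 8 * 8;
        let mut data = vec![pad; capacity];
        data[..bytes.len()].copy_from_slice(bytes);
        PaddedBuffer { data, len: bytes.len() }
    }

    /// Total number of allocated bytes, padding included.
    pub fn capacity(&self) -> usize {
        self.data.len()
    }

    /// Equality over the used bytes only; padding is ignored.
    pub fn used_eq(&self, other: &Self) -> bool {
        self.len == other.len && self.data[..self.len] == other.data[..other.len]
    }

    /// Naive equality over the whole allocation, padding included.
    pub fn raw_eq(&self, other: &Self) -> bool {
        self.data == other.data
    }
}
```

Two buffers holding the same three bytes compare equal under `used_eq` but not under `raw_eq` when their padding differs, which is the failure mode described above.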

> [Rust] Change Buffer::len to represent total bytes instead of used bytes
> 
>
> Key: ARROW-5357
> URL: https://issues.apache.org/jira/browse/ARROW-5357
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> Currently {{Buffer::len}} records the number of used bytes, as opposed to the 
> number of total bytes. This poses a problem when converting from buffers 
> defined in flatbuffer, where the length is actually the number of allocated 
> bytes for the buffer. 





[jira] [Created] (ARROW-5358) [Rust] Implement equality check for ArrayData and Array

2019-05-16 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5358:
---

 Summary: [Rust] Implement equality check for ArrayData and Array
 Key: ARROW-5358
 URL: https://issues.apache.org/jira/browse/ARROW-5358
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Chao Sun


Currently {{Array}} doesn't implement the {{Eq}} trait. Although {{ArrayData}} 
derives from the {{PartialEq}} trait, the default implementation is not 
suitable here. Instead, we should implement customized equality comparison.
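
As a rough illustration of what a customized comparison might look like, the sketch below uses a simplified, hypothetical `SimpleArrayData` (not the crate's actual `ArrayData`). It assumes one plausible reason a derived `PartialEq` is unsuitable: the contents of null slots are unspecified, so two logically equal arrays can differ byte for byte.

```rust
/// Hypothetical, simplified stand-in for an Arrow array's data; the real
/// `ArrayData` holds raw buffers, an offset, and child data.
#[derive(Debug)]
pub struct SimpleArrayData {
    /// Value slots; the content of a null slot is unspecified.
    pub values: Vec<i32>,
    /// `validity[i]` is true when slot `i` holds a non-null value.
    /// Assumed to have the same length as `values`.
    pub validity: Vec<bool>,
}

impl PartialEq for SimpleArrayData {
    fn eq(&self, other: &Self) -> bool {
        // Arrays with different lengths or null positions are never equal.
        if self.values.len() != other.values.len() || self.validity != other.validity {
            return false;
        }
        // Compare only the valid slots; null slots may hold arbitrary values.
        self.values
            .iter()
            .zip(other.values.iter())
            .zip(self.validity.iter())
            .all(|((a, b), valid)| !*valid || a == b)
    }
}
```

With this definition, arrays that differ only in the value stored under a null slot compare equal, which a derived field-by-field comparison would not allow.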





[jira] [Updated] (ARROW-5357) [Rust] Change Buffer::len to represent total bytes instead of used bytes

2019-05-16 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5357:

Summary: [Rust] Change Buffer::len to represent total bytes instead of used 
bytes  (was: [Rust] change Buffer::len to represent total bytes instead of used 
bytes)

> [Rust] Change Buffer::len to represent total bytes instead of used bytes
> 
>
> Key: ARROW-5357
> URL: https://issues.apache.org/jira/browse/ARROW-5357
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> Currently {{Buffer::len}} records the number of used bytes, as opposed to the 
> number of total bytes. This poses a problem when converting from buffers 
> defined in flatbuffer, where the length is actually the number of allocated 
> bytes for the buffer. 





[jira] [Created] (ARROW-5357) [Rust] change Buffer::len to represent total bytes instead of used bytes

2019-05-16 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5357:
---

 Summary: [Rust] change Buffer::len to represent total bytes 
instead of used bytes
 Key: ARROW-5357
 URL: https://issues.apache.org/jira/browse/ARROW-5357
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun


Currently {{Buffer::len}} records the number of used bytes, as opposed to the 
number of total bytes. This poses a problem when converting from buffers 
defined in flatbuffer, where the length is actually the number of allocated 
bytes for the buffer. 





[jira] [Resolved] (ARROW-5284) [Rust] Replace libc with std::alloc for memory allocation

2019-05-14 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5284.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4273
[https://github.com/apache/arrow/pull/4273]

> [Rust] Replace libc with std::alloc for memory allocation
> -
>
> Key: ARROW-5284
> URL: https://issues.apache.org/jira/browse/ARROW-5284
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-4806) [Rust] Support casting temporal arrays in cast kernels

2019-05-14 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4806.
-
Resolution: Fixed

Issue resolved by pull request 4150
[https://github.com/apache/arrow/pull/4150]

> [Rust] Support casting temporal arrays in cast kernels
> --
>
> Key: ARROW-4806
> URL: https://issues.apache.org/jira/browse/ARROW-4806
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [ARROW-3882] is too far in the review process to add temporal casts.





[jira] [Commented] (ARROW-5317) [Rust] [Parquet] impl IntoIterator for SerializedFileReader

2019-05-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839627#comment-16839627
 ] 

Chao Sun commented on ARROW-5317:
-

[~wesmckinn] @andygrove: could you add [~FabioBatSilva] into the contributor 
list so we can assign this Jira to him? Thanks.

> [Rust] [Parquet] impl IntoIterator for SerializedFileReader
> ---
>
> Key: ARROW-5317
> URL: https://issues.apache.org/jira/browse/ARROW-5317
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Fabio Batista da Silva
>Priority: Minor
>
> This is a follow up to [https://github.com/apache/arrow/issues/4301].
> The current implementation of the row iterator *RowIter* borrows the 
> *FileReader*, so the user has to keep the file reader alive for as long as 
> the iterator is alive.
> This makes it hard to iterate over multiple *FileReader* / *RowIter* pairs.
> {code:java}
> fn main() {
>     let path1 = Path::new("path-to/1.snappy.parquet");
>     let path2 = Path::new("path-to/2.snappy.parquet");
>     let vec = vec![path1, path2];
>     let it = vec.iter()
>         .map(|p| {
>             File::open(p).unwrap()
>         })
>         .map(|f| {
>             SerializedFileReader::new(f).unwrap()
>         })
>         .flat_map(|reader| -> RowIter {
>             RowIter::from_file(None, &reader).unwrap()
>             // error: `reader` is borrowed here, and the closure
>             // returns a value referencing data owned by the current function
>         })
>     ;
>     for r in it {
>         println!("{}", r);
>     }
> }
> {code}
> One solution could be to implement a row iterator that takes ownership of the 
> reader, perhaps by implementing *std::iter::IntoIterator* for 
> *SerializedFileReader*:
> {code:java}
> 
> .map(|p| {
>     File::open(p).unwrap()
> })
> .map(|f| {
>     SerializedFileReader::new(f).unwrap()
> })
> .flat_map(|r| r.into_iter())
> 
> {code}
>  
> Happy to put a PR out with this.
> Please let me know if this makes sense and whether you already have some way 
> of doing this.





[jira] [Comment Edited] (ARROW-5317) [Rust] [Parquet] impl IntoIterator for SerializedFileReader

2019-05-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839627#comment-16839627
 ] 

Chao Sun edited comment on ARROW-5317 at 5/14/19 5:19 PM:
--

[~wesmckinn], [~andygrove] could you add [~FabioBatSilva] into the contributor 
list so we can assign this Jira to him? Thanks.


was (Author: csun):
[~wesmckinn] @andygrove: could you add [~FabioBatSilva] into the contributor 
list so we can assign this Jira to him? Thanks.

> [Rust] [Parquet] impl IntoIterator for SerializedFileReader
> ---
>
> Key: ARROW-5317
> URL: https://issues.apache.org/jira/browse/ARROW-5317
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Fabio Batista da Silva
>Priority: Minor
>
> This is a follow up to [https://github.com/apache/arrow/issues/4301].
> The current implementation of the row iterator *RowIter* borrows the 
> *FileReader*, so the user has to keep the file reader alive for as long as 
> the iterator is alive.
> This makes it hard to iterate over multiple *FileReader* / *RowIter* pairs.
> {code:java}
> fn main() {
>     let path1 = Path::new("path-to/1.snappy.parquet");
>     let path2 = Path::new("path-to/2.snappy.parquet");
>     let vec = vec![path1, path2];
>     let it = vec.iter()
>         .map(|p| {
>             File::open(p).unwrap()
>         })
>         .map(|f| {
>             SerializedFileReader::new(f).unwrap()
>         })
>         .flat_map(|reader| -> RowIter {
>             RowIter::from_file(None, &reader).unwrap()
>             // error: `reader` is borrowed here, and the closure
>             // returns a value referencing data owned by the current function
>         })
>     ;
>     for r in it {
>         println!("{}", r);
>     }
> }
> {code}
> One solution could be to implement a row iterator that takes ownership of the 
> reader, perhaps by implementing *std::iter::IntoIterator* for 
> *SerializedFileReader*:
> {code:java}
> 
> .map(|p| {
>     File::open(p).unwrap()
> })
> .map(|f| {
>     SerializedFileReader::new(f).unwrap()
> })
> .flat_map(|r| r.into_iter())
> 
> {code}
>  
> Happy to put a PR out with this.
> Please let me know if this makes sense and whether you already have some way 
> of doing this.





[jira] [Commented] (ARROW-5317) [Rust] [Parquet] impl IntoIterator for SerializedFileReader

2019-05-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839624#comment-16839624
 ] 

Chao Sun commented on ARROW-5317:
-

[~FabioBatSilva] yes this does make sense. Can you put a PR for this? Thanks!

> [Rust] [Parquet] impl IntoIterator for SerializedFileReader
> ---
>
> Key: ARROW-5317
> URL: https://issues.apache.org/jira/browse/ARROW-5317
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Fabio Batista da Silva
>Priority: Minor
>
> This is a follow up to [https://github.com/apache/arrow/issues/4301].
> The current implementation of the row iterator *RowIter* borrows the 
> *FileReader*, so the user has to keep the file reader alive for as long as 
> the iterator is alive.
> This makes it hard to iterate over multiple *FileReader* / *RowIter* pairs.
> {code:java}
> fn main() {
>     let path1 = Path::new("path-to/1.snappy.parquet");
>     let path2 = Path::new("path-to/2.snappy.parquet");
>     let vec = vec![path1, path2];
>     let it = vec.iter()
>         .map(|p| {
>             File::open(p).unwrap()
>         })
>         .map(|f| {
>             SerializedFileReader::new(f).unwrap()
>         })
>         .flat_map(|reader| -> RowIter {
>             RowIter::from_file(None, &reader).unwrap()
>             // error: `reader` is borrowed here, and the closure
>             // returns a value referencing data owned by the current function
>         })
>     ;
>     for r in it {
>         println!("{}", r);
>     }
> }
> {code}
> One solution could be to implement a row iterator that takes ownership of the 
> reader, perhaps by implementing *std::iter::IntoIterator* for 
> *SerializedFileReader*:
> {code:java}
> 
> .map(|p| {
>     File::open(p).unwrap()
> })
> .map(|f| {
>     SerializedFileReader::new(f).unwrap()
> })
> .flat_map(|r| r.into_iter())
> 
> {code}
>  
> Happy to put a PR out with this.
> Please let me know if this makes sense and whether you already have some way 
> of doing this.





[jira] [Resolved] (ARROW-5298) [Rust] Add debug implementation for Buffer

2019-05-09 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5298.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4287
[https://github.com/apache/arrow/pull/4287]

> [Rust] Add debug implementation for Buffer
> --
>
> Key: ARROW-5298
> URL: https://issues.apache.org/jira/browse/ARROW-5298
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Default debug implementation is not good enough for debugging.





[jira] [Resolved] (ARROW-5281) [Rust] [Parquet] Move DataPageBuilder to test_common

2019-05-09 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5281.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4269
[https://github.com/apache/arrow/pull/4269]

> [Rust] [Parquet] Move DataPageBuilder to test_common
> 
>
> Key: ARROW-5281
> URL: https://issues.apache.org/jira/browse/ARROW-5281
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DataPageBuilder is a helpful tool for mocking test page data. It's worth 
> moving it to test_common so that other parts can reuse it.





[jira] [Created] (ARROW-5284) [Rust] Replace libc with std::alloc for memory allocation

2019-05-07 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5284:
---

 Summary: [Rust] Replace libc with std::alloc for memory allocation
 Key: ARROW-5284
 URL: https://issues.apache.org/jira/browse/ARROW-5284
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun








[jira] [Updated] (ARROW-5191) [Rust] [Rust] Expose CSV and JSON reader schemas

2019-04-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5191:

Summary: [Rust] [Rust] Expose CSV and JSON reader schemas  (was: [Rust] 
Expose schema in readers (CSV, JSON) without reading batches)

> [Rust] [Rust] Expose CSV and JSON reader schemas
> 
>
> Key: ARROW-5191
> URL: https://issues.apache.org/jira/browse/ARROW-5191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It's sometimes convenient to be able to view a datasource's schema without 
> reading the first record batch. This is a proposal to create a `pub fn 
> schema() -> Arc<Schema>` on the various readers that we support.
> I think this would also enable schema inference in datafusion
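A minimal sketch of the proposal, under stated assumptions: `ToySchema` and `ToyCsvReader` are hypothetical types, not the arrow crate's CSV reader, and the schema here is just the list of header names. The point is only that the schema is determined at construction time, so callers can inspect it before consuming any data.

```rust
use std::sync::Arc;

/// Hypothetical, simplified schema: just the column names.
#[derive(Debug, PartialEq)]
pub struct ToySchema {
    pub fields: Vec<String>,
}

/// Hypothetical reader that derives its schema from the header line when
/// it is constructed, before any data row is consumed.
pub struct ToyCsvReader {
    schema: Arc<ToySchema>,
    rows: Vec<String>,
}

impl ToyCsvReader {
    pub fn new(input: &str) -> Self {
        let mut lines = input.lines().map(|l| l.to_string());
        // The first line is treated as the header; an empty input yields
        // a single empty field name in this toy version.
        let header = lines.next().unwrap_or_default();
        let fields = header.split(',').map(|f| f.trim().to_string()).collect();
        ToyCsvReader {
            schema: Arc::new(ToySchema { fields }),
            rows: lines.collect(),
        }
    }

    /// Returns a shared handle to the schema without reading any rows.
    pub fn schema(&self) -> Arc<ToySchema> {
        self.schema.clone()
    }

    /// Number of data rows not yet consumed.
    pub fn rows_remaining(&self) -> usize {
        self.rows.len()
    }
}
```

Returning an `Arc` keeps the accessor cheap: each caller gets a reference-counted handle rather than a copy of the schema.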





[jira] [Updated] (ARROW-5191) [Rust] Expose CSV and JSON reader schemas

2019-04-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5191:

Summary: [Rust] Expose CSV and JSON reader schemas  (was: [Rust] [Rust] 
Expose CSV and JSON reader schemas)

> [Rust] Expose CSV and JSON reader schemas
> -
>
> Key: ARROW-5191
> URL: https://issues.apache.org/jira/browse/ARROW-5191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It's sometimes convenient to be able to view a datasource's schema without 
> reading the first record batch. This is a proposal to create a `pub fn 
> schema() -> Arc<Schema>` on the various readers that we support.
> I think this would also enable schema inference in datafusion



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5191) [Rust] Expose schema in readers (CSV, JSON) without reading batches

2019-04-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5191.
-
Resolution: Fixed

Issue resolved by pull request 4181
[https://github.com/apache/arrow/pull/4181]

> [Rust] Expose schema in readers (CSV, JSON) without reading batches
> ---
>
> Key: ARROW-5191
> URL: https://issues.apache.org/jira/browse/ARROW-5191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's sometimes convenient to be able to view a datasource's schema without 
> reading the first record batch. This is a proposal to create a `pub fn 
> schema() -> Arc<Schema>` on the various readers that we support.
> I think this would also enable schema inference in datafusion





[jira] [Resolved] (ARROW-5189) [Rust] [Parquet] Format individual fields within a parquet row

2019-04-19 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5189.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4174
[https://github.com/apache/arrow/pull/4174]

> [Rust] [Parquet] Format individual fields within a parquet row
> --
>
> Key: ARROW-5189
> URL: https://issues.apache.org/jira/browse/ARROW-5189
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Fabio Batista da Silva
>Priority: Minor
>  Labels: Parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The current API for *parquet::record::Row* doesn't provide a way to get a 
> string representation of an individual column within a Row.
> All *Field* s in a row already implement *{{fmt::Display}}*, but there is no 
> way to format individual fields since *Row#fields* is not exposed by the 
> API.
> Formatting individual fields seems like a common problem.
> I believe having this as part of the *parquet::record::Row* API would be 
> helpful to anyone peeking into values within a row.
>  
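
A minimal sketch of the requested capability, assuming a simplified hypothetical `Field` enum and a hypothetical `Row::fmt_field` accessor (neither is the parquet crate's actual API): each field implements `fmt::Display`, and the row exposes a way to format one column at a time.

```rust
use std::fmt;

/// Hypothetical, simplified field value; the real crate has many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Field {
    Null,
    Int(i64),
    Str(String),
}

impl fmt::Display for Field {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Field::Null => write!(f, "null"),
            Field::Int(v) => write!(f, "{}", v),
            Field::Str(s) => write!(f, "\"{}\"", s),
        }
    }
}

/// Hypothetical row: a list of (column name, value) pairs kept private.
pub struct Row {
    fields: Vec<(String, Field)>,
}

impl Row {
    pub fn new(fields: Vec<(String, Field)>) -> Self {
        Row { fields }
    }

    /// Hypothetical accessor: formats column `i`, or returns `None` if the
    /// index is out of range. The private `fields` vector stays hidden.
    pub fn fmt_field(&self, i: usize) -> Option<String> {
        self.fields.get(i).map(|(_, field)| field.to_string())
    }
}
```

Returning `Option<String>` keeps out-of-range access from panicking; an alternative design would expose an iterator over `(&String, &Field)` pairs and let callers format what they need.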



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5189) [Rust] [Parquet] Format individual fields within a parquet row

2019-04-19 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-5189:

Summary: [Rust] [Parquet] Format individual fields within a parquet row  
(was: Format individual fields within a parquet row)

> [Rust] [Parquet] Format individual fields within a parquet row
> --
>
> Key: ARROW-5189
> URL: https://issues.apache.org/jira/browse/ARROW-5189
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Fabio Batista da Silva
>Priority: Minor
>  Labels: Parquet
>
> The current API for a *parquet::record::Row* doesn't provide a way to get a 
> string representation of an individual column within a Row.
> All *Field* s in a row already implement *{{fmt::Display}}*, but there is no 
> way to format individual fields since *Row#fields* is not exposed by the 
> API.
> Formatting individual fields seems like a common problem.
>  I believe having this as part of the *parquet::record::Row* API would be 
> helpful to anyone peeking into values within a row.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5184) [Rust] Broken links and other documentation warnings

2019-04-18 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821404#comment-16821404
 ] 

Chao Sun commented on ARROW-5184:
-

[~andygrove] could you add [~jblon...@gmail.com] to the contributor list so we 
can assign this JIRA to him? Thanks.

> [Rust] Broken links and other documentation warnings
> 
>
> Key: ARROW-5184
> URL: https://issues.apache.org/jira/browse/ARROW-5184
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.12.1
>Reporter: Jamie Blondin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 2h
>  Time Spent: 20m
>  Remaining Estimate: 1h 40m
>
> There are several broken links and other documentation warnings (when running 
> 'cargo doc') in the Rust implementation of Arrow and Parquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5184) [Rust] Broken links and other documentation warnings

2019-04-18 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5184.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4172
[https://github.com/apache/arrow/pull/4172]

> [Rust] Broken links and other documentation warnings
> 
>
> Key: ARROW-5184
> URL: https://issues.apache.org/jira/browse/ARROW-5184
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.12.1
>Reporter: Jamie Blondin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 2h
>  Time Spent: 10m
>  Remaining Estimate: 1h 50m
>
> There are several broken links and other documentation warnings (when running 
> 'cargo doc') in the Rust implementation of Arrow and Parquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5129) [Rust] Column writer bug: check dictionary encoder when adding a new data page

2019-04-14 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5129.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4152
[https://github.com/apache/arrow/pull/4152]

> [Rust] Column writer bug: check dictionary encoder when adding a new data page
> --
>
> Key: ARROW-5129
> URL: https://issues.apache.org/jira/browse/ARROW-5129
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: N/A
>Reporter: Ivan Sadikov
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As part of my weekly routine, I glanced over code in Parquet column writer 
> and found that the way we check when to add a new data page is buggy. The 
> idea is to check the current encoder and decide whether we have written enough 
> bytes to construct a page. The problem is that we only check the value 
> encoder, regardless of whether the dictionary encoder is enabled.
> Here is how we do it now: actual check 
> (https://github.com/apache/arrow/blob/master/rust/parquet/src/column/writer.rs#L378)
>  and the buggy function 
> (https://github.com/apache/arrow/blob/master/rust/parquet/src/column/writer.rs#L423).
>  
> In the case of a sparse column and a dictionary encoder, we would write a single 
> data page, even though we would have accumulated enough bytes for more than 
> one page in the encoder (the value encoder will be empty, so its size 
> will always be less than the constant limit).
> I forgot that parquet-cpp has `current_encoder` as either value encoder or 
> dictionary encoder 
> (https://github.com/apache/parquet-cpp/blob/master/src/parquet/column_writer.cc#L544),
>  but in parquet-rs we have them separate.
> So the fix could be something like this:
> {code}
> /// Returns true if there is enough data for a data page, false otherwise.
> #[inline]
> fn should_add_data_page(&self) -> bool {
>   match self.dict_encoder {
>     Some(ref encoder) => {
>       encoder.estimated_data_encoded_size() >= self.props.data_pagesize_limit()
>     },
>     None => {
>       self.encoder.estimated_data_encoded_size() >= self.props.data_pagesize_limit()
>     }
>   }
> }
> {code}
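
To make the difference concrete, here is a self-contained sketch that assumes a hypothetical `Encoder` which only tracks its buffered byte count and a hypothetical `ColumnWriter` holding both encoders (simplified stand-ins, not the parquet crate's types). It contrasts the value-encoder-only check with one that consults whichever encoder is active.

```rust
/// Hypothetical encoder that only tracks how many bytes it has buffered.
pub struct Encoder {
    pub buffered: usize,
}

impl Encoder {
    pub fn estimated_data_encoded_size(&self) -> usize {
        self.buffered
    }
}

/// Hypothetical column writer with a value encoder and an optional
/// dictionary encoder, as described in the issue.
pub struct ColumnWriter {
    pub encoder: Encoder,
    pub dict_encoder: Option<Encoder>,
    pub data_pagesize_limit: usize,
}

impl ColumnWriter {
    /// The check as described in the bug report: only the value encoder is
    /// consulted, so it never fires while dictionary encoding is active.
    pub fn should_add_data_page_value_only(&self) -> bool {
        self.encoder.estimated_data_encoded_size() >= self.data_pagesize_limit
    }

    /// The proposed behaviour: consult the dictionary encoder when present,
    /// otherwise fall back to the value encoder.
    pub fn should_add_data_page(&self) -> bool {
        let active = self.dict_encoder.as_ref().unwrap_or(&self.encoder);
        active.estimated_data_encoded_size() >= self.data_pagesize_limit
    }
}
```

With 2048 bytes buffered in the dictionary encoder, an empty value encoder, and a 1024-byte limit, the first check reports `false` while the second reports `true`.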



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5129) [Rust] Column writer bug: check dictionary encoder when adding a new data page

2019-04-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817449#comment-16817449
 ] 

Chao Sun commented on ARROW-5129:
-

[~andygrove]: could you add [~sadikovi] as contributor and assign this JIRA to 
him?

> [Rust] Column writer bug: check dictionary encoder when adding a new data page
> --
>
> Key: ARROW-5129
> URL: https://issues.apache.org/jira/browse/ARROW-5129
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: N/A
>Reporter: Ivan Sadikov
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As part of my weekly routine, I glanced over code in Parquet column writer 
> and found that the way we check when to add a new data page is buggy. The 
> idea is to check the current encoder and decide whether we have written enough 
> bytes to construct a page. The problem is that we only check the value 
> encoder, regardless of whether the dictionary encoder is enabled.
> Here is how we do it now: actual check 
> (https://github.com/apache/arrow/blob/master/rust/parquet/src/column/writer.rs#L378)
>  and the buggy function 
> (https://github.com/apache/arrow/blob/master/rust/parquet/src/column/writer.rs#L423).
>  
> In the case of a sparse column and a dictionary encoder, we would write a single 
> data page, even though we would have accumulated enough bytes for more than 
> one page in the encoder (the value encoder will be empty, so its size 
> will always be less than the constant limit).
> I forgot that parquet-cpp has `current_encoder` as either value encoder or 
> dictionary encoder 
> (https://github.com/apache/parquet-cpp/blob/master/src/parquet/column_writer.cc#L544),
>  but in parquet-rs we have them separate.
> So the fix could be something like this:
> {code}
> /// Returns true if there is enough data for a data page, false otherwise.
> #[inline]
> fn should_add_data_page(&self) -> bool {
>   match self.dict_encoder {
>     Some(ref encoder) => {
>       encoder.estimated_data_encoded_size() >= self.props.data_pagesize_limit()
>     },
>     None => {
>       self.encoder.estimated_data_encoded_size() >= self.props.data_pagesize_limit()
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5162) [Rust] [Parquet] Rename mod reader to arrow.

2019-04-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5162.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4145
[https://github.com/apache/arrow/pull/4145]

> [Rust] [Parquet] Rename mod reader to arrow.
> 
>
> Key: ARROW-5162
> URL: https://issues.apache.org/jira/browse/ARROW-5162
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Rename mod to arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5127) [Rust] [Parquet] Add page iterator

2019-04-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5127.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4136
[https://github.com/apache/arrow/pull/4136]

> [Rust] [Parquet] Add page iterator
> --
>
> Key: ARROW-5127
> URL: https://issues.apache.org/jira/browse/ARROW-5127
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Adds a page iterator for column reader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5159) Unable to build benches in arrow crate.

2019-04-10 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5159.
-
Resolution: Fixed

Issue resolved by pull request 4138
[https://github.com/apache/arrow/pull/4138]

> Unable to build benches in arrow crate.
> ---
>
> Key: ARROW-5159
> URL: https://issues.apache.org/jira/browse/ARROW-5159
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.14.0
>Reporter: Zhiyuan Zheng
>Assignee: Zhiyuan Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After the refactoring of kernel-related files in ARROW-5116, the files in the 
> `benches` folder won't compile.
> For example:
> error[E0432]: unresolved import `arrow::compute::boolean_kernels`
>  --> arrow/benches/boolean_kernels.rs:26:5
>  |
> 26 | use arrow::compute::boolean_kernels;
>  | ^^^ no `boolean_kernels` in `compute`
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5126) [Rust] [Parquet] Convert parquet column desc to arrow data type

2019-04-07 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5126.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4117
[https://github.com/apache/arrow/pull/4117]

> [Rust] [Parquet] Convert parquet column desc to arrow data type
> ---
>
> Key: ARROW-5126
> URL: https://issues.apache.org/jira/browse/ARROW-5126
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5116) [Rust] move kernel related files under compute/kernels

2019-04-04 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5116:
---

 Summary: [Rust] move kernel related files under compute/kernels
 Key: ARROW-5116
 URL: https://issues.apache.org/jira/browse/ARROW-5116
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4853) [Rust] Array slice doesn't work on ListArray and StructArray

2019-03-20 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4853.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3972
[https://github.com/apache/arrow/pull/3972]

> [Rust] Array slice doesn't work on ListArray and StructArray
> 
>
> Key: ARROW-4853
> URL: https://issues.apache.org/jira/browse/ARROW-4853
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> -ARROW-3954- added the ability to slice arrays. It's been implemented on the 
> Array trait, so callers might expect it to also work on ListArray and 
> StructArray.
> It looks like for ListArray, the offset buffer is sliced, but the child_data 
> buffer is not modified. This leads to an assertion failure.
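
One way to keep the offsets and the child data consistent is sketched below, assuming a hypothetical borrowed view type `ListSlice` over `i32` values (not the arrow crate's `ListArray`): slicing only adjusts a logical offset and length, and every element lookup resolves through the original offsets into the untouched child values.

```rust
/// Hypothetical view over a list array: `offsets` has one more entry than
/// there are lists, and list `i` spans `values[offsets[i]..offsets[i + 1]]`.
pub struct ListSlice<'a> {
    offsets: &'a [usize],
    values: &'a [i32],
    offset: usize,
    len: usize,
}

impl<'a> ListSlice<'a> {
    pub fn new(offsets: &'a [usize], values: &'a [i32]) -> Self {
        assert!(!offsets.is_empty(), "offsets needs at least one entry");
        ListSlice { offsets, values, offset: 0, len: offsets.len() - 1 }
    }

    /// Zero-copy slice: both buffers are shared, only the logical window moves.
    pub fn slice(&self, offset: usize, len: usize) -> ListSlice<'a> {
        assert!(offset + len <= self.len, "slice out of bounds");
        ListSlice {
            offsets: self.offsets,
            values: self.values,
            offset: self.offset + offset,
            len,
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Returns list `i` of the slice, resolving through the logical offset.
    pub fn value(&self, i: usize) -> &'a [i32] {
        assert!(i < self.len, "index out of bounds");
        let start = self.offsets[self.offset + i];
        let end = self.offsets[self.offset + i + 1];
        &self.values[start..end]
    }
}
```

For offsets `[0, 2, 2, 5]` over values `[1, 2, 3, 4, 5]`, `slice(1, 2)` yields an empty list followed by `[3, 4, 5]`.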



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4914) [Rust] Array slice returns incorrect bitmask

2019-03-19 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795770#comment-16795770
 ] 

Chao Sun commented on ARROW-4914:
-

Thanks [~nevi_me]! I think the failure for this JIRA is because {{is_null}} and 
{{is_valid}} in {{Array}} do not consider the offset. The second failure is because 
the invariant check in {{From for ListArray}} is wrong. Both 
should be easy fixes. Will post a PR soon.

> [Rust] Array slice returns incorrect bitmask
> 
>
> Key: ARROW-4914
> URL: https://issues.apache.org/jira/browse/ARROW-4914
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Neville Dipale
>Priority: Blocker
>
> Slicing arrays changes the offset, length and null count of their array data, 
> but the bitmask is not changed.
> This results in the correct null count, but the array values might be marked 
> incorrectly as valid/invalid based on the old bitmask positions before the 
> offset.
> To reproduce, create an array with some null values, slice the array, and 
> then dbg!() it (after downcasting).
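
The effect can be sketched with a hypothetical `ValidityView` type (a simplified stand-in, not the arrow crate's `ArrayData`), assuming the least-significant-bit-first bitmap layout that Arrow uses: a slice shares the original bitmap, so validity lookups have to add the slice offset before indexing into it.

```rust
/// Hypothetical view over a validity bitmap: bit `i` set means value `i` is valid.
/// Bits are numbered least-significant-bit first within each byte.
pub struct ValidityView<'a> {
    bits: &'a [u8],
    offset: usize,
    len: usize,
}

impl<'a> ValidityView<'a> {
    pub fn new(bits: &'a [u8], len: usize) -> Self {
        assert!(len <= bits.len() * 8, "bitmap too short");
        ValidityView { bits, offset: 0, len }
    }

    /// Zero-copy slice: the bitmap is shared, only offset and length change.
    pub fn slice(&self, offset: usize, len: usize) -> ValidityView<'a> {
        assert!(offset + len <= self.len, "slice out of bounds");
        ValidityView { bits: self.bits, offset: self.offset + offset, len }
    }

    /// Looks up bit `offset + i`; ignoring `offset` here would reproduce
    /// the behaviour described in the issue.
    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        let pos = self.offset + i;
        ((self.bits[pos / 8] >> (pos % 8)) & 1) == 1
    }

    pub fn is_null(&self, i: usize) -> bool {
        !self.is_valid(i)
    }

    /// Counts nulls inside the logical window only.
    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| self.is_null(i)).count()
    }
}
```

With the bitmap byte `0b0000_0101` (values 0 and 2 valid) and a slice starting at 1, index 0 of the slice must report null; reading bit 0 of the bitmap directly would wrongly report it as valid.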



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4914) [Rust] Array slice returns incorrect bitmask

2019-03-18 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795667#comment-16795667
 ] 

Chao Sun commented on ARROW-4914:
-

[~nevi_me] looking at this - could you provide a simple test case to reproduce 
the issue? Thanks.

> [Rust] Array slice returns incorrect bitmask
> 
>
> Key: ARROW-4914
> URL: https://issues.apache.org/jira/browse/ARROW-4914
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Neville Dipale
>Priority: Blocker
>
> Slicing arrays changes the offset, length and null count of their array data, 
> but the bitmask is not changed.
> This results in the correct null count, but the array values might be marked 
> incorrectly as valid/invalid based on the old bitmask positions before the 
> offset.
> To reproduce, create an array with some null values, slice the array, and 
> then dbg!() it (after downcasting).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4896) [Rust] [DataFusion] Remove all uses of panic! from tests

2019-03-15 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4896.
-
Resolution: Fixed

> [Rust] [DataFusion] Remove all uses of panic! from tests
> 
>
> Key: ARROW-4896
> URL: https://issues.apache.org/jira/browse/ARROW-4896
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Tests should use assert!(false) rather than panic!()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4705) [Rust] CSV reader should show line number and error message when failing to parse a line

2019-03-14 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4705.
-
Resolution: Fixed

Issue resolved by pull request 3895
[https://github.com/apache/arrow/pull/3895]

> [Rust] CSV reader should show line number and error message when failing to 
> parse a line
> 
>
> Key: ARROW-4705
> URL: https://issues.apache.org/jira/browse/ARROW-4705
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We currently throw away the original error and do not report line number, 
> making it very difficult to debug.
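
A minimal sketch of the improvement, assuming a hypothetical single-column parser `parse_column` and error type `ParseError` (illustrative only, not the arrow CSV reader's API): the original parse error is kept and wrapped together with a 1-based line number.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type that keeps both the line number and the cause.
#[derive(Debug)]
pub struct ParseError {
    pub line: usize,
    pub source: ParseIntError,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "error parsing line {}: {}", self.line, self.source)
    }
}

/// Parses one integer per line, stopping at the first bad line.
pub fn parse_column(input: &str) -> Result<Vec<i64>, ParseError> {
    input
        .lines()
        .enumerate()
        .map(|(idx, text)| {
            text.trim()
                .parse::<i64>()
                // Keep the original error instead of discarding it.
                .map_err(|source| ParseError { line: idx + 1, source })
        })
        .collect()
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` short-circuits on the first error, so the reported line is the first one that failed.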



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4304) [Rust] Enhance documentation for arrow

2019-03-14 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-4304:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Rust] Enhance documentation for arrow
> --
>
> Key: ARROW-4304
> URL: https://issues.apache.org/jira/browse/ARROW-4304
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.14.0
>
>
> The documentation for arrow crate (https://docs.rs/arrow/0.12.0/arrow/) is 
> not complete. We should add more content to it to help people who want to use 
> the crate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4304) [Rust] Enhance documentation for arrow

2019-03-14 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792814#comment-16792814
 ] 

Chao Sun commented on ARROW-4304:
-

I think we can push this to 0.14 since some of the work has been done in some 
other JIRAs such as ARROW-4245. 

> [Rust] Enhance documentation for arrow
> --
>
> Key: ARROW-4304
> URL: https://issues.apache.org/jira/browse/ARROW-4304
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 0.13.0
>
>
> The documentation for arrow crate (https://docs.rs/arrow/0.12.0/arrow/) is 
> not complete. We should add more content to it to help people who want to use 
> the crate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4853) [Rust] Array slice doesn't work on ListArray and StructArray

2019-03-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-4853:
---

Assignee: Chao Sun

> [Rust] Array slice doesn't work on ListArray and StructArray
> 
>
> Key: ARROW-4853
> URL: https://issues.apache.org/jira/browse/ARROW-4853
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Chao Sun
>Priority: Major
>
> -ARROW-3954- added the ability to slice arrays. It's been implemented on the 
> Array trait, so callers might expect it to also work on ListArray and 
> StructArray.
> It looks like for ListArray, the offset buffer is sliced, but the child_data 
> buffer is not modified. This leads to an assertion failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4853) [Rust] Array slice doesn't work on ListArray and StructArray

2019-03-13 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791763#comment-16791763
 ] 

Chao Sun commented on ARROW-4853:
-

Good catch [~nevi_me]. I'll take a look.

> [Rust] Array slice doesn't work on ListArray and StructArray
> 
>
> Key: ARROW-4853
> URL: https://issues.apache.org/jira/browse/ARROW-4853
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Chao Sun
>Priority: Major
>
> -ARROW-3954- added the ability to slice arrays. It's been implemented on the 
> Array trait, so callers might expect it to also work on ListArray and 
> StructArray.
> It looks like for ListArray, the offset buffer is sliced, but the child_data 
> buffer is not modified. This leads to an assertion failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4807) [Rust] Fix csv_writer benchmark

2019-03-07 Thread Chao Sun (JIRA)
Chao Sun created ARROW-4807:
---

 Summary: [Rust] Fix csv_writer benchmark
 Key: ARROW-4807
 URL: https://issues.apache.org/jira/browse/ARROW-4807
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun
 Fix For: 0.13.0


The CSV writer benchmark suite isn't working because `RecordBatch::try_new` now 
returns a `Result`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4386) [Rust] Implement Date and Time Arrays

2019-03-07 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4386.
-
Resolution: Fixed

Issue resolved by pull request 3726
[https://github.com/apache/arrow/pull/3726]

> [Rust] Implement Date and Time Arrays
> -
>
> Key: ARROW-4386
> URL: https://issues.apache.org/jira/browse/ARROW-4386
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> We have added date/time types, but have not yet created arrays for these 
> types. See discussion: 
> https://github.com/apache/arrow/pull/3340#issuecomment-452226570



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4749) [Rust] RecordBatch::new() should return result instead of panicking

2019-03-05 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4749.
-
Resolution: Fixed

Issue resolved by pull request 3800
[https://github.com/apache/arrow/pull/3800]

> [Rust] RecordBatch::new() should return result instead of panicking
> ---
>
> Key: ARROW-4749
> URL: https://issues.apache.org/jira/browse/ARROW-4749
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> RecordBatch::new() has some good validation checks, but calls assert_eq 
> instead of returning a Result
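
The change can be sketched as follows, assuming a hypothetical `Batch` type with `i64` columns and a hypothetical `BatchError` enum (simplified stand-ins, not the arrow crate's `RecordBatch`): the same validation runs, but a failed check becomes an `Err` the caller can handle instead of a panic.

```rust
/// Hypothetical validation errors for batch construction.
#[derive(Debug, PartialEq)]
pub enum BatchError {
    ColumnCountMismatch { expected: usize, actual: usize },
    LengthMismatch { column: usize, expected: usize, actual: usize },
}

/// Hypothetical record batch: one name per column, all columns equally long.
#[derive(Debug)]
pub struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

impl Batch {
    /// Validates the inputs and returns an error rather than asserting.
    pub fn try_new(names: Vec<String>, columns: Vec<Vec<i64>>) -> Result<Self, BatchError> {
        if names.len() != columns.len() {
            return Err(BatchError::ColumnCountMismatch {
                expected: names.len(),
                actual: columns.len(),
            });
        }
        // Every column must match the length of the first one.
        let expected = columns.first().map_or(0, |c| c.len());
        for (column, values) in columns.iter().enumerate() {
            if values.len() != expected {
                return Err(BatchError::LengthMismatch {
                    column,
                    expected,
                    actual: values.len(),
                });
            }
        }
        Ok(Batch { names, columns })
    }

    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    pub fn num_columns(&self) -> usize {
        self.names.len()
    }
}
```

Callers that previously relied on the panic now have to handle the `Result`, for example with `?` or `unwrap()` in tests and benchmarks.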



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4769) [Rust] Improve array limit function where max records > len

2019-03-05 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-4769.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3811
[https://github.com/apache/arrow/pull/3811]

> [Rust] Improve array limit function where max records > len
> ---
>
> Key: ARROW-4769
> URL: https://issues.apache.org/jira/browse/ARROW-4769
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When we have an array of n records, and we want to take a limit that's higher 
> than or equal to n, we still iterate through the array values and create a new 
> array.
> We could improve this by returning a copy of the array as-is.
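
A minimal sketch of the shortcut, assuming arrays are modelled as `Arc<Vec<i64>>` and a hypothetical free function `limit` (illustrative, not the arrow crate's kernel): when the limit covers the whole array, the function returns another handle to the same data instead of rebuilding it.

```rust
use std::sync::Arc;

/// Returns at most `num` leading values of `array`.
/// If `num` is at least the array length, the existing allocation is shared.
pub fn limit(array: &Arc<Vec<i64>>, num: usize) -> Arc<Vec<i64>> {
    if num >= array.len() {
        // Nothing to cut: cloning the Arc only bumps a reference count.
        return Arc::clone(array);
    }
    // Otherwise copy just the requested prefix into a new array.
    Arc::new(array[..num].to_vec())
}
```

The fast path costs O(1) regardless of array size, while the slow path copies only `num` values.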



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4678) [Rust] Minimize unstable feature usage

2019-02-28 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780816#comment-16780816
 ] 

Chao Sun commented on ARROW-4678:
-

Seems I can't assign that to you either - your name doesn't show up in the 
available assignees. 

[~wesmckinn], [~xhochy]: do you know how to assign this JIRA to [~sfackler]? Do 
we need to add him as a contributor first? Thanks.

> [Rust] Minimize unstable feature usage
> --
>
> Key: ARROW-4678
> URL: https://issues.apache.org/jira/browse/ARROW-4678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Steven Fackler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Rust implementation currently uses quite a few nightly features. This is 
> unfortunately a hard blocker on using these crates for many users.
> Here's the list of currently used nightly features:
>  * type_ascription: Unused, can be trivially removed.
>  * rustc_private: Unused, can be trivially removed.
>  * box_syntax: Indefinitely far from stabilization, trivially replaceable 
> with Box::new.
>  * box_patterns: Indefinitely far from stabilization, replaceable with some 
> minor restructuring of a couple of matches.
>  * serde's alloc feature: Unused, can be trivially removed.
>  * try_from: Scheduled for stabilization in Rust 1.35.
>  * specialization: Actively being worked on - maybe ~1 year timeframe?
>  * packed_simd: Actively being worked on - maybe ~1 year timeframe?
> The first set of features are easy enough to get rid of - I'll make a PR to 
> do that (https://github.com/sfackler/arrow/tree/more-stable). I'm a bit less 
> sure of what to do with specialization and packed_simd, though.
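
The two `box_*` replacements mentioned in the list can be sketched on stable Rust as follows, using a hypothetical `Expr` tree (illustrative only, not code from the arrow crates): `Box::new` stands in for `box` expressions, and a nested `match` on the dereferenced box stands in for `box` patterns.

```rust
/// Hypothetical expression tree used only for illustration.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Lit(i64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

/// With `box_syntax` this would read `Expr::Neg(box e)`.
pub fn neg(e: Expr) -> Expr {
    Expr::Neg(Box::new(e))
}

/// Removes double negations.
/// With `box_patterns` the first arm could be written as
/// `Expr::Neg(box Expr::Neg(inner)) => ...`; on stable we match on `*outer`.
pub fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Neg(outer) => match *outer {
            Expr::Neg(inner) => simplify(*inner),
            other => Expr::Neg(Box::new(simplify(other))),
        },
        Expr::Add(l, r) => Expr::Add(Box::new(simplify(*l)), Box::new(simplify(*r))),
        lit => lit,
    }
}
```

Moving out of a `Box` with `*outer` is allowed on stable, which is what makes the nested `match` a drop-in restructuring.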



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

