[jira] [Created] (ARROW-11760) [Rust]: Conditionally compile leak tracking & lower atomic consistency guarantees

2021-02-24 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-11760:


 Summary: [Rust]: Conditionally compile leak tracking & lower 
atomic consistency guarantees
 Key: ARROW-11760
 URL: https://issues.apache.org/jira/browse/ARROW-11760
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Conditionally compile object tracking in alloc.rs and lower the atomic 
consistency guarantees.
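
A minimal sketch of the idea, not the actual alloc.rs code: the counter name 
and the three functions below are hypothetical, and `debug_assertions` stands 
in for whatever condition (for example a Cargo feature) would gate tracking. 
It assumes the counter is only a statistic, so `Ordering::Relaxed` is enough.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Hypothetical global counter of bytes currently allocated.
static LIVE_BYTES: AtomicIsize = AtomicIsize::new(0);

/// Record an allocation. Only compiled with real work in debug builds.
#[cfg(debug_assertions)]
pub fn track_alloc(size: usize) {
    // Relaxed: the counter does not synchronize any other memory.
    LIVE_BYTES.fetch_add(size as isize, Ordering::Relaxed);
}

/// Release builds get an empty function, so tracking costs nothing.
#[cfg(not(debug_assertions))]
pub fn track_alloc(_size: usize) {}

/// Record a deallocation (debug builds only).
#[cfg(debug_assertions)]
pub fn track_dealloc(size: usize) {
    LIVE_BYTES.fetch_sub(size as isize, Ordering::Relaxed);
}

#[cfg(not(debug_assertions))]
pub fn track_dealloc(_size: usize) {}

/// Bytes currently tracked as live; always 0 when tracking is compiled out.
pub fn live_bytes() -> isize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```

Callers use the same functions in every build; only the bodies differ.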

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11316) [Rust]: BitMap is_set should return Result rather than relying on inlined assertion

2021-01-19 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-11316:


 Summary: [Rust]: BitMap is_set should return Result rather than 
relying on inlined assertion
 Key: ARROW-11316
 URL: https://issues.apache.org/jira/browse/ARROW-11316
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Mahmut Bulut


The inlined assertion fails and panics when a caller of the method passes 
anything outside the 0..7 range. As a result, incorrect usage crashes the 
application that uses Arrow.
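
A sketch of what a fallible variant could look like. This is not the Arrow 
Bitmap API: the free function `is_set`, the `BitIndexError` type, and the 
slice-based signature are assumptions made for illustration.

```rust
/// Hypothetical error type for an out-of-range bit position.
#[derive(Debug, PartialEq)]
pub struct BitIndexError {
    pub index: usize,
    pub len: usize,
}

/// Returns whether bit `i` is set in `bytes` (least-significant bit first).
/// An out-of-range index yields an error instead of a panic.
pub fn is_set(bytes: &[u8], i: usize) -> Result<bool, BitIndexError> {
    let len = bytes.len() * 8;
    if i >= len {
        return Err(BitIndexError { index: i, len });
    }
    Ok((bytes[i / 8] & (1u8 << (i % 8))) != 0)
}
```

The caller decides how to handle a bad index, for example by propagating the 
error with `?`.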





[jira] [Created] (ARROW-11141) [Rust]: Miri checks

2021-01-06 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-11141:


 Summary: [Rust]: Miri checks
 Key: ARROW-11141
 URL: https://issues.apache.org/jira/browse/ARROW-11141
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Miri checks need to be enabled to detect out-of-bounds reads and invalid 
memory accesses. Currently, there is no way of detecting these, so all issues 
related to invalid memory access surface in code that depends on Arrow.





[jira] [Created] (ARROW-10664) [Rust] Implement AVX-512 sort operation

2020-11-20 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10664:


 Summary: [Rust] Implement AVX-512 sort operation
 Key: ARROW-10664
 URL: https://issues.apache.org/jira/browse/ARROW-10664
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut








[jira] [Created] (ARROW-10660) [Rust] Implement AVX-512 bit or operation

2020-11-19 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10660:


 Summary: [Rust] Implement AVX-512 bit or operation
 Key: ARROW-10660
 URL: https://issues.apache.org/jira/browse/ARROW-10660
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut








[jira] [Updated] (ARROW-10653) [Rust]: Update toolchain version to bring new features

2020-11-19 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10653:
-
Component/s: Rust

> [Rust]: Update toolchain version to bring new features
> --
>
> Key: ARROW-10653
> URL: https://issues.apache.org/jira/browse/ARROW-10653
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> I have contributed new intrinsics to the Rust language core, so I want to 
> bring these in incrementally.





[jira] [Updated] (ARROW-10653) [Rust]: Update toolchain version to bring new features

2020-11-19 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10653:
-
Issue Type: New Feature  (was: Bug)

> [Rust]: Update toolchain version to bring new features
> --
>
> Key: ARROW-10653
> URL: https://issues.apache.org/jira/browse/ARROW-10653
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> I have contributed new intrinsics to the Rust language core, so I want to 
> bring these in incrementally.





[jira] [Updated] (ARROW-10612) [Rust]: Tracking issue for AVX-512

2020-11-16 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10612:
-
Labels: AVX-512 SIMD  (was: )

> [Rust]: Tracking issue for AVX-512
> --
>
> Key: ARROW-10612
> URL: https://issues.apache.org/jira/browse/ARROW-10612
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>  Labels: AVX-512, SIMD
>
> This issue will track AVX-512 feature development in its entirety.





[jira] [Updated] (ARROW-10612) [Rust]: Tracking issue for AVX-512

2020-11-16 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10612:
-
Issue Type: New Feature  (was: Improvement)

> [Rust]: Tracking issue for AVX-512
> --
>
> Key: ARROW-10612
> URL: https://issues.apache.org/jira/browse/ARROW-10612
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> This issue will track AVX-512 feature development in its entirety.





[jira] [Created] (ARROW-10615) [Rust]: Adapt existing benchmarks to have proper execution over AVX-512

2020-11-16 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10615:


 Summary: [Rust]: Adapt existing benchmarks to have proper 
execution over AVX-512
 Key: ARROW-10615
 URL: https://issues.apache.org/jira/browse/ARROW-10615
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut


Some benchmarks use the same data, which is easily predictable during 
execution. Moreover, some of them use too little data to exercise the power 
of the implemented operations.





[jira] [Created] (ARROW-10614) [Rust]: Write documentation for AVX-512 in the Arrow Readme

2020-11-16 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10614:


 Summary: [Rust]: Write documentation for AVX-512 in the Arrow 
Readme
 Key: ARROW-10614
 URL: https://issues.apache.org/jira/browse/ARROW-10614
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut


Write documentation about how the SIMD-related features work, including the 
`avx512` feature that will be introduced under this tracking issue.





[jira] [Created] (ARROW-10613) [Rust]: Enable CI for AVX-512

2020-11-16 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10613:


 Summary: [Rust]: Enable CI for AVX-512
 Key: ARROW-10613
 URL: https://issues.apache.org/jira/browse/ARROW-10613
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut


Enable CI for AVX-512. As of today, CI will work on a nightly compiler dated 
later than 14.11.2020.





[jira] [Created] (ARROW-10612) [Rust]: Tracking issue for AVX-512

2020-11-16 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10612:


 Summary: [Rust]: Tracking issue for AVX-512
 Key: ARROW-10612
 URL: https://issues.apache.org/jira/browse/ARROW-10612
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


This issue will track AVX-512 feature development in its entirety.





[jira] [Created] (ARROW-10589) [Rust]: Implement AVX-512 bit and operation

2020-11-14 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10589:


 Summary: [Rust]: Implement AVX-512 bit and operation
 Key: ARROW-10589
 URL: https://issues.apache.org/jira/browse/ARROW-10589
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Implement the bit and operation on AVX-512.





[jira] [Created] (ARROW-10588) [Rust]: Safe bit operations for Arrow

2020-11-14 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10588:


 Summary: [Rust]: Safe bit operations for Arrow
 Key: ARROW-10588
 URL: https://issues.apache.org/jira/browse/ARROW-10588
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Implement bit operations through a safe interface with checks instead of 
using unsafe operations.

Expose a better API to users. Extends ARROW-10535.





[jira] [Updated] (ARROW-10572) [Rust][DataFusion] Use aHash and std::collections hashmap for aggregates / distinct

2020-11-14 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10572:
-
Summary: [Rust][DataFusion] Use aHash and std::collections hashmap for 
aggregates / distinct  (was: [Rust] Use aHash and std::collections hashmap for 
aggregates / distinct)

> [Rust][DataFusion] Use aHash and std::collections hashmap for aggregates / 
> distinct
> ---
>
> Key: ARROW-10572
> URL: https://issues.apache.org/jira/browse/ARROW-10572
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Ahash is a faster hash algorithm than FNV.





[jira] [Updated] (ARROW-10572) [Rust]: Use aHash and std::collections hashmap for aggregates / distinct

2020-11-14 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10572:
-
Summary: [Rust]: Use aHash and std::collections hashmap for aggregates / 
distinct  (was: Use aHash and std::collections hashmap for aggregates / 
distinct)

> [Rust]: Use aHash and std::collections hashmap for aggregates / distinct
> 
>
> Key: ARROW-10572
> URL: https://issues.apache.org/jira/browse/ARROW-10572
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Ahash is a faster hash algorithm than FNV.





[jira] [Updated] (ARROW-10572) [Rust] Use aHash and std::collections hashmap for aggregates / distinct

2020-11-14 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10572:
-
Summary: [Rust] Use aHash and std::collections hashmap for aggregates / 
distinct  (was: [Rust]: Use aHash and std::collections hashmap for aggregates / 
distinct)

> [Rust] Use aHash and std::collections hashmap for aggregates / distinct
> ---
>
> Key: ARROW-10572
> URL: https://issues.apache.org/jira/browse/ARROW-10572
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Ahash is a faster hash algorithm than FNV.





[jira] [Created] (ARROW-10551) [Rust]: Fix unreproducible benchmarks

2020-11-10 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10551:


 Summary: [Rust]: Fix unreproducible benchmarks
 Key: ARROW-10551
 URL: https://issues.apache.org/jira/browse/ARROW-10551
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Some benchmarks in the Arrow implementation are not reproducible. Fix them.





[jira] [Created] (ARROW-10538) [Rust]: Read/write data in respect to endianness

2020-11-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10538:


 Summary: [Rust]: Read/write data in respect to endianness
 Key: ARROW-10538
 URL: https://issues.apache.org/jira/browse/ARROW-10538
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut


Handle endianness when parsing data on the machine, with respect to the 
endianness specified in:

https://github.com/apache/arrow/blob/master/format/Schema.fbs#L368-L371
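
A minimal sketch of endianness-aware decoding, assuming the declared byte 
order is available as a two-variant enum. The `Endianness` enum and 
`read_i32s` function are illustrative names, not the generated Arrow types.

```rust
/// Illustrative stand-in for the byte order a schema declares.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Endianness {
    Little,
    Big,
}

/// Decode a buffer of 4-byte integers written in `order` into native `i32`s.
/// Trailing bytes that do not fill a whole integer are ignored.
pub fn read_i32s(buf: &[u8], order: Endianness) -> Vec<i32> {
    buf.chunks_exact(4)
        .map(|c| {
            let bytes = [c[0], c[1], c[2], c[3]];
            match order {
                Endianness::Little => i32::from_le_bytes(bytes),
                Endianness::Big => i32::from_be_bytes(bytes),
            }
        })
        .collect()
}
```

Because `from_le_bytes` and `from_be_bytes` convert to the native order, the 
same code gives the same values on little-endian and big-endian machines.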





[jira] [Created] (ARROW-10537) [Rust]: Fix dense array implementations

2020-11-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10537:


 Summary: [Rust]: Fix dense array implementations
 Key: ARROW-10537
 URL: https://issues.apache.org/jira/browse/ARROW-10537
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut


Dense implementations like union and null arrays should be fixed.





[jira] [Created] (ARROW-10536) [Rust]: Adapt kernels to big endian platforms

2020-11-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10536:


 Summary: [Rust]: Adapt kernels to big endian platforms
 Key: ARROW-10536
 URL: https://issues.apache.org/jira/browse/ARROW-10536
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut








[jira] [Closed] (ARROW-10534) [Rust]: Implement bit slice iterator for big endian platforms

2020-11-09 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut closed ARROW-10534.

Resolution: Duplicate

> [Rust]: Implement bit slice iterator for big endian platforms
> -
>
> Key: ARROW-10534
> URL: https://issues.apache.org/jira/browse/ARROW-10534
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> Implement big-endian support for bit slice iterators in the array, and also 
> allow storing and interpreting data as big-endian.





[jira] [Created] (ARROW-10535) [Rust]: Implement bit slice iterator for big endian platforms

2020-11-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10535:


 Summary: [Rust]: Implement bit slice iterator for big endian 
platforms
 Key: ARROW-10535
 URL: https://issues.apache.org/jira/browse/ARROW-10535
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Implement big-endian support for bit slice iterators in the arrow arrays, and 
also allow storing and interpreting data as big-endian.





[jira] [Created] (ARROW-10534) [Rust]: Implement bit slice iterator for big endian platforms

2020-11-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10534:


 Summary: [Rust]: Implement bit slice iterator for big endian 
platforms
 Key: ARROW-10534
 URL: https://issues.apache.org/jira/browse/ARROW-10534
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Implement big-endian support for bit slice iterators in the array, and also 
allow storing and interpreting data as big-endian.





[jira] [Created] (ARROW-10533) [Rust]: Tracking issue for big endian platforms

2020-11-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10533:


 Summary: [Rust]: Tracking issue for big endian platforms
 Key: ARROW-10533
 URL: https://issues.apache.org/jira/browse/ARROW-10533
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


This is a placeholder tracking issue for big-endian platform support of Arrow's 
Rust version.





[jira] [Created] (ARROW-10513) [Rust]: Enable running Arrow on ARMv7 with different ABIs

2020-11-07 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10513:


 Summary: [Rust]: Enable running Arrow on ARMv7 with different ABIs
 Key: ARROW-10513
 URL: https://issues.apache.org/jira/browse/ARROW-10513
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut








[jira] [Created] (ARROW-10507) [Rust]: Make full integration of ARMv7

2020-11-06 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10507:


 Summary: [Rust]: Make full integration of ARMv7
 Key: ARROW-10507
 URL: https://issues.apache.org/jira/browse/ARROW-10507
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Arrow is not ready for ARMv7. It needs an effort to finalize the full 
integration.





[jira] [Created] (ARROW-10500) [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-05 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10500:


 Summary: [Rust] Refactor bit slice, bit view iterator for array 
buffers
 Key: ARROW-10500
 URL: https://issues.apache.org/jira/browse/ARROW-10500
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Currently, bit slices, bit views, and all other kinds of bit operations look 
blurry.
 # Support native endianness
 # Fix problems related to bit operations
 # Write method docs
 # Separate view and bit operation
 # Still have good benchmarks





[jira] [Created] (ARROW-10393) [Rust]: Fix null value reading in jsonreader for both dictionary and stringbuilders

2020-10-26 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10393:


 Summary: [Rust]: Fix null value reading in jsonreader for both 
dictionary and stringbuilders
 Key: ARROW-10393
 URL: https://issues.apache.org/jira/browse/ARROW-10393
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


There is a problem with reading nested null values for list arrays with both 
normal string builders and dictionary builders.





[jira] [Created] (ARROW-10339) [Rust]: Builder benchmarks are giving segfault

2020-10-18 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10339:


 Summary: [Rust]: Builder benchmarks are giving segfault
 Key: ARROW-10339
 URL: https://issues.apache.org/jira/browse/ARROW-10339
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Mahmut Bulut


On stable rustc (rustc 1.47.0 (18bf6b4f0 2020-10-07)), the boolean benchmarks 
segfault for Arrow.





[jira] [Created] (ARROW-10338) [Rust]: Use const fn for applicable methods

2020-10-18 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10338:


 Summary: [Rust]: Use const fn for applicable methods
 Key: ARROW-10338
 URL: https://issues.apache.org/jira/browse/ARROW-10338
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


I have realized that most of the constant propagation is not happening 
correctly, and boundary checks are still triggered for kernels and 
operations. For this reason, methods should use const fn where applicable.
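
As an illustration only, here is a small helper written as a const fn. The 
names `bytes_for_bits`, `BUF_LEN`, and `BUF` are hypothetical. What const fn 
guarantees is that the call can be evaluated in constant contexts; whether 
that removes a bounds check in a given kernel would still need to be measured.

```rust
/// Number of bytes needed to hold `bits` bits, usable in constant contexts.
pub const fn bytes_for_bits(bits: usize) -> usize {
    (bits + 7) / 8
}

// Evaluated at compile time, so the array length is a true constant.
pub const BUF_LEN: usize = bytes_for_bits(20);

// A fixed-size buffer whose length the compiler knows statically.
pub static BUF: [u8; BUF_LEN] = [0; BUF_LEN];
```

The same function can still be called at run time with dynamic arguments.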





[jira] [Updated] (ARROW-10335) [Rust]: Unify common methods of dictionaries and other array types

2020-10-18 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10335:
-
Description: Currently, we have sets of differently named methods that do 
the same thing underneath but are written inside the concrete implementations 
of Arrays. One example is append_value in DictionaryArray. Unify these methods 
with primitive arrays to prevent passing around dynamic objects.

> [Rust]: Unify common methods of dictionaries and other array types
> --
>
> Key: ARROW-10335
> URL: https://issues.apache.org/jira/browse/ARROW-10335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Priority: Major
>
> Currently, we have sets of differently named methods that do the same thing 
> underneath but are written inside the concrete implementations of Arrays. 
> One example is append_value in DictionaryArray. Unify these methods with 
> primitive arrays to prevent passing around dynamic objects.





[jira] [Created] (ARROW-10335) [Rust]: Unify common methods of dictionaries and other array types

2020-10-18 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10335:


 Summary: [Rust]: Unify common methods of dictionaries and other 
array types
 Key: ARROW-10335
 URL: https://issues.apache.org/jira/browse/ARROW-10335
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut








[jira] [Updated] (ARROW-10326) [Rust] Add missing method docs for Arrays

2020-10-16 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10326:
-
Description: Whenever a PR comes we don't inspect documentation thus some 
of the methods are missing documentations about what they do. We should 
regularly check and carefully inspect the explanations if they are adequate or 
not. This issue is for filling in all missing doc comments.  (was: Currently, 
whenever a PR comes we don't inspect documentation thus some of the methods are 
missing documentations about what they do. We should regularly check and 
carefully inspect the explanations if they are adequate or not. This issue is 
for filling in all missing doc comments.)

> [Rust] Add missing method docs for Arrays
> -
>
> Key: ARROW-10326
> URL: https://issues.apache.org/jira/browse/ARROW-10326
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Priority: Major
>
> Whenever a PR comes we don't inspect documentation thus some of the methods 
> are missing documentations about what they do. We should regularly check and 
> carefully inspect the explanations if they are adequate or not. This issue is 
> for filling in all missing doc comments.





[jira] [Updated] (ARROW-10326) [Rust] Add missing method docs for Arrays

2020-10-16 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10326:
-
Description: Currently, whenever a PR comes we don't inspect documentation 
thus some of the methods are missing documentations about what they do. We 
should regularly check and carefully inspect the explanations if they are 
adequate or not. This issue is for filling in all missing doc comments.  (was: 
Currently, whenever a PR comes we don't inspect documentation thus some of the 
methods are missing documentations about what they do. We should regularly 
check and carefully inspect the explanations that are adequate and not missing. 
This issue is for filling in all missing doc comments.)

> [Rust] Add missing method docs for Arrays
> -
>
> Key: ARROW-10326
> URL: https://issues.apache.org/jira/browse/ARROW-10326
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Priority: Major
>
> Currently, whenever a PR comes we don't inspect documentation thus some of 
> the methods are missing documentations about what they do. We should 
> regularly check and carefully inspect the explanations if they are adequate 
> or not. This issue is for filling in all missing doc comments.





[jira] [Created] (ARROW-10326) [Rust] Add missing method docs for Arrays

2020-10-16 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10326:


 Summary: [Rust] Add missing method docs for Arrays
 Key: ARROW-10326
 URL: https://issues.apache.org/jira/browse/ARROW-10326
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut


Currently, whenever a PR comes we don't inspect documentation thus some of the 
methods are missing documentations about what they do. We should regularly 
check and carefully inspect the explanations that are adequate and not missing. 
This issue is for filling in all missing doc comments.





[jira] [Assigned] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader

2020-10-10 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut reassigned ARROW-10249:


Assignee: Mahmut Bulut

> [Rust]: Support Dictionary types for ListArrays in arrow json reader
> 
>
> Key: ARROW-10249
> URL: https://issues.apache.org/jira/browse/ARROW-10249
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> Currently, dictionary types for listarrays are not supported in Arrow JSON 
> reader. It would be nice to add dictionary type support.





[jira] [Updated] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader

2020-10-09 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10249:
-
Summary: [Rust]: Support Dictionary types for ListArrays in arrow json 
reader  (was: [Rust]: Support Dictionary types in arrow json reader)

> [Rust]: Support Dictionary types for ListArrays in arrow json reader
> 
>
> Key: ARROW-10249
> URL: https://issues.apache.org/jira/browse/ARROW-10249
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Priority: Major
>
> Currently, dictionary types are not supported in Arrow JSON reader. It would 
> be nice to add dictionary type support.





[jira] [Updated] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader

2020-10-09 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10249:
-
Description: Currently, dictionary types for listarrays are not supported 
in Arrow JSON reader. It would be nice to add dictionary type support.  (was: 
Currently, dictionary types are not supported in Arrow JSON reader. It would be 
nice to add dictionary type support.)

> [Rust]: Support Dictionary types for ListArrays in arrow json reader
> 
>
> Key: ARROW-10249
> URL: https://issues.apache.org/jira/browse/ARROW-10249
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Priority: Major
>
> Currently, dictionary types for listarrays are not supported in Arrow JSON 
> reader. It would be nice to add dictionary type support.





[jira] [Created] (ARROW-10249) [Rust]: Support Dictionary types in arrow json reader

2020-10-09 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10249:


 Summary: [Rust]: Support Dictionary types in arrow json reader
 Key: ARROW-10249
 URL: https://issues.apache.org/jira/browse/ARROW-10249
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Mahmut Bulut


Currently, dictionary types are not supported in Arrow JSON reader. It would be 
nice to add dictionary type support.





[jira] [Commented] (ARROW-10187) [Rust] Test failures on 32 bit ARM (Raspberry Pi)

2020-10-06 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208817#comment-17208817
 ] 

Mahmut Bulut commented on ARROW-10187:
--

[~andygrove] Hi Andy, I don't have a Raspberry Pi at hand. I want to check the 
compilation problems on ARM as soon as possible; a target_pointer_width gate 
might be a good option for it. What version of the Raspberry Pi did you use?
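
A rough sketch of such a gate, assuming the size-dependent values differ only 
by pointer width. The constant name `POINTER_BYTES` is hypothetical; only the 
`target_pointer_width` cfg key itself is part of the language.

```rust
/// Pointer size in bytes, selected at compile time for the target.
#[cfg(target_pointer_width = "64")]
pub const POINTER_BYTES: usize = 8;

/// 32-bit targets such as ARMv7 take this branch.
#[cfg(target_pointer_width = "32")]
pub const POINTER_BYTES: usize = 4;

#[cfg(target_pointer_width = "16")]
pub const POINTER_BYTES: usize = 2;
```

Tests that assert on memory sizes could derive their expected values from a 
constant like this instead of hard-coding the 64-bit numbers.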

> [Rust] Test failures on 32 bit ARM (Raspberry Pi)
> -
>
> Key: ARROW-10187
> URL: https://issues.apache.org/jira/browse/ARROW-10187
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> Perhaps these failures are to be expected and perhaps we can't really support 
> 32 bit?
>  
> {code:java}
>  array::array::tests::test_primitive_array_from_vec stdout 
> thread 'array::array::tests::test_primitive_array_from_vec' panicked at 
> 'assertion failed: `(left == right)`
>   left: `144`,
>  right: `104`', arrow/src/array/array.rs:2383:9 
> array::array::tests::test_primitive_array_from_vec_option stdout 
> thread 'array::array::tests::test_primitive_array_from_vec_option' panicked 
> at 'assertion failed: `(left == right)`
>   left: `224`,
>  right: `176`', arrow/src/array/array.rs:2409:9 
> array::null::tests::test_null_array stdout 
> thread 'array::null::tests::test_null_array' panicked at 'assertion failed: 
> `(left == right)`
>   left: `64`,
>  right: `32`', arrow/src/array/null.rs:134:9 
> array::union::tests::test_dense_union_i32 stdout 
> thread 'array::union::tests::test_dense_union_i32' panicked at 'assertion 
> failed: `(left == right)`
>   left: `1024`,
>  right: `768`', arrow/src/array/union.rs:704:9 
> memory::tests::test_allocate stdout 
> thread 'memory::tests::test_allocate' panicked at 'assertion failed: `(left 
> == right)`
>   left: `0`,
>  right: `32`', arrow/src/memory.rs:243:13
> {code}





[jira] [Updated] (ARROW-10062) [Rust]: Fix for null elems for DoubleEndedIter for DictArray

2020-09-22 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-10062:
-
Description: A bug that I've introduced: during the reverse traversal the 
last element with null doesn't signal the completion.

> [Rust]: Fix for null elems for DoubleEndedIter for DictArray
> 
>
> Key: ARROW-10062
> URL: https://issues.apache.org/jira/browse/ARROW-10062
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A bug that I've introduced: during the reverse traversal the last element 
> with null doesn't signal the completion.





[jira] [Created] (ARROW-10062) [Rust]: Fix for null elems for DoubleEndedIter for DictArray

2020-09-22 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10062:


 Summary: [Rust]: Fix for null elems for DoubleEndedIter for 
DictArray
 Key: ARROW-10062
 URL: https://issues.apache.org/jira/browse/ARROW-10062
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut








[jira] [Created] (ARROW-10055) [Rust] Implement DoubleEndedIterator for NullableIter

2020-09-21 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-10055:


 Summary: [Rust] Implement DoubleEndedIterator for NullableIter
 Key: ARROW-10055
 URL: https://issues.apache.org/jira/browse/ARROW-10055
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Reverse iteration is not implemented for the nullable iterator over dictionary 
keys, so keys can't be reversed or rfolded.
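
A minimal sketch of what double-ended iteration over nullable keys could look like. `NullableIter` here is a hypothetical stand-in built on plain slices, not the Arrow type, and the `i32` key type and `valid` slice are assumptions made for illustration. The end of iteration is decided by the two cursors only, so a null element at either end is still yielded as `Some(None)`.

```rust
/// Hypothetical stand-in for a nullable key iterator; not the Arrow type.
struct NullableIter<'a> {
    values: &'a [i32],
    valid: &'a [bool],
    front: usize, // next index to yield from the front
    back: usize,  // one past the next index to yield from the back
}

impl<'a> NullableIter<'a> {
    fn new(values: &'a [i32], valid: &'a [bool]) -> Self {
        assert_eq!(values.len(), valid.len());
        NullableIter { values, valid, front: 0, back: values.len() }
    }

    /// `None` means the slot is null, `Some(v)` means it holds `v`.
    fn get(&self, i: usize) -> Option<i32> {
        if self.valid[i] { Some(self.values[i]) } else { None }
    }
}

impl<'a> Iterator for NullableIter<'a> {
    type Item = Option<i32>;

    fn next(&mut self) -> Option<Self::Item> {
        // Completion depends only on the cursors, never on null-ness.
        if self.front == self.back {
            return None;
        }
        let item = self.get(self.front);
        self.front += 1;
        Some(item)
    }
}

impl<'a> DoubleEndedIterator for NullableIter<'a> {
    fn next_back(&mut self) -> Option<Self::Item> {
        if self.front == self.back {
            return None;
        }
        self.back -= 1;
        Some(self.get(self.back))
    }
}
```

With `DoubleEndedIterator` in place, `rev()` and `rfold()` come for free from the standard library. For example, iterating values `[1, 0, 3]` with validity `[true, false, true]` in reverse yields `Some(3)`, `None`, `Some(1)`.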





[jira] [Commented] (ARROW-10002) [Rust] Trait-specialization requires nightly

2020-09-16 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196859#comment-17196859
 ] 

Mahmut Bulut commented on ARROW-10002:
--

Hi [~batmanaod], I have checked out the code and commented on the cosmetic 
changes. I don't see any visible perf implications: since dispatch over 
PrimitiveArrayOps was replaced by dispatch over ArrowPrimitiveType, I don't 
expect much perf impact.
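
To illustrate why dispatching through the primitive type should stay cheap, here is a small sketch on stable Rust. All names (`PrimitiveType`, `Int32Type`, `BooleanType`, `PrimitiveArray`, `describe`) are hypothetical and only mirror the idea, not the actual Arrow traits. Type-specific behaviour lives in a trait implemented per primitive type, so the generic array needs one impl block and no `default fn`.

```rust
/// Hypothetical per-type trait; each primitive type supplies its own behaviour.
trait PrimitiveType {
    type Native: Copy;
    /// Type-specific logic that would otherwise need a specialized `default fn`.
    fn describe(value: Self::Native) -> String;
}

struct Int32Type;
struct BooleanType;

impl PrimitiveType for Int32Type {
    type Native = i32;
    fn describe(value: i32) -> String {
        format!("i32:{}", value)
    }
}

impl PrimitiveType for BooleanType {
    type Native = bool;
    fn describe(value: bool) -> String {
        format!("bool:{}", value)
    }
}

/// Generic array with a single impl block; no specialization needed.
struct PrimitiveArray<T: PrimitiveType> {
    values: Vec<T::Native>,
}

impl<T: PrimitiveType> PrimitiveArray<T> {
    fn new(values: Vec<T::Native>) -> Self {
        PrimitiveArray { values }
    }

    /// Statically dispatched: the compiler monomorphizes `T::describe`.
    fn describe_all(&self) -> Vec<String> {
        self.values.iter().map(|v| T::describe(*v)).collect()
    }
}
```

Because the call `T::describe` is resolved at compile time for each concrete `T`, there is no virtual call involved, which is consistent with not expecting a measurable cost from this style of dispatch.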

> [Rust] Trait-specialization requires nightly
> 
>
> Key: ARROW-10002
> URL: https://issues.apache.org/jira/browse/ARROW-10002
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Kyle Strand
>Priority: Major
>
> Trait specialization is widely used in the Rust Arrow implementation. Uses 
> can be identified by searching for instances of {{default fn}} in the 
> codebase:
>  
> {code:java}
> $> rg -c 'default fn' ../arrow/rust/
>  ../arrow/rust/parquet/src/util/test_common/rand_gen.rs:1
>  ../arrow/rust/parquet/src/column/writer.rs:2
>  ../arrow/rust/parquet/src/encodings/encoding.rs:16
>  ../arrow/rust/parquet/src/arrow/record_reader.rs:1
>  ../arrow/rust/parquet/src/encodings/decoding.rs:13
>  ../arrow/rust/parquet/src/file/statistics.rs:1
>  ../arrow/rust/arrow/src/array/builder.rs:7
>  ../arrow/rust/arrow/src/array/array.rs:3
>  ../arrow/rust/arrow/src/array/equal.rs:3{code}
>  
> This feature requires Nightly Rust. Additionally, there is [no schedule for 
> stabilization|https://github.com/rust-lang/rust/issues/31844#issue-135807289] 
> , primarily due to an [unresolved soundness 
> hole|http://aturon.github.io/blog/2017/07/08/lifetime-dispatch]. (Note: there 
> has been further discussion and ideas for resolving the soundness issue, but 
> to my knowledge no definitive action.)
> If we can remove specialization from the Rust codebase, we will not be 
> blocked on the Rust team's stabilization of that feature in order to move to 
> stable Rust.





[jira] [Created] (ARROW-9722) [Rust]: Shorten key lifetime for reverse lookup for dictionary arrays

2020-08-13 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-9722:
---

 Summary: [Rust]: Shorten key lifetime for reverse lookup for 
dictionary arrays
 Key: ARROW-9722
 URL: https://issues.apache.org/jira/browse/ARROW-9722
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Shorten key lifetime for reverse lookup for dictionary arrays





[jira] [Updated] (ARROW-9632) [Rust] Add a "new" method for ExecutionContextSchemaProvider

2020-08-03 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-9632:

Summary: [Rust] Add a "new" method for ExecutionContextSchemaProvider  
(was: [Rust] add a func "new" for ExecutionContextSchemaProvider)

> [Rust] Add a "new" method for ExecutionContextSchemaProvider
> 
>
> Key: ARROW-9632
> URL: https://issues.apache.org/jira/browse/ARROW-9632
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 2.0.0
>Reporter: qingcheng wu
>Priority: Major
>
> I use ExecutionContextSchemaProvider in an outside app, so I added the 
> keyword "pub" to ExecutionContextSchemaProvider and added a new function 
> "new" for it.
> I also added the keyword "pub" to build_schema.





[jira] [Updated] (ARROW-9632) [Rust] add a func "new" for ExecutionContextSchemaProvider

2020-08-03 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-9632:

Summary: [Rust] add a func "new" for ExecutionContextSchemaProvider  (was: 
add a func "new" for ExecutionContextSchemaProvider)

> [Rust] add a func "new" for ExecutionContextSchemaProvider
> --
>
> Key: ARROW-9632
> URL: https://issues.apache.org/jira/browse/ARROW-9632
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 2.0.0
>Reporter: qingcheng wu
>Priority: Major
>
> I use ExecutionContextSchemaProvider in an outside app, so I added the 
> keyword "pub" to ExecutionContextSchemaProvider and added a new function 
> "new" for it.
> I also added the keyword "pub" to build_schema.





[jira] [Commented] (ARROW-9582) [Rust] Implement Array::memory_size()

2020-07-31 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168640#comment-17168640
 ] 

Mahmut Bulut commented on ARROW-9582:
-

Yes I am on it atm.

> [Rust] Implement Array::memory_size()
> -
>
> Key: ARROW-9582
> URL: https://issues.apache.org/jira/browse/ARROW-9582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I would like to be able to determine how much memory is being used by Arrow 
> Arrays so that I can better monitor and report on memory usage when profiling 
> and tuning code.
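
A rough sketch of how such a size could be computed, assuming an array owns a values buffer and an optional validity bitmap. `Buffer`, `SimpleArray` and `heap_size` are hypothetical names for illustration, not the Arrow types. The idea is to add the inline size of the struct to the allocated capacity of every buffer it owns.

```rust
use std::mem::size_of;

/// Hypothetical byte buffer; its capacity is what was actually allocated.
struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    fn with_capacity(capacity: usize) -> Self {
        Buffer { data: Vec::with_capacity(capacity) }
    }

    /// Heap bytes owned by this buffer (allocated, not just used).
    fn heap_size(&self) -> usize {
        self.data.capacity()
    }
}

/// Hypothetical array: a values buffer plus an optional validity bitmap.
struct SimpleArray {
    values: Buffer,
    null_bitmap: Option<Buffer>,
}

impl SimpleArray {
    /// Inline size of the struct plus every heap allocation it owns.
    fn memory_size(&self) -> usize {
        size_of::<Self>()
            + self.values.heap_size()
            + self.null_bitmap.as_ref().map_or(0, Buffer::heap_size)
    }
}
```

Counting capacity rather than length reports what the allocator actually handed out, which is usually the more useful number when profiling.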





[jira] [Created] (ARROW-9608) [Rust] Remove arrow flight from parquet's feature gating

2020-07-31 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-9608:
---

 Summary: [Rust] Remove arrow flight from parquet's feature gating
 Key: ARROW-9608
 URL: https://issues.apache.org/jira/browse/ARROW-9608
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 1.0.0
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


Currently, parquet pulls in arrow-flight and its dependencies, which breaks 
the CI builds and is unnecessary because arrow-flight is not used. Parquet 
should build without any default features enabled. A simple PR will make the 
build leaner.





[jira] [Comment Edited] (ARROW-9582) [Rust] Implement Array::memory_size()

2020-07-28 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496
 ] 

Mahmut Bulut edited comment on ARROW-9582 at 7/28/20, 3:16 PM:
---

[~andygrove] I have already a handy code for this one. I can open a pr adapting 
that.


was (Author: vertexclique):
[~andygrove] I have already a handy code for this one. I can hand open a pr 
adapting that.

> [Rust] Implement Array::memory_size()
> -
>
> Key: ARROW-9582
> URL: https://issues.apache.org/jira/browse/ARROW-9582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> I would like to be able to determine how much memory is being used by Arrow 
> Arrays so that I can better monitor and report on memory usage when profiling 
> and tuning code.





[jira] [Comment Edited] (ARROW-9582) [Rust] Implement Array::memory_size()

2020-07-28 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496
 ] 

Mahmut Bulut edited comment on ARROW-9582 at 7/28/20, 3:16 PM:
---

[~andygrove] I have a snippet for this one. I can open a pr adapting that.


was (Author: vertexclique):
[~andygrove] I have already a handy code for this one. I can open a pr adapting 
that.

> [Rust] Implement Array::memory_size()
> -
>
> Key: ARROW-9582
> URL: https://issues.apache.org/jira/browse/ARROW-9582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> I would like to be able to determine how much memory is being used by Arrow 
> Arrays so that I can better monitor and report on memory usage when profiling 
> and tuning code.





[jira] [Comment Edited] (ARROW-9582) [Rust] Implement Array::memory_size()

2020-07-28 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496
 ] 

Mahmut Bulut edited comment on ARROW-9582 at 7/28/20, 3:16 PM:
---

[~andygrove] I have already a handy code for this one. I can hand open a pr 
adapting that.


was (Author: vertexclique):
I have already a handy code for this one. I can hand open a pr adapting that.

> [Rust] Implement Array::memory_size()
> -
>
> Key: ARROW-9582
> URL: https://issues.apache.org/jira/browse/ARROW-9582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> I would like to be able to determine how much memory is being used by Arrow 
> Arrays so that I can better monitor and report on memory usage when profiling 
> and tuning code.





[jira] [Commented] (ARROW-9582) [Rust] Implement Array::memory_size()

2020-07-28 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496
 ] 

Mahmut Bulut commented on ARROW-9582:
-

I have already a handy code for this one. I can hand open a pr adapting that.

> [Rust] Implement Array::memory_size()
> -
>
> Key: ARROW-9582
> URL: https://issues.apache.org/jira/browse/ARROW-9582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> I would like to be able to determine how much memory is being used by Arrow 
> Arrays so that I can better monitor and report on memory usage when profiling 
> and tuning code.





[jira] [Commented] (ARROW-8480) [Rust] There is no check for allocation failure

2020-07-13 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156899#comment-17156899
 ] 

Mahmut Bulut commented on ARROW-8480:
-

The suggested API can't be used until it stabilizes, so I am leaving this open 
until then. Tracking issue: [https://github.com/rust-lang/rust/issues/32838]

> [Rust] There is no check for allocation failure
> ---
>
> Key: ARROW-8480
> URL: https://issues.apache.org/jira/browse/ARROW-8480
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Reported by bluss on Github:
> [https://github.com/rust-ndarray/ndarray/issues/771]
>  
> "What I can see, there is no check for allocation success, so any buffer can 
> be created with a null pointer, which leads to soundness problems in most 
> methods. Best look into using {{std::alloc::handle_alloc_error}} or 
> alternatives. (This problem means that the mutablebuffer is not a safe 
> abstraction, and it should preferably not be exposed as public API like 
> this.)"





[jira] [Commented] (ARROW-8480) [Rust] There is no check for allocation failure

2020-07-13 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156893#comment-17156893
 ] 

Mahmut Bulut commented on ARROW-8480:
-

Workaround for the first set of allocation-related considerations:

[https://github.com/apache/arrow/pull/7734]
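
For readers unfamiliar with the problem, a minimal sketch of an allocation that checks for failure using only stable `std::alloc` functions. This is not the code from the pull request; the function names and the 64-byte `ALIGNMENT` are assumptions for illustration. A null pointer from the allocator is routed to `std::alloc::handle_alloc_error`, so no buffer can be built around a null pointer.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Assumed alignment, chosen for illustration only.
const ALIGNMENT: usize = 64;

/// Allocate `size` zeroed bytes, diverging via `handle_alloc_error` on failure.
fn allocate_aligned(size: usize) -> NonNull<u8> {
    assert!(size > 0, "zero-sized allocations are not handled in this sketch");
    let layout = Layout::from_size_align(size, ALIGNMENT).expect("invalid layout");
    // SAFETY: `layout` has a non-zero size, as required by `alloc_zeroed`.
    let raw = unsafe { alloc_zeroed(layout) };
    match NonNull::new(raw) {
        Some(ptr) => ptr,
        // The allocator signals failure by returning null.
        None => handle_alloc_error(layout),
    }
}

/// Free memory obtained from `allocate_aligned`.
///
/// # Safety
/// `ptr` must come from `allocate_aligned(size)` and must not be used afterwards.
unsafe fn free_aligned(ptr: NonNull<u8>, size: usize) {
    let layout = Layout::from_size_align(size, ALIGNMENT).expect("invalid layout");
    // SAFETY: the caller guarantees `ptr` was allocated with this same layout.
    unsafe { dealloc(ptr.as_ptr(), layout) };
}
```

Returning `NonNull<u8>` instead of a raw pointer also encodes the "never null" guarantee in the type, so callers do not have to re-check it.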

> [Rust] There is no check for allocation failure
> ---
>
> Key: ARROW-8480
> URL: https://issues.apache.org/jira/browse/ARROW-8480
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Reported by bluss on Github:
> [https://github.com/rust-ndarray/ndarray/issues/771]
>  
> "What I can see, there is no check for allocation success, so any buffer can 
> be created with a null pointer, which leads to soundness problems in most 
> methods. Best look into using {{std::alloc::handle_alloc_error}} or 
> alternatives. (This problem means that the mutablebuffer is not a safe 
> abstraction, and it should preferably not be exposed as public API like 
> this.)"





[jira] [Commented] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-07-03 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150943#comment-17150943
 ] 

Mahmut Bulut commented on ARROW-9275:
-

Yes, exactly, Neville: users can choose whatever they want to incorporate in 
their workloads, which enables plenty of projects with different workloads, 
scenarios, etc.

And yes again, I feel there should be a collaborative effort to add APIs 
across the crates. This spans a little wider than other tickets.

Sure! I will send an email with content similar to this ticket, tagged 
`[Rust]`. Thanks for the feedback; I will send the mail asap.

> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> 
>
> Key: ARROW-9275
> URL: https://issues.apache.org/jira/browse/ARROW-9275
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> This issue can be considered an epic that spans other Arrow projects.
> *Drill down*
> Currently, traits like `ParquetReader` only offer a synchronous interface, 
> which uses a BufReader with a constant 8KB buffer. Over the network, this 
> becomes a problem; it can be solved with differential buffers. In addition 
> to this shortcoming, an executor engine is needed to schedule from async 
> trait methods to sync trait methods; it has to sit somewhere in between to 
> make requests to external IO asynchronous. For on-disk IO the current 
> approach is acceptable, since no reliable evented IO exists for on-disk IO 
> on major platforms.
> All things considered, abstractions that expose asynchronous IO without 
> depending on any particular executor need to be provided.
>  
> *Design Suggestions & Considerations*
> The design should apply and consider:
>  * Sans IO (for more information about the Sans IO approach, please see 
> [https://sans-io.readthedocs.io/]).
>  * Not including any executor-specific data, at all.
>  * Tests should work with any executor with little to no modification.
>  * Buffers should be adjusted accordingly, using differential buffers to 
> optimize network trips.
>  * Sync IO shouldn't be touched, at all costs. If we try to unify the sync 
> IO traits or write overlapping implementations, that will make our life 
> harder in the future. Sans IO should be compartmentalized.
>  
> *Notes*
> If the Sans IO approach is not taken:
>  * the project will use an extreme number of dependencies.
>  * it will not be compatible with other Rust code at all.
>  * it will break currently working code that uses array ingestion.
>  * integration tests are going to be harder.
>  * it will be really hard (in user projects) to adapt to completion-based 
> APIs once they stabilize in the future.
>
> Note that this suggestion is not about the flight format or any 
> flight-related information atm. It is purely about making on-disk and 
> remote IO (provider backends like AWS etc.) async.
>  
> *Open points*
> A couple of open points:
>  * Identifying the traits that are going to be made async.
>  * Designing internal routines.
>  * Choosing the package name to expose.
>  * Gathering traits into the designated packages in all file formats.





[jira] [Updated] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-06-30 Thread Mahmut Bulut (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmut Bulut updated ARROW-9275:

Description: 
This issue can be considered an epic that spans other Arrow projects.

*Drill down*

Currently, traits like `ParquetReader` only offer a synchronous interface, 
which uses a BufReader with a constant 8KB buffer. Over the network, this 
becomes a problem; it can be solved with differential buffers. In addition to 
this shortcoming, an executor engine is needed to schedule from async trait 
methods to sync trait methods; it has to sit somewhere in between to make 
requests to external IO asynchronous. For on-disk IO the current approach is 
acceptable, since no reliable evented IO exists for on-disk IO on major 
platforms.

All things considered, abstractions that expose asynchronous IO without 
depending on any particular executor need to be provided.

*Design Suggestions & Considerations*

The design should apply and consider:
 * Sans IO (for more information about the Sans IO approach, please see 
[https://sans-io.readthedocs.io/]).
 * Not including any executor-specific data, at all.
 * Tests should work with any executor with little to no modification.
 * Buffers should be adjusted accordingly, using differential buffers to 
optimize network trips.
 * Sync IO shouldn't be touched, at all costs. If we try to unify the sync IO 
traits or write overlapping implementations, that will make our life harder in 
the future. Sans IO should be compartmentalized.

*Notes*

If the Sans IO approach is not taken:
 * the project will use an extreme number of dependencies.
 * it will not be compatible with other Rust code at all.
 * it will break currently working code that uses array ingestion.
 * integration tests are going to be harder.
 * it will be really hard (in user projects) to adapt to completion-based APIs 
once they stabilize in the future.

Note that this suggestion is not about the flight format or any 
flight-related information atm. It is purely about making on-disk and remote 
IO (provider backends like AWS etc.) async.

*Open points*

A couple of open points:
 * Identifying the traits that are going to be made async.
 * Designing internal routines.
 * Choosing the package name to expose.
 * Gathering traits into the designated packages in all file formats.

  was:
This issue can be considered an epic that spans other Arrow projects.

*Drill down*

Currently, traits like `ParquetReader` only offer a synchronous interface, 
which uses a BufReader with a constant 8KB buffer. Over the network, this 
becomes a problem; it can be solved with differential buffers. In addition to 
this shortcoming, an executor engine is needed to schedule from async trait 
methods to sync trait methods; it has to sit somewhere in between to make 
requests to external IO asynchronous. For on-disk IO the current approach is 
acceptable, since no reliable evented IO exists for on-disk IO on major 
platforms.

All things considered, abstractions that expose asynchronous IO without 
depending on any particular executor need to be provided.

*Design Suggestions & Considerations*

The design should apply and consider:
 * Sans IO (for more information about the Sans IO approach, please see 
[https://sans-io.readthedocs.io/]).
 * Not including any executor-specific data, at all.
 * Tests should work with any executor with little to no modification.
 * Buffers should be adjusted accordingly, using differential buffers to 
optimize network trips.
 * Sync IO shouldn't be touched, at all costs. If we try to unify the sync IO 
traits or write overlapping implementations, that will make our life harder in 
the future. Sans IO should be compartmentalized.

*Notes*

If the Sans IO approach is not taken:
 * the project will use an extreme number of dependencies.
 * it will not be compatible with other Rust code at all.
 * it will break currently working code that uses array ingestion.
 * integration tests are going to be harder.
 * it will be really hard (in user projects) to adapt to completion-based APIs 
once they stabilize in the future.

Note that this suggestion is not about the in-flight format or any 
in-flight-related information atm. It is purely about making on-disk and 
remote IO (provider backends like AWS etc.) async.

*Open points*

A couple of open points:
 * Identifying the traits that are going to be made async.
 * Designing internal routines.
 * Choosing the package name to expose.
 * Gathering traits into the designated packages in all file formats.


> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> 
>
> Key: ARROW-9275
> URL: https://issues.apache.org/jira/browse/ARROW-9275
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> This issue can be considered an 

[jira] [Commented] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-06-30 Thread Mahmut Bulut (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148510#comment-17148510
 ] 

Mahmut Bulut commented on ARROW-9275:
-

[~nevi_me], [~andygrove], [~paddyhoran] I need input for this from you if 
possible.

> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> 
>
> Key: ARROW-9275
> URL: https://issues.apache.org/jira/browse/ARROW-9275
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>
> This issue can be considered an epic that spans other Arrow projects.
> *Drill down*
> Currently, traits like `ParquetReader` only offer a synchronous interface, 
> which uses a BufReader with a constant 8KB buffer. Over the network, this 
> becomes a problem; it can be solved with differential buffers. In addition 
> to this shortcoming, an executor engine is needed to schedule from async 
> trait methods to sync trait methods; it has to sit somewhere in between to 
> make requests to external IO asynchronous. For on-disk IO the current 
> approach is acceptable, since no reliable evented IO exists for on-disk IO 
> on major platforms.
> All things considered, abstractions that expose asynchronous IO without 
> depending on any particular executor need to be provided.
>  
> *Design Suggestions & Considerations*
> The design should apply and consider:
>  * Sans IO (for more information about the Sans IO approach, please see 
> [https://sans-io.readthedocs.io/]).
>  * Not including any executor-specific data, at all.
>  * Tests should work with any executor with little to no modification.
>  * Buffers should be adjusted accordingly, using differential buffers to 
> optimize network trips.
>  * Sync IO shouldn't be touched, at all costs. If we try to unify the sync 
> IO traits or write overlapping implementations, that will make our life 
> harder in the future. Sans IO should be compartmentalized.
>  
> *Notes*
> If the Sans IO approach is not taken:
>  * the project will use an extreme number of dependencies.
>  * it will not be compatible with other Rust code at all.
>  * it will break currently working code that uses array ingestion.
>  * integration tests are going to be harder.
>  * it will be really hard (in user projects) to adapt to completion-based 
> APIs once they stabilize in the future.
>
> Note that this suggestion is not about the in-flight format or any 
> in-flight-related information atm. It is purely about making on-disk and 
> remote IO (provider backends like AWS etc.) async.
>  
> *Open points*
> A couple of open points:
>  * Identifying the traits that are going to be made async.
>  * Designing internal routines.
>  * Choosing the package name to expose.
>  * Gathering traits into the designated packages in all file formats.





[jira] [Created] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays

2020-06-30 Thread Mahmut Bulut (Jira)
Mahmut Bulut created ARROW-9275:
---

 Summary: [Rust] – Async Sans IO: R/W into/to Arrow Arrays
 Key: ARROW-9275
 URL: https://issues.apache.org/jira/browse/ARROW-9275
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut


This issue can be considered an epic that spans other Arrow projects.

*Drill down*

Currently, traits like `ParquetReader` only offer a synchronous interface, 
which uses a BufReader with a constant 8KB buffer. Over the network, this 
becomes a problem; it can be solved with differential buffers. In addition to 
this shortcoming, an executor engine is needed to schedule from async trait 
methods to sync trait methods; it has to sit somewhere in between to make 
requests to external IO asynchronous. For on-disk IO the current approach is 
acceptable, since no reliable evented IO exists for on-disk IO on major 
platforms.

All things considered, abstractions that expose asynchronous IO without 
depending on any particular executor need to be provided.

*Design Suggestions & Considerations*

The design should apply and consider:
 * Sans IO (for more information about the Sans IO approach, please see 
[https://sans-io.readthedocs.io/]).
 * Not including any executor-specific data, at all.
 * Tests should work with any executor with little to no modification.
 * Buffers should be adjusted accordingly, using differential buffers to 
optimize network trips.
 * Sync IO shouldn't be touched, at all costs. If we try to unify the sync IO 
traits or write overlapping implementations, that will make our life harder in 
the future. Sans IO should be compartmentalized.

*Notes*

If the Sans IO approach is not taken:
 * the project will use an extreme number of dependencies.
 * it will not be compatible with other Rust code at all.
 * it will break currently working code that uses array ingestion.
 * integration tests are going to be harder.
 * it will be really hard (in user projects) to adapt to completion-based APIs 
once they stabilize in the future.

Note that this suggestion is not about the in-flight format or any 
in-flight-related information atm. It is purely about making on-disk and 
remote IO (provider backends like AWS etc.) async.

*Open points*

A couple of open points:
 * Identifying the traits that are going to be made async.
 * Designing internal routines.
 * Choosing the package name to expose.
 * Gathering traits into the designated packages in all file formats.
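
To make the Sans IO idea concrete, here is a toy sketch. It assumes a made-up framing (a 4-byte little-endian length followed by the payload), which is not the Parquet or Arrow format, and `FrameDecoder` is a hypothetical name. The decoder never reads from a file or socket and knows nothing about executors; the caller obtains bytes however it likes (blocking read, async read, test fixture) and feeds them in.

```rust
/// Hypothetical sans-IO decoder for length-prefixed frames.
/// Frame layout (assumed): 4-byte little-endian length, then the payload.
#[derive(Default)]
struct FrameDecoder {
    buf: Vec<u8>,
}

impl FrameDecoder {
    /// Accept bytes the caller obtained by any means; no IO happens here.
    fn feed(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    /// Return the next complete frame, or `None` if more bytes are needed.
    fn next_frame(&mut self) -> Option<Vec<u8>> {
        if self.buf.len() < 4 {
            return None;
        }
        let header = [self.buf[0], self.buf[1], self.buf[2], self.buf[3]];
        let len = u32::from_le_bytes(header) as usize;
        if self.buf.len() < 4 + len {
            return None;
        }
        let frame = self.buf[4..4 + len].to_vec();
        // Drop the consumed header and payload, keep any trailing bytes.
        self.buf.drain(..4 + len);
        Some(frame)
    }
}
```

Since the state machine is plain synchronous code, the same decoder can be driven from a blocking loop, from any async runtime, or from a unit test without modification, which is the property the design points above ask for.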


