[jira] [Created] (ARROW-11760) [Rust]: Conditionally compile leak tracking & lower atomic consistency guarantees
Mahmut Bulut created ARROW-11760: Summary: [Rust]: Conditionally compile leak tracking & lower atomic consistency guarantees Key: ARROW-11760 URL: https://issues.apache.org/jira/browse/ARROW-11760 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Conditionally compile object tracking in alloc.rs and lower the atomic consistency guarantees. -- This message was sent by Atlassian Jira (v8.3.4#803005)
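As a rough illustration of what this issue proposes, the sketch below gates allocation tracking behind a Cargo feature and uses relaxed atomics for the counter. The feature name `leak-tracking`, the counter, and the function names are hypothetical and are not taken from alloc.rs; the choice of `Ordering::Relaxed` is likewise an assumption about what "lower consistency guarantees" could mean for a pure statistics counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical global counter of live allocated bytes (not Arrow's actual static).
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Records an allocation; this body exists only when the assumed
/// `leak-tracking` feature is enabled.
#[cfg(feature = "leak-tracking")]
#[inline]
pub fn record_alloc(size: usize) {
    // Relaxed is enough for a counter that does not order other memory accesses.
    LIVE_BYTES.fetch_add(size, Ordering::Relaxed);
}

/// No-op variant compiled when tracking is disabled.
#[cfg(not(feature = "leak-tracking"))]
#[inline]
pub fn record_alloc(_size: usize) {}

/// Records a deallocation (tracking build only).
#[cfg(feature = "leak-tracking")]
#[inline]
pub fn record_dealloc(size: usize) {
    LIVE_BYTES.fetch_sub(size, Ordering::Relaxed);
}

/// No-op variant compiled when tracking is disabled.
#[cfg(not(feature = "leak-tracking"))]
#[inline]
pub fn record_dealloc(_size: usize) {}

/// Returns the tracked number of live bytes (always 0 without the feature).
pub fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```

With the feature off, the tracking calls compile to empty functions, so release builds pay nothing for the bookkeeping.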
[jira] [Created] (ARROW-11316) [Rust]: BitMap is_set should return Result rather than relying on inlined assertion
Mahmut Bulut created ARROW-11316: Summary: [Rust]: BitMap is_set should return Result rather than relying on inlined assertion Key: ARROW-11316 URL: https://issues.apache.org/jira/browse/ARROW-11316 Project: Apache Arrow Issue Type: Bug Reporter: Mahmut Bulut The inlined assertion fails and panics when a caller of the method passes anything outside the 0..7 range. As a result, incorrect usage crashes the application that uses Arrow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
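A minimal sketch of the proposed behaviour, assuming a simplified bitmap type: the `Bitmap` struct, the `BitmapError` enum, and the `from_bytes` constructor below are hypothetical stand-ins, not Arrow's actual definitions. The point is only that an out-of-range index becomes an `Err` the caller can handle instead of a panic.

```rust
/// Hypothetical error type for bitmap lookups.
#[derive(Debug, PartialEq)]
pub enum BitmapError {
    OutOfBounds { index: usize, len: usize },
}

/// Simplified stand-in for a validity bitmap (least-significant bit first).
pub struct Bitmap {
    bits: Vec<u8>,
    len: usize, // number of addressable bits
}

impl Bitmap {
    pub fn from_bytes(bits: Vec<u8>) -> Self {
        let len = bits.len() * 8;
        Self { bits, len }
    }

    /// Returns whether bit `index` is set, or an error if `index` is out of range.
    pub fn is_set(&self, index: usize) -> Result<bool, BitmapError> {
        if index >= self.len {
            return Err(BitmapError::OutOfBounds { index, len: self.len });
        }
        let mask = 1u8 << (index % 8);
        Ok((self.bits[index / 8] & mask) != 0)
    }
}
```

A caller can then propagate the failure with `?` or map it into its own error type rather than crashing.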
[jira] [Created] (ARROW-11141) [Rust]: Miri checks
Mahmut Bulut created ARROW-11141: Summary: [Rust]: Miri checks Key: ARROW-11141 URL: https://issues.apache.org/jira/browse/ARROW-11141 Project: Apache Arrow Issue Type: Improvement Reporter: Mahmut Bulut Assignee: Mahmut Bulut Miri checks need to be enabled to detect out-of-bounds reads and invalid memory accesses. Currently, there is no way of detecting these, so all issues related to invalid memory access surface only in Arrow-dependent code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10664) [Rust] Implement AVX-512 sort operation
Mahmut Bulut created ARROW-10664: Summary: [Rust] Implement AVX-512 sort operation Key: ARROW-10664 URL: https://issues.apache.org/jira/browse/ARROW-10664 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10660) [Rust] Implement AVX-512 bit or operation
Mahmut Bulut created ARROW-10660: Summary: [Rust] Implement AVX-512 bit or operation Key: ARROW-10660 URL: https://issues.apache.org/jira/browse/ARROW-10660 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10653) [Rust]: Update toolchain version to bring new features
[ https://issues.apache.org/jira/browse/ARROW-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10653: - Component/s: Rust > [Rust]: Update toolchain version to bring new features > -- > > Key: ARROW-10653 > URL: https://issues.apache.org/jira/browse/ARROW-10653 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > I have deployed new intrinsics to rust lang core, so I want to bring these in > iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10653) [Rust]: Update toolchain version to bring new features
[ https://issues.apache.org/jira/browse/ARROW-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10653: - Issue Type: New Feature (was: Bug) > [Rust]: Update toolchain version to bring new features > -- > > Key: ARROW-10653 > URL: https://issues.apache.org/jira/browse/ARROW-10653 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > I have deployed new intrinsics to rust lang core, so I want to bring these in > iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10612) [Rust]: Tracking issue for AVX-512
[ https://issues.apache.org/jira/browse/ARROW-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10612: - Labels: AVX-512 SIMD (was: ) > [Rust]: Tracking issue for AVX-512 > -- > > Key: ARROW-10612 > URL: https://issues.apache.org/jira/browse/ARROW-10612 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > Labels: AVX-512, SIMD > > This issue will track AVX-512 feature development in its entirety. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10612) [Rust]: Tracking issue for AVX-512
[ https://issues.apache.org/jira/browse/ARROW-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10612: - Issue Type: New Feature (was: Improvement) > [Rust]: Tracking issue for AVX-512 > -- > > Key: ARROW-10612 > URL: https://issues.apache.org/jira/browse/ARROW-10612 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > This issue will track AVX-512 feature development in its entirety. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10615) [Rust]: Adapt existing benchmarks to have proper execution over AVX-512
Mahmut Bulut created ARROW-10615: Summary: [Rust]: Adapt existing benchmarks to have proper execution over AVX-512 Key: ARROW-10615 URL: https://issues.apache.org/jira/browse/ARROW-10615 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Some benchmarks use the same data on every run, which is easily predictable during execution; moreover, some of them use too little data to exercise the full power of the implemented operations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10614) [Rust]: Write documentation for AVX-512 in the Arrow Readme
Mahmut Bulut created ARROW-10614: Summary: [Rust]: Write documentation for AVX-512 in the Arrow Readme Key: ARROW-10614 URL: https://issues.apache.org/jira/browse/ARROW-10614 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Write documentation about how SIMD related features work in addition to `avx512` feature which will be introduced under this tracking issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10613) [Rust]: Enable CI for AVX-512
Mahmut Bulut created ARROW-10613: Summary: [Rust]: Enable CI for AVX-512 Key: ARROW-10613 URL: https://issues.apache.org/jira/browse/ARROW-10613 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Enable CI for AVX-512. As of today, CI will need a nightly compiler dated later than 14.11.2020. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10612) [Rust]: Tracking issue for AVX-512
Mahmut Bulut created ARROW-10612: Summary: [Rust]: Tracking issue for AVX-512 Key: ARROW-10612 URL: https://issues.apache.org/jira/browse/ARROW-10612 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut This issue will track AVX-512 feature development in its entirety. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10589) [Rust]: Implement AVX-512 bit and operation
Mahmut Bulut created ARROW-10589: Summary: [Rust]: Implement AVX-512 bit and operation Key: ARROW-10589 URL: https://issues.apache.org/jira/browse/ARROW-10589 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Implement bit and on avx-512. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10588) [Rust]: Safe bit operations for Arrow
Mahmut Bulut created ARROW-10588: Summary: [Rust]: Safe bit operations for Arrow Key: ARROW-10588 URL: https://issues.apache.org/jira/browse/ARROW-10588 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Implement bit operations over the safe interface with checks instead of using unsafe operations. Expose better API to users. Extends ARROW-10535. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10572) [Rust][DataFusion] Use aHash and std::collections hashmap for aggregates / distinct
[ https://issues.apache.org/jira/browse/ARROW-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10572: - Summary: [Rust][DataFusion] Use aHash and std::collections hashmap for aggregates / distinct (was: [Rust] Use aHash and std::collections hashmap for aggregates / distinct) > [Rust][DataFusion] Use aHash and std::collections hashmap for aggregates / > distinct > --- > > Key: ARROW-10572 > URL: https://issues.apache.org/jira/browse/ARROW-10572 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Daniël Heres >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Ahash is a faster hash algorithm than FNV. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10572) [Rust]: Use aHash and std::collections hashmap for aggregates / distinct
[ https://issues.apache.org/jira/browse/ARROW-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10572: - Summary: [Rust]: Use aHash and std::collections hashmap for aggregates / distinct (was: Use aHash and std::collections hashmap for aggregates / distinct) > [Rust]: Use aHash and std::collections hashmap for aggregates / distinct > > > Key: ARROW-10572 > URL: https://issues.apache.org/jira/browse/ARROW-10572 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Daniël Heres >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Ahash is a faster hash algorithm than FNV. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10572) [Rust] Use aHash and std::collections hashmap for aggregates / distinct
[ https://issues.apache.org/jira/browse/ARROW-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10572: - Summary: [Rust] Use aHash and std::collections hashmap for aggregates / distinct (was: [Rust]: Use aHash and std::collections hashmap for aggregates / distinct) > [Rust] Use aHash and std::collections hashmap for aggregates / distinct > --- > > Key: ARROW-10572 > URL: https://issues.apache.org/jira/browse/ARROW-10572 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Daniël Heres >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Ahash is a faster hash algorithm than FNV. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10551) [Rust]: Fix unreproducible benchmarks
Mahmut Bulut created ARROW-10551: Summary: [Rust]: Fix unreproducible benchmarks Key: ARROW-10551 URL: https://issues.apache.org/jira/browse/ARROW-10551 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Some benchmarks are unreproducible in Arrow impl. Fix them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10538) [Rust]: Read/write data in respect to endianness
Mahmut Bulut created ARROW-10538: Summary: [Rust]: Read/write data in respect to endianness Key: ARROW-10538 URL: https://issues.apache.org/jira/browse/ARROW-10538 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Adapt to the machine's endianness while parsing data, with respect to the endianness defined in: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L368-L371 -- This message was sent by Atlassian Jira (v8.3.4#803005)
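One way to picture endianness-aware reading and writing is the sketch below. The `Endianness` enum and the `read_i32`/`write_i32` helpers are hypothetical names chosen for illustration; only the standard-library conversions (`from_le_bytes`, `from_be_bytes`, `to_le_bytes`, `to_be_bytes`) are real APIs. The idea is that the declared byte order of the data, not the host's, decides how bytes are interpreted.

```rust
/// Byte order declared for the data (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Endianness {
    Little,
    Big,
}

/// Decodes an i32 from bytes stored in the declared byte order.
/// The result is a native integer regardless of the host's endianness.
pub fn read_i32(bytes: [u8; 4], declared: Endianness) -> i32 {
    match declared {
        Endianness::Little => i32::from_le_bytes(bytes),
        Endianness::Big => i32::from_be_bytes(bytes),
    }
}

/// Encodes an i32 into the requested byte order.
pub fn write_i32(value: i32, target: Endianness) -> [u8; 4] {
    match target {
        Endianness::Little => value.to_le_bytes(),
        Endianness::Big => value.to_be_bytes(),
    }
}
```

Because the conversions are explicit, the same code behaves identically on little-endian and big-endian hosts.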
[jira] [Created] (ARROW-10537) [Rust]: Fix dense array implementations
Mahmut Bulut created ARROW-10537: Summary: [Rust]: Fix dense array implementations Key: ARROW-10537 URL: https://issues.apache.org/jira/browse/ARROW-10537 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Dense implementations like union and null arrays should be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10536) [Rust]: Adapt kernels to big endian platforms
Mahmut Bulut created ARROW-10536: Summary: [Rust]: Adapt kernels to big endian platforms Key: ARROW-10536 URL: https://issues.apache.org/jira/browse/ARROW-10536 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-10534) [Rust]: Implement bit slice iterator for big endian platforms
[ https://issues.apache.org/jira/browse/ARROW-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut closed ARROW-10534. Resolution: Duplicate > [Rust]: Implement bit slice iterator for big endian platforms > - > > Key: ARROW-10534 > URL: https://issues.apache.org/jira/browse/ARROW-10534 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > Implement big-endian support for bit slice iterators in the array, also allow > storing and interpreting data as big-endian. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10535) [Rust]: Implement bit slice iterator for big endian platforms
Mahmut Bulut created ARROW-10535: Summary: [Rust]: Implement bit slice iterator for big endian platforms Key: ARROW-10535 URL: https://issues.apache.org/jira/browse/ARROW-10535 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Implement big-endian support for bit slice iterators in the arrow arrays, also allow storing and interpreting data as big-endian. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10534) [Rust]: Implement bit slice iterator for big endian platforms
Mahmut Bulut created ARROW-10534: Summary: [Rust]: Implement bit slice iterator for big endian platforms Key: ARROW-10534 URL: https://issues.apache.org/jira/browse/ARROW-10534 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Implement big-endian support for bit slice iterators in the array, also allow storing and interpreting data as big-endian. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10533) [Rust]: Tracking issue for big endian platforms
Mahmut Bulut created ARROW-10533: Summary: [Rust]: Tracking issue for big endian platforms Key: ARROW-10533 URL: https://issues.apache.org/jira/browse/ARROW-10533 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: Mahmut Bulut Assignee: Mahmut Bulut This is a placeholder tracking issue for big-endian platform support of Arrow's Rust version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10513) [Rust]: Enable running Arrow on ARMv7 with different ABIs
Mahmut Bulut created ARROW-10513: Summary: [Rust]: Enable running Arrow on ARMv7 with different ABIs Key: ARROW-10513 URL: https://issues.apache.org/jira/browse/ARROW-10513 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10507) [Rust]: Make full integration of ARMv7
Mahmut Bulut created ARROW-10507: Summary: [Rust]: Make full integration of ARMv7 Key: ARROW-10507 URL: https://issues.apache.org/jira/browse/ARROW-10507 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Arrow is not ready for ARMv7. It needs an effort to finalize the full integration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10500) [Rust] Refactor bit slice, bit view iterator for array buffers
Mahmut Bulut created ARROW-10500: Summary: [Rust] Refactor bit slice, bit view iterator for array buffers Key: ARROW-10500 URL: https://issues.apache.org/jira/browse/ARROW-10500 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Currently, the bit slice, the bit view, and all other kinds of bit operations are unclear. # Support native endianness # Fix problems related to bit operations # Write method docs # Separate view and bit operations # Still have good benchmarks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10393) [Rust]: Fix null value reading in jsonreader for both dictionary and stringbuilders
Mahmut Bulut created ARROW-10393: Summary: [Rust]: Fix null value reading in jsonreader for both dictionary and stringbuilders Key: ARROW-10393 URL: https://issues.apache.org/jira/browse/ARROW-10393 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut There is a problem with reading nested null values for listarrays with both normal string builders and dictionary builders -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10339) [Rust]: Builder benchmarks are giving segfault
Mahmut Bulut created ARROW-10339: Summary: [Rust]: Builder benchmarks are giving segfault Key: ARROW-10339 URL: https://issues.apache.org/jira/browse/ARROW-10339 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Mahmut Bulut On rustc stable (rustc 1.47.0 (18bf6b4f0 2020-10-07)), the boolean benchmarks segfault in Arrow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10338) [Rust]: Use const fn for applicable methods
Mahmut Bulut created ARROW-10338: Summary: [Rust]: Use const fn for applicable methods Key: ARROW-10338 URL: https://issues.apache.org/jira/browse/ARROW-10338 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut I have realized that most of the propagation is not happening correctly and boundary checks are still triggered for kernels and operations. For this reason, methods should use const fn where applicable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
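For readers unfamiliar with `const fn`, here is a small sketch of the general idea. The helper `ceil_div_8`, the constant `BITMAP_BYTES`, and `zeroed_bitmap` are hypothetical names and do not come from the Arrow codebase. A `const fn` can be evaluated at compile time, so its result can size an array or feed other constants; whether this removes bounds checks in a particular kernel still depends on the optimizer and is not guaranteed by the sketch.

```rust
/// Number of bytes needed to hold `bits` bits; usable in const contexts.
pub const fn ceil_div_8(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Evaluated at compile time because `ceil_div_8` is a `const fn`.
pub const BITMAP_BYTES: usize = ceil_div_8(1024);

/// The array length is a compile-time constant, so the compiler knows
/// the exact bounds of the returned buffer.
pub fn zeroed_bitmap() -> [u8; BITMAP_BYTES] {
    [0u8; BITMAP_BYTES]
}
```

The same function remains callable at run time with non-constant arguments, so marking it `const` does not restrict existing callers.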
[jira] [Updated] (ARROW-10335) [Rust]: Unify common methods of dictionaries and other array types
[ https://issues.apache.org/jira/browse/ARROW-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10335: - Description: Currently, we have a differently named set of methods which do the same thing underneath but written inside the concrete implementations of Arrays. One example is append_value in DictionaryArray. Unify these methods with primitive arrays to prevent passing around dynamic objects. > [Rust]: Unify common methods of dictionaries and other array types > -- > > Key: ARROW-10335 > URL: https://issues.apache.org/jira/browse/ARROW-10335 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Priority: Major > > Currently, we have a differently named set of methods which do the same thing > underneath but written inside the concrete implementations of Arrays. One > example is append_value in DictionaryArray. Unify these methods with > primitive arrays to prevent passing around dynamic objects. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10335) [Rust]: Unify common methods of dictionaries and other array types
Mahmut Bulut created ARROW-10335: Summary: [Rust]: Unify common methods of dictionaries and other array types Key: ARROW-10335 URL: https://issues.apache.org/jira/browse/ARROW-10335 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10326) [Rust] Add missing method docs for Arrays
[ https://issues.apache.org/jira/browse/ARROW-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10326: - Description: Whenever a PR comes we don't inspect documentation thus some of the methods are missing documentations about what they do. We should regularly check and carefully inspect the explanations if they are adequate or not. This issue is for filling in all missing doc comments. (was: Currently, whenever a PR comes we don't inspect documentation thus some of the methods are missing documentations about what they do. We should regularly check and carefully inspect the explanations if they are adequate or not. This issue is for filling in all missing doc comments.) > [Rust] Add missing method docs for Arrays > - > > Key: ARROW-10326 > URL: https://issues.apache.org/jira/browse/ARROW-10326 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Priority: Major > > Whenever a PR comes we don't inspect documentation thus some of the methods > are missing documentations about what they do. We should regularly check and > carefully inspect the explanations if they are adequate or not. This issue is > for filling in all missing doc comments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10326) [Rust] Add missing method docs for Arrays
[ https://issues.apache.org/jira/browse/ARROW-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10326: - Description: Currently, whenever a PR comes we don't inspect documentation thus some of the methods are missing documentations about what they do. We should regularly check and carefully inspect the explanations if they are adequate or not. This issue is for filling in all missing doc comments. (was: Currently, whenever a PR comes we don't inspect documentation thus some of the methods are missing documentations about what they do. We should regularly check and carefully inspect the explanations that are adequate and not missing. This issue is for filling in all missing doc comments.) > [Rust] Add missing method docs for Arrays > - > > Key: ARROW-10326 > URL: https://issues.apache.org/jira/browse/ARROW-10326 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Priority: Major > > Currently, whenever a PR comes we don't inspect documentation thus some of > the methods are missing documentations about what they do. We should > regularly check and carefully inspect the explanations if they are adequate > or not. This issue is for filling in all missing doc comments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10326) [Rust] Add missing method docs for Arrays
Mahmut Bulut created ARROW-10326: Summary: [Rust] Add missing method docs for Arrays Key: ARROW-10326 URL: https://issues.apache.org/jira/browse/ARROW-10326 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Currently, whenever a PR comes we don't inspect documentation thus some of the methods are missing documentations about what they do. We should regularly check and carefully inspect the explanations that are adequate and not missing. This issue is for filling in all missing doc comments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader
[ https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut reassigned ARROW-10249: Assignee: Mahmut Bulut > [Rust]: Support Dictionary types for ListArrays in arrow json reader > > > Key: ARROW-10249 > URL: https://issues.apache.org/jira/browse/ARROW-10249 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > Currently, dictionary types for listarrays are not supported in Arrow JSON > reader. It would be nice to add dictionary type support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader
[ https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10249: - Summary: [Rust]: Support Dictionary types for ListArrays in arrow json reader (was: [Rust]: Support Dictionary types in arrow json reader) > [Rust]: Support Dictionary types for ListArrays in arrow json reader > > > Key: ARROW-10249 > URL: https://issues.apache.org/jira/browse/ARROW-10249 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Priority: Major > > Currently, dictionary types are not supported in Arrow JSON reader. It would > be nice to add dictionary type support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10249) [Rust]: Support Dictionary types for ListArrays in arrow json reader
[ https://issues.apache.org/jira/browse/ARROW-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10249: - Description: Currently, dictionary types for listarrays are not supported in Arrow JSON reader. It would be nice to add dictionary type support. (was: Currently, dictionary types are not supported in Arrow JSON reader. It would be nice to add dictionary type support.) > [Rust]: Support Dictionary types for ListArrays in arrow json reader > > > Key: ARROW-10249 > URL: https://issues.apache.org/jira/browse/ARROW-10249 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mahmut Bulut >Priority: Major > > Currently, dictionary types for listarrays are not supported in Arrow JSON > reader. It would be nice to add dictionary type support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10249) [Rust]: Support Dictionary types in arrow json reader
Mahmut Bulut created ARROW-10249: Summary: [Rust]: Support Dictionary types in arrow json reader Key: ARROW-10249 URL: https://issues.apache.org/jira/browse/ARROW-10249 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Mahmut Bulut Currently, dictionary types are not supported in Arrow JSON reader. It would be nice to add dictionary type support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10187) [Rust] Test failures on 32 bit ARM (Raspberry Pi)
[ https://issues.apache.org/jira/browse/ARROW-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208817#comment-17208817 ] Mahmut Bulut commented on ARROW-10187: -- [~andygrove] Hi Andy, I don't have raspberry pi at hand. I want to check the compilation problems on ARM asap, target_pointer_width gate might be a good option for it. What version of rpi did you use? > [Rust] Test failures on 32 bit ARM (Raspberry Pi) > - > > Key: ARROW-10187 > URL: https://issues.apache.org/jira/browse/ARROW-10187 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > Perhaps these failures are to be expected and perhaps we can't really support > 32 bit? > > {code:java} > array::array::tests::test_primitive_array_from_vec stdout > thread 'array::array::tests::test_primitive_array_from_vec' panicked at > 'assertion failed: `(left == right)` > left: `144`, > right: `104`', arrow/src/array/array.rs:2383:9 > array::array::tests::test_primitive_array_from_vec_option stdout > thread 'array::array::tests::test_primitive_array_from_vec_option' panicked > at 'assertion failed: `(left == right)` > left: `224`, > right: `176`', arrow/src/array/array.rs:2409:9 > array::null::tests::test_null_array stdout > thread 'array::null::tests::test_null_array' panicked at 'assertion failed: > `(left == right)` > left: `64`, > right: `32`', arrow/src/array/null.rs:134:9 > array::union::tests::test_dense_union_i32 stdout > thread 'array::union::tests::test_dense_union_i32' panicked at 'assertion > failed: `(left == right)` > left: `1024`, > right: `768`', arrow/src/array/union.rs:704:9 > memory::tests::test_allocate stdout > thread 'memory::tests::test_allocate' panicked at 'assertion failed: `(left > == right)` > left: `0`, > right: `32`', arrow/src/memory.rs:243:13 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
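The `target_pointer_width` gate mentioned in the comment above could look roughly like the following. `RawBuffer` and `EXPECTED_RAW_BUFFER_SIZE` are hypothetical names for illustration and are not the structures whose sizes the failing tests assert; the sketch only shows how an expected size can be selected per pointer width so that one assertion holds on both 32-bit and 64-bit targets.

```rust
/// Hypothetical struct whose size depends on the target's pointer width.
pub struct RawBuffer {
    pub ptr: *const u8,
    pub len: usize,
}

/// Expected size on 64-bit targets: an 8-byte pointer plus an 8-byte length.
#[cfg(target_pointer_width = "64")]
pub const EXPECTED_RAW_BUFFER_SIZE: usize = 16;

/// Expected size on 32-bit targets (e.g. ARMv7): 4 bytes each.
#[cfg(target_pointer_width = "32")]
pub const EXPECTED_RAW_BUFFER_SIZE: usize = 8;

/// Returns the actual in-memory size of `RawBuffer` on the current target.
pub fn raw_buffer_size() -> usize {
    std::mem::size_of::<RawBuffer>()
}
```

Targets with other pointer widths would fail to compile here, which makes the unsupported case visible at build time rather than as a test failure.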
[jira] [Updated] (ARROW-10062) [Rust]: Fix for null elems for DoubleEndedIter for DictArray
[ https://issues.apache.org/jira/browse/ARROW-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-10062: - Description: A bug that I've introduced: during the reverse traversal the last element with null doesn't signal the completion. > [Rust]: Fix for null elems for DoubleEndedIter for DictArray > > > Key: ARROW-10062 > URL: https://issues.apache.org/jira/browse/ARROW-10062 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > A bug that I've introduced: during the reverse traversal the last element > with null doesn't signal the completion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10062) [Rust]: Fix for null elems for DoubleEndedIter for DictArray
Mahmut Bulut created ARROW-10062: Summary: [Rust]: Fix for null elems for DoubleEndedIter for DictArray Key: ARROW-10062 URL: https://issues.apache.org/jira/browse/ARROW-10062 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10055) [Rust] Implement DoubleEndedIterator for NullableIter
Mahmut Bulut created ARROW-10055: Summary: [Rust] Implement DoubleEndedIterator for NullableIter Key: ARROW-10055 URL: https://issues.apache.org/jira/browse/ARROW-10055 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Reversing doesn't take place for nullable iter for dictionary keys, so keys can't be reversed or rfolded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
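A minimal sketch of a nullable iterator that can be traversed from both ends. The `NullableIter` below is a simplified, hypothetical stand-in built over plain slices, not Arrow's actual type. Each item is an `Option<T>`, so a null element is yielded as `Some(None)` and only exhaustion of the range returns `None`.

```rust
/// Simplified nullable iterator over values plus a validity mask.
pub struct NullableIter<'a, T> {
    values: &'a [T],
    validity: &'a [bool],
    front: usize, // next index to yield from the front
    back: usize,  // one past the next index to yield from the back
}

impl<'a, T: Copy> NullableIter<'a, T> {
    pub fn new(values: &'a [T], validity: &'a [bool]) -> Self {
        assert_eq!(values.len(), validity.len());
        Self { values, validity, front: 0, back: values.len() }
    }

    // Returns the value at `i`, or None if that slot is null.
    fn get(&self, i: usize) -> Option<T> {
        if self.validity[i] { Some(self.values[i]) } else { None }
    }
}

impl<'a, T: Copy> Iterator for NullableIter<'a, T> {
    type Item = Option<T>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.front == self.back {
            return None; // exhausted
        }
        let item = self.get(self.front);
        self.front += 1;
        Some(item)
    }
}

impl<'a, T: Copy> DoubleEndedIterator for NullableIter<'a, T> {
    fn next_back(&mut self) -> Option<Self::Item> {
        if self.front == self.back {
            return None; // exhausted
        }
        self.back -= 1;
        Some(self.get(self.back))
    }
}
```

With `DoubleEndedIterator` implemented, adapters such as `rev()` and `rfold()` become available on the iterator without further code.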
[jira] [Commented] (ARROW-10002) [Rust] Trait-specialization requries nightly
[ https://issues.apache.org/jira/browse/ARROW-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196859#comment-17196859 ] Mahmut Bulut commented on ARROW-10002: -- Hi [~batmanaod], I have checked out the code and commented on the cosmetic changes. I don't see any visible perf implications. Since dispatch over PrimitiveArrayOps is replaced by ArrowPrimitiveType, I don't expect much perf impact. > [Rust] Trait-specialization requries nightly > > > Key: ARROW-10002 > URL: https://issues.apache.org/jira/browse/ARROW-10002 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Kyle Strand >Priority: Major > > Trait specialization is widely used in the Rust Arrow implementation. Uses > can be identified by searching for instances of {{default fn}} in the > codebase: > > {code:java} > $> rg -c 'default fn' ../arrow/rust/ > ../arrow/rust/parquet/src/util/test_common/rand_gen.rs:1 > ../arrow/rust/parquet/src/column/writer.rs:2 > ../arrow/rust/parquet/src/encodings/encoding.rs:16 > ../arrow/rust/parquet/src/arrow/record_reader.rs:1 > ../arrow/rust/parquet/src/encodings/decoding.rs:13 > ../arrow/rust/parquet/src/file/statistics.rs:1 > ../arrow/rust/arrow/src/array/builder.rs:7 > ../arrow/rust/arrow/src/array/array.rs:3 > ../arrow/rust/arrow/src/array/equal.rs:3{code} > > This feature requires Nightly Rust. Additionally, there is [no schedule for > stabilization|https://github.com/rust-lang/rust/issues/31844#issue-135807289] > , primarily due to an [unresolved soundness > hole|http://aturon.github.io/blog/2017/07/08/lifetime-dispatch]. (Note: there > has been further discussion and ideas for resolving the soundness issue, but > to my knowledge no definitive action.) > If we can remove specialization from the Rust codebase, we will not be > blocked on the Rust team's stabilization of that feature in order to move to > stable Rust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9722) [Rust]: Shorten key lifetime for reverse lookup for dictionary arrays
Mahmut Bulut created ARROW-9722: --- Summary: [Rust]: Shorten key lifetime for reverse lookup for dictionary arrays Key: ARROW-9722 URL: https://issues.apache.org/jira/browse/ARROW-9722 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Mahmut Bulut Assignee: Mahmut Bulut Shorten key lifetime for reverse lookup for dictionary arrays -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9632) [Rust] Add a "new" method for ExecutionContextSchemaProvider
[ https://issues.apache.org/jira/browse/ARROW-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-9632: Summary: [Rust] Add a "new" method for ExecutionContextSchemaProvider (was: [Rust] add a func "new" for ExecutionContextSchemaProvider) > [Rust] Add a "new" method for ExecutionContextSchemaProvider > > > Key: ARROW-9632 > URL: https://issues.apache.org/jira/browse/ARROW-9632 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust > Affects Versions: 2.0.0 > Reporter: qingcheng wu > Priority: Major > > I use ExecutionContextSchemaProvider from an external application, so I added the "pub" keyword to ExecutionContextSchemaProvider and added a "new" function for it. > I also added the "pub" keyword to build_schema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9632) [Rust] add a func "new" for ExecutionContextSchemaProvider
[ https://issues.apache.org/jira/browse/ARROW-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-9632: Summary: [Rust] add a func "new" for ExecutionContextSchemaProvider (was: add a func "new" for ExecutionContextSchemaProvider) > [Rust] add a func "new" for ExecutionContextSchemaProvider > -- > > Key: ARROW-9632 > URL: https://issues.apache.org/jira/browse/ARROW-9632 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust > Affects Versions: 2.0.0 > Reporter: qingcheng wu > Priority: Major > > I use ExecutionContextSchemaProvider from an external application, so I added the "pub" keyword to ExecutionContextSchemaProvider and added a "new" function for it. > I also added the "pub" keyword to build_schema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9582) [Rust] Implement Array::memory_size()
[ https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168640#comment-17168640 ] Mahmut Bulut commented on ARROW-9582: - Yes I am on it atm. > [Rust] Implement Array::memory_size() > - > > Key: ARROW-9582 > URL: https://issues.apache.org/jira/browse/ARROW-9582 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > I would like to be able to determine how much memory is being used by Arrow > Arrays so that I can better monitor and report on memory usage when profiling > and tuning code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9608) [Rust] Remove arrow flight from parquet's feature gating
Mahmut Bulut created ARROW-9608: --- Summary: [Rust] Remove arrow flight from parquet's feature gating Key: ARROW-9608 URL: https://issues.apache.org/jira/browse/ARROW-9608 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 1.0.0 Reporter: Mahmut Bulut Assignee: Mahmut Bulut Currently, parquet pulls in arrow-flight and its dependencies, which breaks the CI builds and is unnecessary because arrow-flight is not used. Parquet should build without any default features. A simple PR will make the build leaner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-9582) [Rust] Implement Array::memory_size()
[ https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496 ] Mahmut Bulut edited comment on ARROW-9582 at 7/28/20, 3:16 PM: --- [~andygrove] I have already a handy code for this one. I can open a pr adapting that. was (Author: vertexclique): [~andygrove] I have already a handy code for this one. I can hand open a pr adapting that. > [Rust] Implement Array::memory_size() > - > > Key: ARROW-9582 > URL: https://issues.apache.org/jira/browse/ARROW-9582 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > I would like to be able to determine how much memory is being used by Arrow > Arrays so that I can better monitor and report on memory usage when profiling > and tuning code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-9582) [Rust] Implement Array::memory_size()
[ https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496 ] Mahmut Bulut edited comment on ARROW-9582 at 7/28/20, 3:16 PM: --- [~andygrove] I have a snippet for this one. I can open a pr adapting that. was (Author: vertexclique): [~andygrove] I have already a handy code for this one. I can open a pr adapting that. > [Rust] Implement Array::memory_size() > - > > Key: ARROW-9582 > URL: https://issues.apache.org/jira/browse/ARROW-9582 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > I would like to be able to determine how much memory is being used by Arrow > Arrays so that I can better monitor and report on memory usage when profiling > and tuning code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-9582) [Rust] Implement Array::memory_size()
[ https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496 ] Mahmut Bulut edited comment on ARROW-9582 at 7/28/20, 3:16 PM: --- [~andygrove] I have already a handy code for this one. I can hand open a pr adapting that. was (Author: vertexclique): I have already a handy code for this one. I can hand open a pr adapting that. > [Rust] Implement Array::memory_size() > - > > Key: ARROW-9582 > URL: https://issues.apache.org/jira/browse/ARROW-9582 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > I would like to be able to determine how much memory is being used by Arrow > Arrays so that I can better monitor and report on memory usage when profiling > and tuning code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9582) [Rust] Implement Array::memory_size()
[ https://issues.apache.org/jira/browse/ARROW-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166496#comment-17166496 ] Mahmut Bulut commented on ARROW-9582: - I have already a handy code for this one. I can hand open a pr adapting that. > [Rust] Implement Array::memory_size() > - > > Key: ARROW-9582 > URL: https://issues.apache.org/jira/browse/ARROW-9582 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > I would like to be able to determine how much memory is being used by Arrow > Arrays so that I can better monitor and report on memory usage when profiling > and tuning code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
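To make the request concrete, here is a minimal sketch of what a `memory_size()` style accessor could compute. The `Int32Column` type and its fields are hypothetical stand-ins; Arrow's real arrays are built on shared buffers, and this is neither the snippet mentioned in the comment nor the implementation that was merged.

```rust
use std::mem::size_of;

/// Hypothetical, simplified column type used only for illustration.
pub struct Int32Column {
    values: Vec<i32>,
    /// Optional validity bitmap, one bit per value, packed into bytes.
    validity: Option<Vec<u8>>,
}

impl Int32Column {
    pub fn new(values: Vec<i32>, validity: Option<Vec<u8>>) -> Self {
        Int32Column { values, validity }
    }

    /// Approximate number of bytes held by this column: the struct itself
    /// plus the heap capacity reserved by each buffer. Capacity is used
    /// rather than length because reserved-but-unused space is still memory
    /// owned by the column.
    pub fn memory_size(&self) -> usize {
        let values_bytes = self.values.capacity() * size_of::<i32>();
        let validity_bytes = self.validity.as_ref().map_or(0, |v| v.capacity());
        size_of::<Self>() + values_bytes + validity_bytes
    }
}
```

A real implementation would also have to decide how to count buffers shared between several arrays, which this sketch ignores.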
[jira] [Commented] (ARROW-8480) [Rust] There is no check for allocation failure
[ https://issues.apache.org/jira/browse/ARROW-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156899#comment-17156899 ] Mahmut Bulut commented on ARROW-8480: - Suggested API can't be used until it stabilizes. So leaving this open until it stabilizes. Tracking issue: [https://github.com/rust-lang/rust/issues/32838] > [Rust] There is no check for allocation failure > --- > > Key: ARROW-8480 > URL: https://issues.apache.org/jira/browse/ARROW-8480 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Paddy Horan >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Reported by bluss on Github: > [https://github.com/rust-ndarray/ndarray/issues/771] > > "What I can see, there is no check for allocation success, so any buffer can > be created with a null pointer, which leads to soundness problems in most > methods. Best look into using {{std::alloc::handle_alloc_error}} or > alternatives. (This problem means that the mutablebuffer is not a safe > abstraction, and it should preferably not be exposed as public API like > this.)" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8480) [Rust] There is no check for allocation failure
[ https://issues.apache.org/jira/browse/ARROW-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156893#comment-17156893 ] Mahmut Bulut commented on ARROW-8480: - Workaround for the first set of allocation related considerations: [https://github.com/apache/arrow/pull/7734] > [Rust] There is no check for allocation failure > --- > > Key: ARROW-8480 > URL: https://issues.apache.org/jira/browse/ARROW-8480 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Paddy Horan >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Reported by bluss on Github: > [https://github.com/rust-ndarray/ndarray/issues/771] > > "What I can see, there is no check for allocation success, so any buffer can > be created with a null pointer, which leads to soundness problems in most > methods. Best look into using {{std::alloc::handle_alloc_error}} or > alternatives. (This problem means that the mutablebuffer is not a safe > abstraction, and it should preferably not be exposed as public API like > this.)" -- This message was sent by Atlassian Jira (v8.3.4#803005)
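For readers unfamiliar with the problem bluss describes, the sketch below shows the basic pattern of checking an allocation for failure before wrapping the pointer. `RawBuffer` is a hypothetical type, not Arrow's mutable buffer, and the code is not taken from the pull request linked above. It uses only `std::alloc` functions that are available on stable Rust.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical minimal owned buffer used only for illustration.
pub struct RawBuffer {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl RawBuffer {
    /// Allocates `size` zeroed bytes aligned to `align`.
    pub fn new(size: usize, align: usize) -> Self {
        // The global allocator must not be asked for zero bytes.
        assert!(size > 0, "zero-sized buffers are not handled in this sketch");
        let layout = Layout::from_size_align(size, align).expect("invalid size or alignment");
        // SAFETY: `layout` has a non-zero size, as checked above.
        let raw = unsafe { alloc_zeroed(layout) };
        // A null pointer signals allocation failure. Wrapping it in `NonNull`
        // forces the check, so no buffer can exist with a null pointer.
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        RawBuffer { ptr, layout }
    }

    /// Number of bytes owned by the buffer.
    pub fn len(&self) -> usize {
        self.layout.size()
    }

    /// Read-only view of the buffer contents.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is non-null, valid for `len()` bytes, and the bytes
        // were zero-initialized by `alloc_zeroed`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for RawBuffer {
    fn drop(&mut self) {
        // SAFETY: the memory was allocated in `new` with this exact layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Storing a `NonNull<u8>` instead of a raw `*mut u8` moves the invariant into the type, so methods such as `as_slice` can rely on a valid pointer without rechecking it.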
[jira] [Commented] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays
[ https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150943#comment-17150943 ] Mahmut Bulut commented on ARROW-9275: - Yes, exactly Neville, so users can choose whatever they want to incorporate in their workloads, which enables plenty of projects with different workloads, scenarios, etc. And yes again, I feel like there should be a collaborative effort together to add APIs around crates. Spans a little wider than other tickets. Sure! I will send a similar email with similar content of this ticket. Tagging `[Rust]`. Thanks for the feedback, will send a mail asap. > [Rust] – Async Sans IO: R/W into/to Arrow Arrays > > > Key: ARROW-9275 > URL: https://issues.apache.org/jira/browse/ARROW-9275 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > This issue can be considered an epic level that spans across other arrow > projects. > *Drill down* > Currently, traits like `ParquetReader` only allow synchronous interface which > uses BufReader having 8KB constant buffer. Over the network, this becomes a > problem. This can be easily solvable with differential buffers. In addition > to this shortage, there is a problem of executor engine is needed to schedule > from async trait methods to sync trait methods which should sit somewhere in > between to make requests asynchronous to external IO. On-disk IO is > acceptable with the approach we currently have since no reliable evented IO > exists for on-disk IO on major platforms. > All these considered abstractions that will expose asynchronous IO without > any side from executors, needs to be exposed. > > *Design Suggestions & Considerations* > The design should apply and consider: > * Sans IO, (for more information about Sans approach please see > [https://sans-io.readthedocs.io/] ) > * Not including any executor specific data, at all. 
> * Tests should work with any executor with little to no modification. > * Buffers are adjusted accordingly and use differential buffers to optimize > network trips. > * Sync IO shouldn't be touched. At all costs. If we try to unify Sync IO > traits or we do overlapping implementation, that will make our life harder in > the future. Sans IO should be compartmentalized. > > *Notes* > If Sans approach is not taken, the project will: > * use an extreme amount of dependencies. > * be not compatible with other Rust code at all. > * break currently working code uses array ingestions. > * integrations tests are going to be harder. > * it will really hard to adapt to completion-based APIs stabilize in the > future. (in the user projects) > * this suggestion is not about the flight format or any flight-related > information atm. This is purely making on-disk, remote IO (provider backends > like AWS etc.) async. > > *Open points* > A couple of open points: > * Identifying traits that are going to be asyncized. > * Designing internal routines. > * package name to expose. > * Gather traits into the designated packages in all file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
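To illustrate the Sans IO idea from the description, the following sketch keeps all parsing logic in a type that never performs IO itself. The frame format (a 4-byte little-endian length followed by the payload) and the `FrameDecoder` name are invented for this example and have nothing to do with the Parquet or Arrow formats. The point is only that the caller, whether sync or running on any async executor, reads the bytes and hands them over.

```rust
/// Hypothetical sans-IO decoder. It owns no socket, file, or executor handle:
/// bytes come in through `feed`, complete frames come out of `next_frame`.
#[derive(Default)]
pub struct FrameDecoder {
    buffered: Vec<u8>,
}

impl FrameDecoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Hands newly received bytes to the decoder. The caller decides how the
    /// bytes were obtained (blocking read, async read, test fixture, ...).
    pub fn feed(&mut self, bytes: &[u8]) {
        self.buffered.extend_from_slice(bytes);
    }

    /// Returns the next complete frame, or `None` if more input is needed.
    pub fn next_frame(&mut self) -> Option<Vec<u8>> {
        // The length prefix itself may not have fully arrived yet.
        if self.buffered.len() < 4 {
            return None;
        }
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&self.buffered[..4]);
        let len = u32::from_le_bytes(len_bytes) as usize;
        // Wait until the whole payload is buffered.
        if self.buffered.len() < 4 + len {
            return None;
        }
        let frame = self.buffered[4..4 + len].to_vec();
        // Drop the consumed prefix and payload, keeping any trailing bytes.
        self.buffered.drain(..4 + len);
        Some(frame)
    }
}
```

A synchronous reader would call `feed` after each blocking `read`, while an async reader would call it after each awaited read. The decoder, and any tests written against it, stay the same in both cases, which is the property the design considerations above ask for.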
[jira] [Updated] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays
[ https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmut Bulut updated ARROW-9275: Description: This issue can be considered an epic level that spans across other arrow projects. *Drill down* Currently, traits like `ParquetReader` only allow synchronous interface which uses BufReader having 8KB constant buffer. Over the network, this becomes a problem. This can be easily solvable with differential buffers. In addition to this shortage, there is a problem of executor engine is needed to schedule from async trait methods to sync trait methods which should sit somewhere in between to make requests asynchronous to external IO. On-disk IO is acceptable with the approach we currently have since no reliable evented IO exists for on-disk IO on major platforms. All these considered abstractions that will expose asynchronous IO without any side from executors, needs to be exposed. *Design Suggestions & Considerations* The design should apply and consider: * Sans IO, (for more information about Sans approach please see [https://sans-io.readthedocs.io/] ) * Not including any executor specific data, at all. * Tests should work with any executor with little to no modification. * Buffers are adjusted accordingly and use differential buffers to optimize network trips. * Sync IO shouldn't be touched. At all costs. If we try to unify Sync IO traits or we do overlapping implementation, that will make our life harder in the future. Sans IO should be compartmentalized. *Notes* If Sans approach is not taken, the project will: * use an extreme amount of dependencies. * be not compatible with other Rust code at all. * break currently working code uses array ingestions. * integrations tests are going to be harder. * it will really hard to adapt to completion-based APIs stabilize in the future. (in the user projects) * this suggestion is not about the flight format or any flight-related information atm. 
This is purely making on-disk, remote IO (provider backends like AWS etc.) async. *Open points* A couple of open points: * Identifying traits that are going to be asyncized. * Designing internal routines. * package name to expose. * Gather traits into the designated packages in all file formats. was: This issue can be considered an epic level that spans across other arrow projects. *Drill down* Currently, traits like `ParquetReader` only allow synchronous interface which uses BufReader having 8KB constant buffer. Over the network, this becomes a problem. This can be easily solvable with differential buffers. In addition to this shortage, there is a problem of executor engine is needed to schedule from async trait methods to sync trait methods which should sit somewhere in between to make requests asynchronous to external IO. On-disk IO is acceptable with the approach we currently have since no reliable evented IO exists for on-disk IO on major platforms. All these considered abstractions that will expose asynchronous IO without any side from executors, needs to be exposed. *Design Suggestions & Considerations* The design should apply and consider: * Sans IO, (for more information about Sans approach please see [https://sans-io.readthedocs.io/] ) * Not including any executor specific data, at all. * Tests should work with any executor with little to no modification. * Buffers are adjusted accordingly and use differential buffers to optimize network trips. * Sync IO shouldn't be touched. At all costs. If we try to unify Sync IO traits or we do overlapping implementation, that will make our life harder in the future. Sans IO should be compartmentalized. *Notes* If Sans approach is not taken, the project will: * use an extreme amount of dependencies. * be not compatible with other Rust code at all. * break currently working code uses array ingestions. * integrations tests are going to be harder. 
* it will really hard to adapt to completion-based APIs stabilize in the future. (in the user projects) * this suggestion is not about the in-flight format or any in-flight related information atm. This is purely making on-disk, remote IO (provider backends like AWS etc.) async. *Open points* A couple of open points: * Identifying traits that are going to be asyncized. * Designing internal routines. * package name to expose. * Gather traits into the designated packages in all file formats. > [Rust] – Async Sans IO: R/W into/to Arrow Arrays > > > Key: ARROW-9275 > URL: https://issues.apache.org/jira/browse/ARROW-9275 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > This issue can be considered an
[jira] [Commented] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays
[ https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148510#comment-17148510 ] Mahmut Bulut commented on ARROW-9275: - [~nevi_me], [~andygrove], [~paddyhoran] I need input for this from you if possible. > [Rust] – Async Sans IO: R/W into/to Arrow Arrays > > > Key: ARROW-9275 > URL: https://issues.apache.org/jira/browse/ARROW-9275 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > > This issue can be considered an epic level that spans across other arrow > projects. > *Drill down* > Currently, traits like `ParquetReader` only allow synchronous interface which > uses BufReader having 8KB constant buffer. Over the network, this becomes a > problem. This can be easily solvable with differential buffers. In addition > to this shortage, there is a problem of executor engine is needed to schedule > from async trait methods to sync trait methods which should sit somewhere in > between to make requests asynchronous to external IO. On-disk IO is > acceptable with the approach we currently have since no reliable evented IO > exists for on-disk IO on major platforms. > All these considered abstractions that will expose asynchronous IO without > any side from executors, needs to be exposed. > > *Design Suggestions & Considerations* > The design should apply and consider: > * Sans IO, (for more information about Sans approach please see > [https://sans-io.readthedocs.io/] ) > * Not including any executor specific data, at all. > * Tests should work with any executor with little to no modification. > * Buffers are adjusted accordingly and use differential buffers to optimize > network trips. > * Sync IO shouldn't be touched. At all costs. If we try to unify Sync IO > traits or we do overlapping implementation, that will make our life harder in > the future. Sans IO should be compartmentalized. 
> > *Notes* > If Sans approach is not taken, the project will: > * use an extreme amount of dependencies. > * be not compatible with other Rust code at all. > * break currently working code uses array ingestions. > * integrations tests are going to be harder. > * it will really hard to adapt to completion-based APIs stabilize in the > future. (in the user projects) > * this suggestion is not about the in-flight format or any in-flight related > information atm. This is purely making on-disk, remote IO (provider backends > like AWS etc.) async. > > *Open points* > A couple of open points: > * Identifying traits that are going to be asyncized. > * Designing internal routines. > * package name to expose. > * Gather traits into the designated packages in all file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9275) [Rust] – Async Sans IO: R/W into/to Arrow Arrays
Mahmut Bulut created ARROW-9275:
---

Summary: [Rust] – Async Sans IO: R/W into/to Arrow Arrays
Key: ARROW-9275
URL: https://issues.apache.org/jira/browse/ARROW-9275
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Mahmut Bulut
Assignee: Mahmut Bulut

This issue can be considered epic-level, since it spans other Arrow projects.

*Drill down*

Currently, traits like `ParquetReader` only offer a synchronous interface, which uses a BufReader with a constant 8KB buffer. Over the network, this becomes a problem. It can be solved easily with differential buffers. In addition to this shortcoming, an executor engine is needed to schedule from async trait methods to sync trait methods; it has to sit somewhere in between to make requests to external IO asynchronous. On-disk IO is acceptable with the approach we currently have, since no reliable evented IO exists for on-disk IO on major platforms.

All things considered, abstractions that expose asynchronous IO without taking the side of any executor need to be provided.

*Design Suggestions & Considerations*

The design should apply and consider:
* Sans IO (for more information about the Sans IO approach, please see [https://sans-io.readthedocs.io/]).
* Not including any executor-specific data at all.
* Tests should work with any executor with little to no modification.
* Buffers are adjusted accordingly, and differential buffers are used to optimize network trips.
* Sync IO shouldn't be touched, at all costs. If we try to unify the Sync IO traits or write overlapping implementations, that will make our life harder in the future. Sans IO should be compartmentalized.

*Notes*

If the Sans IO approach is not taken:
* the project will use an extreme number of dependencies.
* it will not be compatible with other Rust code at all.
* it will break currently working code that uses array ingestion.
* integration tests are going to be harder.
* it will be really hard (in user projects) to adapt to completion-based APIs when they stabilize in the future.

This suggestion is not about the in-flight format or any in-flight related information at the moment. It is purely about making on-disk and remote IO (provider backends like AWS etc.) async.

*Open points*

A couple of open points:
* Identifying the traits that are going to be made async.
* Designing the internal routines.
* The package name to expose.
* Gathering traits into the designated packages in all file formats.

-- This message was sent by Atlassian Jira (v8.3.4#803005)