[GitHub] [arrow] Dandandan edited a comment on pull request #9271: ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

2021-01-23 Thread GitBox
Dandandan edited a comment on pull request #9271: URL: https://github.com/apache/arrow/pull/9271#issuecomment-765952965 @jorgecarleitao Thanks, that it a great summary of the situation! I think removing the `Vec` sounds like it might improve the "metadata overhead", maybe there are

[GitHub] [arrow] github-actions[bot] commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766087862 https://issues.apache.org/jira/browse/ARROW-11357 This is an automated message from the Apache Git

[GitHub] [arrow] jorgecarleitao opened a new pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao opened a new pull request #9301: URL: https://github.com/apache/arrow/pull/9301 This PR fixes two major issues in our `take` kernel for primitive arrays. Background * When using `values()` from an array, it is important to remember that only certain values are

[GitHub] [arrow] alamb commented on pull request #9230: ARROW-11022: [Rust] Upgrade to Tokio 1.0

2021-01-23 Thread GitBox
alamb commented on pull request #9230: URL: https://github.com/apache/arrow/pull/9230#issuecomment-765906871 @houqp -- I think the instructions to run the tests locally are here: https://github.com/apache/arrow/blob/master/docs/source/format/Integration.rst @carols10cents or

[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

2021-01-23 Thread GitBox
Dandandan commented on a change in pull request #9293: URL: https://github.com/apache/arrow/pull/9293#discussion_r563150837 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -94,6 +94,32 @@ impl PrimitiveArray { let offset = i + self.offset();

[GitHub] [arrow] eladroz edited a comment on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels [WIP]

2021-01-23 Thread GitBox
eladroz edited a comment on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-766086589 @xhochy @kszucs hi, I've published my experience with building on AWS arm64 VM's

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on a change in pull request #9301: URL: https://github.com/apache/arrow/pull/9301#discussion_r563160480 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -1350,20 +1532,32 @@ mod tests { } #[test] -#[should_panic( -

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on a change in pull request #9301: URL: https://github.com/apache/arrow/pull/9301#discussion_r563160450 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -1350,20 +1532,32 @@ mod tests { } #[test] -#[should_panic( -

[GitHub] [arrow] codecov-io edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766090110 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=h1) Report > Merging [#9301](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=desc) (d3e2454) into

[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
tyrelr commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r563168268 ## File path: rust/arrow/src/array/array_string.rs ## @@ -50,49 +50,65 @@ pub struct GenericStringArray { } impl GenericStringArray { -/// Returns

[GitHub] [arrow] alamb commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

2021-01-23 Thread GitBox
alamb commented on a change in pull request #9293: URL: https://github.com/apache/arrow/pull/9293#discussion_r563139376 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -94,6 +94,32 @@ impl PrimitiveArray { let offset = i + self.offset(); unsafe {

[GitHub] [arrow] Dandandan commented on pull request #9271: ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

2021-01-23 Thread GitBox
Dandandan commented on pull request #9271: URL: https://github.com/apache/arrow/pull/9271#issuecomment-765952965 @jorgecarleitao Thanks, that it a great summary of the situation! I think removing the `Vec` sounds like it might improve the "metadata overhead", maybe there are some

[GitHub] [arrow] eladroz commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels [WIP]

2021-01-23 Thread GitBox
eladroz commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-766086589 @xhochy hi, I've published my experience with building on AWS arm64 VM's

[GitHub] [arrow] jorgecarleitao commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766088562 On a side note, this explains why `take` was always extremely fast when I tried to replace it by `MutableArray`: `MutableArray` was safe xD On another side note,

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766088562 On a side note, this explains why I was unable to beat `take` when I tried to replace it by `MutableArray`. On another side note, this just re-inforces the

[GitHub] [arrow] jorgecarleitao commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766092621 This same behavior also exists in the boolean take :/ This is an automated message from the Apache Git

[GitHub] [arrow] mqy opened a new pull request #9299: ARROW-11179: [Format] Make FB comments friendly to rust

2021-01-23 Thread GitBox
mqy opened a new pull request #9299: URL: https://github.com/apache/arrow/pull/9299 ### Problem Currently, comments in FB files are directly copied to rust and c++ source codes. That's great but `rust` suffers from the generated codes, for example: - array element `abc[1]` or

[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

2021-01-23 Thread GitBox
Dandandan commented on a change in pull request #9293: URL: https://github.com/apache/arrow/pull/9293#discussion_r563147945 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -94,6 +94,32 @@ impl PrimitiveArray { let offset = i + self.offset();

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
jorgecarleitao commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r563153376 ## File path: rust/arrow/src/array/array_string.rs ## @@ -50,49 +50,65 @@ pub struct GenericStringArray { } impl GenericStringArray { -///

[GitHub] [arrow] github-actions[bot] commented on pull request #9298: ARROW-11355: [Rust] Aligned Date DataType with specification.

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9298: URL: https://github.com/apache/arrow/pull/9298#issuecomment-765887237 https://issues.apache.org/jira/browse/ARROW-11355 This is an automated message from the Apache Git

[GitHub] [arrow] github-actions[bot] removed a comment on pull request #9298: ARROW-11355: [Rust] Aligned Date DataType with specification.

2021-01-23 Thread GitBox
github-actions[bot] removed a comment on pull request #9298: URL: https://github.com/apache/arrow/pull/9298#issuecomment-765887029 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

[GitHub] [arrow] github-actions[bot] commented on pull request #9299: ARROW-11179: [Format] Make FB comments friendly to rust

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9299: URL: https://github.com/apache/arrow/pull/9299#issuecomment-765891515 https://issues.apache.org/jira/browse/ARROW-11179 This is an automated message from the Apache Git

[GitHub] [arrow] alamb closed pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

2021-01-23 Thread GitBox
alamb closed pull request #9235: URL: https://github.com/apache/arrow/pull/9235 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] alamb commented on pull request #9298: ARROW-11355: [Rust] Aligned Date DataType with specification.

2021-01-23 Thread GitBox
alamb commented on pull request #9298: URL: https://github.com/apache/arrow/pull/9298#issuecomment-765910208 I wonder how this would affect @maxburke and his usecase overly negatively This is an automated message from

[GitHub] [arrow] Dandandan opened a new pull request #9300: ARROW-11356: [Rust] Add method to PrimitiveArra::from_value to efficiently create an Array from a value

2021-01-23 Thread GitBox
Dandandan opened a new pull request #9300: URL: https://github.com/apache/arrow/pull/9300 In DataFusion (and probably this would pop up in more use cases) we want to efficiently create an array with `n` items with the same value (instead of creating it from a Vec of values or iterator).

[GitHub] [arrow] github-actions[bot] commented on pull request #9300: ARROW-11356: [Rust] Add method to PrimitiveArra::from_value to efficiently create an Array from a value

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9300: URL: https://github.com/apache/arrow/pull/9300#issuecomment-765925175 https://issues.apache.org/jira/browse/ARROW-11356 This is an automated message from the Apache Git

[GitHub] [arrow] codecov-io edited a comment on pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9293: URL: https://github.com/apache/arrow/pull/9293#issuecomment-765978775 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr=h1) Report > Merging [#9293](https://codecov.io/gh/apache/arrow/pull/9293?src=pr=desc) (a37941c) into

[GitHub] [arrow] codecov-io edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766090110 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=h1) Report > Merging [#9301](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=desc) (64309b6) into

[GitHub] [arrow] github-actions[bot] commented on pull request #9302: ARROW-11358: [Rust] Add benchmark for concatenating small arrays

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9302: URL: https://github.com/apache/arrow/pull/9302#issuecomment-766094069 https://issues.apache.org/jira/browse/ARROW-11358 This is an automated message from the Apache Git

[GitHub] [arrow] alamb commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

2021-01-23 Thread GitBox
alamb commented on pull request #9264: URL: https://github.com/apache/arrow/pull/9264#issuecomment-765908175 Rebasing to resolve a logical conflict (a new test that was added) This is an automated message from the Apache Git

[GitHub] [arrow] codecov-io commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

2021-01-23 Thread GitBox
codecov-io commented on pull request #9264: URL: https://github.com/apache/arrow/pull/9264#issuecomment-765910633 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr=h1) Report > Merging [#9264](https://codecov.io/gh/apache/arrow/pull/9264?src=pr=desc) (363e1f4) into

[GitHub] [arrow] alamb closed pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

2021-01-23 Thread GitBox
alamb closed pull request #9264: URL: https://github.com/apache/arrow/pull/9264 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] github-actions[bot] commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9264: URL: https://github.com/apache/arrow/pull/9264#issuecomment-765912101 https://issues.apache.org/jira/browse/ARROW-11319 This is an automated message from the Apache Git

[GitHub] [arrow] codecov-io commented on pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

2021-01-23 Thread GitBox
codecov-io commented on pull request #9293: URL: https://github.com/apache/arrow/pull/9293#issuecomment-765978775 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr=h1) Report > Merging [#9293](https://codecov.io/gh/apache/arrow/pull/9293?src=pr=desc) (941ee5d) into

[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
tyrelr commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r563166348 ## File path: rust/arrow/src/array/array_string.rs ## @@ -50,49 +50,65 @@ pub struct GenericStringArray { } impl GenericStringArray { -/// Returns

[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
tyrelr commented on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-766105862 current 10% perf impact looks like: ``` critcmp master-499b6d0c array-slice-41c29bc4 -t 10 grouparray-slice-41c29bc4

[GitHub] [arrow] github-actions[bot] commented on pull request #9298: Aligned Date DataType with specification.

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9298: URL: https://github.com/apache/arrow/pull/9298#issuecomment-765887029 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] jorgecarleitao opened a new pull request #9298: Aligned Date DataType with specification.

2021-01-23 Thread GitBox
jorgecarleitao opened a new pull request #9298: URL: https://github.com/apache/arrow/pull/9298 The arrow spec states that `Date32` represents a date in number of days, and `Date64` represents a date in milliseconds. However, our current implementation allows `Date32(Milliseconds)`

[GitHub] [arrow] codecov-io edited a comment on pull request #9297: ARROW-11354: [Rust] Speed-up cast of dates and times (2-4x)

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9297: URL: https://github.com/apache/arrow/pull/9297#issuecomment-765883730 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9297?src=pr=h1) Report > Merging [#9297](https://codecov.io/gh/apache/arrow/pull/9297?src=pr=desc) (b209f72) into

[GitHub] [arrow] Dandandan edited a comment on pull request #9271: ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

2021-01-23 Thread GitBox
Dandandan edited a comment on pull request #9271: URL: https://github.com/apache/arrow/pull/9271#issuecomment-765564334 This PR itself is ready for review. I think the performance for slicing small slices would be something to look at later.

[GitHub] [arrow] alamb commented on a change in pull request #9262: ARROW-11317: [Rust] Include the prettyprint feature in CI Coverage

2021-01-23 Thread GitBox
alamb commented on a change in pull request #9262: URL: https://github.com/apache/arrow/pull/9262#discussion_r563109574 ## File path: .github/workflows/rust.yml ## @@ -109,14 +109,14 @@ jobs: export CARGO_HOME="/github/home/.cargo" export

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

2021-01-23 Thread GitBox
jorgecarleitao commented on a change in pull request #9293: URL: https://github.com/apache/arrow/pull/9293#discussion_r563148896 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -94,6 +94,32 @@ impl PrimitiveArray { let offset = i + self.offset();

[GitHub] [arrow] Dandandan opened a new pull request #9302: ARROW-11358: [Rust] Add benchmark for concatenating small arrays

2021-01-23 Thread GitBox
Dandandan opened a new pull request #9302: URL: https://github.com/apache/arrow/pull/9302 This is a common usecase in DataFusion, for coalescing & hash aggregates. This is an automated message from the Apache Git Service. To

[GitHub] [arrow] codecov-io commented on pull request #9298: ARROW-11355: [Rust] Aligned Date DataType with specification.

2021-01-23 Thread GitBox
codecov-io commented on pull request #9298: URL: https://github.com/apache/arrow/pull/9298#issuecomment-765890484 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9298?src=pr=h1) Report > Merging [#9298](https://codecov.io/gh/apache/arrow/pull/9298?src=pr=desc) (f4d9ca2) into

[GitHub] [arrow] codecov-io edited a comment on pull request #9297: ARROW-11354: [Rust] Speed-up cast of dates and times (2-4x)

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9297: URL: https://github.com/apache/arrow/pull/9297#issuecomment-765883730 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9297?src=pr=h1) Report > Merging [#9297](https://codecov.io/gh/apache/arrow/pull/9297?src=pr=desc) (234ed1f) into

[GitHub] [arrow] codecov-io commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
codecov-io commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766090110 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=h1) Report > Merging [#9301](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=desc) (90bea5d) into

[GitHub] [arrow] Dandandan commented on a change in pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
Dandandan commented on a change in pull request #9301: URL: https://github.com/apache/arrow/pull/9301#discussion_r563171151 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -269,64 +417,44 @@ fn take_primitive( ) -> Result> where T: ArrowPrimitiveType, -

[GitHub] [arrow] eladroz edited a comment on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels [WIP]

2021-01-23 Thread GitBox
eladroz edited a comment on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-766086589 @xhochy @kszucs hi, I've published my experience with building on AWS arm64 VM's

[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
tyrelr commented on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-766153344 The 10% performance impact looks like mostly improvements: ``` critcmp master-499b6d0c array-slice-78f360b7c -t 10 group array-slice-78f360b7c

[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr=h1) Report > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr=desc) (78f360b) into

[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-766158069 @tyrelr , I believe that you may have to run the benches against the latest master vs your branch rebased, for consistency: 499b6d0c is before a significant change to the

[GitHub] [arrow] codecov-io edited a comment on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766158982 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=h1) Report > Merging [#9303](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=desc) (eb38e3e) into

[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
tyrelr commented on a change in pull request #9304: URL: https://github.com/apache/arrow/pull/9304#discussion_r563185759 ## File path: rust/arrow/src/buffer.rs ## @@ -241,6 +241,67 @@ impl> From for Buffer { } } +/// Creating a `Buffer` instance by storing the boolean

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on a change in pull request #9301: URL: https://github.com/apache/arrow/pull/9301#discussion_r563186250 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -254,6 +254,154 @@ impl Default for TakeOptions { } } +#[inline] +fn maybe_take(

[GitHub] [arrow] codecov-io edited a comment on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766158982 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=h1) Report > Merging [#9303](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=desc) (3f3c08e) into

[GitHub] [arrow] jorgecarleitao commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766163142 Now I get your point. On the PR description, I mention to _independent_ sources of unsoundness: * we are using indices from null slots * we are not performing

[GitHub] [arrow] codecov-io edited a comment on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766160749 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr=h1) Report > Merging [#9304](https://codecov.io/gh/apache/arrow/pull/9304?src=pr=desc) (466c41b) into

[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
tyrelr commented on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-766288246 I've rebased locally and will be running benchmarks overnight. This is an automated message from the Apache Git

[GitHub] [arrow] BryanCutler commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overf

2021-01-23 Thread GitBox
BryanCutler commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-766152565 I think the intention of `getBufferSizeFor(final int valueCount)` is to provide an estimated buffer size of the vector and it doesn't make sense that the vector should have

[GitHub] [arrow] jorgecarleitao commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766157047 @Dandandan , > Wouldn't it be possible to have a similar as previously implementation for `take_no_nulls` and `take_values_nulls`? I think those are the most common

[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
tyrelr commented on a change in pull request #9304: URL: https://github.com/apache/arrow/pull/9304#discussion_r563185541 ## File path: rust/arrow/src/buffer.rs ## @@ -1188,6 +1260,57 @@ impl Drop for SetLenOnDrop<'_> { } } +/// Creating a `MutableBuffer` instance by

[GitHub] [arrow] andygrove commented on a change in pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
andygrove commented on a change in pull request #9303: URL: https://github.com/apache/arrow/pull/9303#discussion_r563191456 ## File path: rust/arrow-flight/src/arrow.flight.protocol.rs ## @@ -498,9 +498,8 @@ pub mod flight_service_server { #[async_trait] pub trait

[GitHub] [arrow] jorgecarleitao commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766172368 Hi. This looks good. I tried this some time ago but was unable to make it efficient. Do you have the benchmarks for the comparison kernels?

[GitHub] [arrow] tyrelr commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
tyrelr commented on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766189316 Big things I haven't tried that MIGHT give some gains would be: * manually unrolling the loop for the last 7 bits * relying on iter in the API (but that would require callers

[GitHub] [arrow] Dandandan opened a new pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

2021-01-23 Thread GitBox
Dandandan opened a new pull request #9305: URL: https://github.com/apache/arrow/pull/9305 This function `to_array_of_size` is about 8.3% of instructions in the db-benchmark queries. This uses the PR https://github.com/apache/arrow/pull/9293 The case of converting an int32 to

[GitHub] [arrow] github-actions[bot] commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9305: URL: https://github.com/apache/arrow/pull/9305#issuecomment-766195307 https://issues.apache.org/jira/browse/ARROW-11362 This is an automated message from the Apache Git

[GitHub] [arrow] github-actions[bot] commented on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766155985 https://issues.apache.org/jira/browse/ARROW-11360 This is an automated message from the Apache Git

[GitHub] [arrow] jameswinegar opened a new pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
jameswinegar opened a new pull request #9303: URL: https://github.com/apache/arrow/pull/9303 Helps with the debugging process. This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow] Dandandan commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
Dandandan commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766160716 > @Dandandan , > > > Wouldn't it be possible to have a similar as previously implementation for `take_no_nulls` and `take_values_nulls`? I think those are the most

[GitHub] [arrow] codecov-io commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
codecov-io commented on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766160749 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr=h1) Report > Merging [#9304](https://codecov.io/gh/apache/arrow/pull/9304?src=pr=desc) (0e75531) into

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766163142 Now I get your point. On the PR description, I mention two _independent_ sources of unsoundness: * we are using indices from null slots * we are not

[GitHub] [arrow] codecov-io edited a comment on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766158982 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=h1) Report > Merging [#9303](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=desc) (91fcd3a) into

[GitHub] [arrow] codecov-io edited a comment on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766158982 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=h1) Report > Merging [#9303](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=desc) (7fa024a) into

[GitHub] [arrow] codecov-io commented on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
codecov-io commented on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766158982 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=h1) Report > Merging [#9303](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=desc) (bc2f754) into

[GitHub] [arrow] tyrelr commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
tyrelr commented on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766177143 The 10% impacts are: ``` critcmp master-67d0c2e3 bool-38d4e395 -t 10 groupbool-38d4e395 master-67d0c2e3 -

[GitHub] [arrow] sunchao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
sunchao commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r563210211 ## File path: rust/arrow/src/array/array_string.rs ## @@ -50,49 +50,65 @@ pub struct GenericStringArray { } impl GenericStringArray { -/// Returns

[GitHub] [arrow] maxburke commented on pull request #9298: ARROW-11355: [Rust] Aligned Date DataType with specification.

2021-01-23 Thread GitBox
maxburke commented on pull request #9298: URL: https://github.com/apache/arrow/pull/9298#issuecomment-766135476 @alamb for what it's worth we don't use the Date32 or Date64 types, so we're good with this  This is an

[GitHub] [arrow] Dandandan commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
Dandandan commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766150958 Wouldn't it be possible to have a similar as previously implementation for `take_no_nulls` and `take_values_nulls`? I think those are the most common anyway.

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766163142 Now I get your point. On the PR description, I mention to _independent_ sources of unsoundness: * we are using indices from null slots * we are not

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766163142 Now I get your point. On the PR description, I mention two _independent_ sources of unsoundness: * we are using indices from null slots * we are not

[GitHub] [arrow] codecov-io edited a comment on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766160749 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr=h1) Report > Merging [#9304](https://codecov.io/gh/apache/arrow/pull/9304?src=pr=desc) (38d4e39) into

[GitHub] [arrow] jameswinegar commented on a change in pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
jameswinegar commented on a change in pull request #9303: URL: https://github.com/apache/arrow/pull/9303#discussion_r563197517 ## File path: rust/arrow-flight/src/arrow.flight.protocol.rs ## @@ -498,9 +498,8 @@ pub mod flight_service_server { #[async_trait] pub trait

[GitHub] [arrow] jameswinegar commented on a change in pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
jameswinegar commented on a change in pull request #9303: URL: https://github.com/apache/arrow/pull/9303#discussion_r563197536 ## File path: rust/datafusion/src/datasource/csv.rs ## @@ -66,7 +66,10 @@ impl CsvFile { let mut filenames: Vec = vec![];

[GitHub] [arrow] codecov-io edited a comment on pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9303: URL: https://github.com/apache/arrow/pull/9303#issuecomment-766158982 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=h1) Report > Merging [#9303](https://codecov.io/gh/apache/arrow/pull/9303?src=pr=desc) (a424064) into

[GitHub] [arrow] jorgecarleitao commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766189110 I think that you are right, the perf degradation comes from other parts. I will update the PR tomorrow. well spotted!

[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr=h1) Report > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr=desc) (e18e319) into

[GitHub] [arrow] tyrelr opened a new pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
tyrelr opened a new pull request #9304: URL: https://github.com/apache/arrow/pull/9304 create a from_iter implementation for Buffer to collect boolean flags into a buffer that will: * pre-allocate the appropriate size based on iterator site_hint() * 'buffer' a byte in-memory until

[GitHub] [arrow] github-actions[bot] commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools

2021-01-23 Thread GitBox
github-actions[bot] commented on pull request #9304: URL: https://github.com/apache/arrow/pull/9304#issuecomment-766158211 https://issues.apache.org/jira/browse/ARROW-11361 This is an automated message from the Apache Git

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766163142 Now I get your point. On the PR description, I mention to _independent_ sources of unsoundness: * we are using indices from null slots * we are not

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
jorgecarleitao edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766163142 Now I get your point. On the PR description, I mention to _independent_ sources of unsoundness: * we are using indices from null slots * we are not

[GitHub] [arrow] codecov-io edited a comment on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
codecov-io edited a comment on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766090110 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=h1) Report > Merging [#9301](https://codecov.io/gh/apache/arrow/pull/9301?src=pr=desc) (3ec21b0) into

[GitHub] [arrow] Dandandan commented on a change in pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
Dandandan commented on a change in pull request #9301: URL: https://github.com/apache/arrow/pull/9301#discussion_r563175843 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -254,6 +254,154 @@ impl Default for TakeOptions { } } +#[inline] +fn maybe_take( +

[GitHub] [arrow] eladroz edited a comment on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels [WIP]

2021-01-23 Thread GitBox
eladroz edited a comment on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-766086589 @xhochy @kszucs hi, I've published my experience with building on AWS arm64 VM's

[GitHub] [arrow] andygrove commented on a change in pull request #9303: ARROW-11360: [Rust] [DataFusion] Improve CSV "No files found" error message

2021-01-23 Thread GitBox
andygrove commented on a change in pull request #9303: URL: https://github.com/apache/arrow/pull/9303#discussion_r563191338 ## File path: rust/datafusion/src/datasource/csv.rs ## @@ -66,7 +66,10 @@ impl CsvFile { let mut filenames: Vec = vec![];

[GitHub] [arrow] Dandandan commented on pull request #9301: ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior

2021-01-23 Thread GitBox
Dandandan commented on pull request #9301: URL: https://github.com/apache/arrow/pull/9301#issuecomment-766173983 @jorgecarleitao yes, that's clear. I wouldn't expect a ~2x difference in performance in bounds checking alone, I would guess some other slowdown could be caused by the change

[GitHub] [arrow] codecov-io commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

2021-01-23 Thread GitBox
codecov-io commented on pull request #9305: URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr=h1) Report > Merging [#9305](https://codecov.io/gh/apache/arrow/pull/9305?src=pr=desc) (69c298e) into