[GitHub] [arrow] thisisnic edited a comment on pull request #10519: ARROW-12867: [R] Bindings for abs()

2021-06-14 Thread GitBox
thisisnic edited a comment on pull request #10519: URL: https://github.com/apache/arrow/pull/10519#issuecomment-860526967 Nice, thanks for submitting this PR! There is a test failure caused by the calls to `abs()` on lines 146 and 149 in `test-dplyr-arrange.R` - the test they're part of

[GitHub] [arrow] kiszk commented on pull request #10501: ARROW-13032: [Java] Update guava version

2021-06-14 Thread GitBox
kiszk commented on pull request #10501: URL: https://github.com/apache/arrow/pull/10501#issuecomment-860560735 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For

[GitHub] [arrow-rs] alamb commented on a change in pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
alamb commented on a change in pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#discussion_r650959936 ## File path: parquet/src/data_type.rs ## @@ -661,8 +661,15 @@ pub(crate) mod private { _: W, bit_writer: BitWriter, )

[GitHub] [arrow] HashidaTKS opened a new pull request #10527: ARROW-6870: [C#] Add Support for Dictionary Arrays and Dictionary Encoding

2021-06-14 Thread GitBox
HashidaTKS opened a new pull request #10527: URL: https://github.com/apache/arrow/pull/10527 This is an implementation of `DictionaryBatch` (de)serialization for the streaming format. - The (de)serialization for the file format or Flight is not implemented yet - `isDelta` is not

[GitHub] [arrow] lidavidm commented on a change in pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
lidavidm commented on a change in pull request #10520: URL: https://github.com/apache/arrow/pull/10520#discussion_r650967107 ## File path: docs/source/cpp/compute.rst ## @@ -682,17 +682,23 @@ String joining This function does the inverse of string splitting.

[GitHub] [arrow-rs] kszucs commented on a change in pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-14 Thread GitBox
kszucs commented on a change in pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#discussion_r650767379 ## File path: arrow/src/ffi.rs ## @@ -208,15 +213,25 @@ impl FFI_ArrowSchema { pub fn name() -> { assert!(!self.name.is_null()); //

[GitHub] [arrow] thisisnic commented on pull request #10519: ARROW-12867: [R] Bindings for abs()

2021-06-14 Thread GitBox
thisisnic commented on pull request #10519: URL: https://github.com/apache/arrow/pull/10519#issuecomment-860526967 Nice, thanks for submitting this PR! There is a test failure caused by the calls to `abs()` on lines 146 and 149 in `test-dplyr-arrange.R` - the test they're part of

[GitHub] [arrow] AlenkaF commented on pull request #10519: ARROW-12867: [R] Bindings for abs()

2021-06-14 Thread GitBox
AlenkaF commented on pull request #10519: URL: https://github.com/apache/arrow/pull/10519#issuecomment-860533901 Thank you for the comment @thisisnic! Of course, now I understand. I will remove `expect_warning` from `test-dplyr-arrange.R` right away.

[GitHub] [arrow] jorisvandenbossche commented on pull request #10525: ARROW-12709: [CI] Use LLVM 10 for s390x

2021-06-14 Thread GitBox
jorisvandenbossche commented on pull request #10525: URL: https://github.com/apache/arrow/pull/10525#issuecomment-860546883 I think ARROW-13026 is the correct issue?

[GitHub] [arrow] github-actions[bot] commented on pull request #10525: ARROW-13026: [CI] Use LLVM 10 for s390x

2021-06-14 Thread GitBox
github-actions[bot] commented on pull request #10525: URL: https://github.com/apache/arrow/pull/10525#issuecomment-860547152 https://issues.apache.org/jira/browse/ARROW-13026

[GitHub] [arrow] ianmcook commented on a change in pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
ianmcook commented on a change in pull request #10520: URL: https://github.com/apache/arrow/pull/10520#discussion_r650963788 ## File path: docs/source/cpp/compute.rst ## @@ -682,17 +682,23 @@ String joining This function does the inverse of string splitting.

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-06-14 Thread GitBox
jorisvandenbossche commented on a change in pull request #10476: URL: https://github.com/apache/arrow/pull/10476#discussion_r650962333 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -166,32 +168,48 @@ struct BooleanAnyImpl : public ScalarAggregator {

[GitHub] [arrow] ggershinsky commented on pull request #10450: ARROW-9947: [Python] High-level Python API for Parquet encryption of files.

2021-06-14 Thread GitBox
ggershinsky commented on pull request #10450: URL: https://github.com/apache/arrow/pull/10450#issuecomment-860738074 @pitrou @wesm This is the core PR for bringing Parquet encryption to PyArrow and pandas. Due to possible threading differences between the two frameworks, this PR might

[GitHub] [arrow] github-actions[bot] commented on pull request #10526: ARROW-13048: [C++] Fix copying objects with special characters on S3FS

2021-06-14 Thread GitBox
github-actions[bot] commented on pull request #10526: URL: https://github.com/apache/arrow/pull/10526#issuecomment-860684718 https://issues.apache.org/jira/browse/ARROW-13048

[GitHub] [arrow] lidavidm opened a new pull request #10526: ARROW-13048: [C++] Fix copying objects with special characters on S3FS

2021-06-14 Thread GitBox
lidavidm opened a new pull request #10526: URL: https://github.com/apache/arrow/pull/10526 Although the AWS SDK docs claim the caller must URL-encode the source path, the actual SDK source URL-encodes the path for you. This double-encoding was causing CopyFile to fail with a 404 not found
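The double-encoding problem described above can be illustrated with a minimal Python sketch. It uses only the standard library's `urllib.parse`, not the AWS SDK, and the object key is a hypothetical example: if the caller percent-encodes a path and the client then encodes it again, the `%` signs themselves get escaped, so the request refers to a key that does not exist.

```python
from urllib.parse import quote, unquote

# Hypothetical object key containing special characters (not taken from the PR).
key = "bucket/my file+name.txt"

# Encoding once yields the form a server would decode back to the original key.
once = quote(key)    # "bucket/my%20file%2Bname.txt"

# Encoding the already-encoded path escapes each "%" as "%25".
twice = quote(once)  # "bucket/my%2520file%252Bname.txt"

# A single decode of the double-encoded path does not recover the original key,
# which is how a lookup can end in "not found".
assert unquote(once) == key
assert unquote(twice) != key
```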

[GitHub] [arrow-rs] garyanaplan commented on pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
garyanaplan commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-860712452 The problem with writing an effective test is that the error was only detected on file read and the read behaviour was to hang indefinitely. Taken together, those

[GitHub] [arrow-rs] alamb commented on pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
alamb commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-860740823 > I can think of ways to do that with a timeout and assume that if the read doesn't finish within timeout, then it must have failed. @garyanaplan I don't think we need to
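The timeout idea quoted above can be sketched as follows. This is an illustration in Python rather than the Rust thread/mpsc approach discussed in the thread, and `run_with_timeout`, `quick_read`, and `hanging_read` are hypothetical names: the operation runs in a background thread, and the caller treats a missing result after the timeout as a failure.

```python
import queue
import threading
import time


def run_with_timeout(func, timeout_s):
    """Run func in a daemon thread and return (finished, result).

    If func does not deliver a result within timeout_s seconds, it is
    assumed to have hung and (False, None) is returned.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        results.put(func())

    threading.Thread(target=worker, daemon=True).start()
    try:
        return True, results.get(timeout=timeout_s)
    except queue.Empty:
        return False, None


def quick_read():
    # Stands in for a read that completes normally.
    return "ok"


def hanging_read():
    # Stands in for a read that does not return in time.
    time.sleep(5)
    return "too late"
```

A test would assert that `run_with_timeout(hanging_read, 0.1)` reports `(False, None)`. The trade-off raised in the thread still applies: a slow but correct read is indistinguishable from a hang, so the timeout has to be generous.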

[GitHub] [arrow] lidavidm commented on pull request #10526: ARROW-13048: [C++] Fix copying objects with special characters on S3FS

2021-06-14 Thread GitBox
lidavidm commented on pull request #10526: URL: https://github.com/apache/arrow/pull/10526#issuecomment-860689176 Reference for AWS encoding the path for you:

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-14 Thread GitBox
codecov-commenter edited a comment on pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#issuecomment-857974778 #

[GitHub] [arrow-rs] alamb opened a new pull request #457: Add sort boolean benchmark

2021-06-14 Thread GitBox
alamb opened a new pull request #457: URL: https://github.com/apache/arrow-rs/pull/457 # Which issue does this PR close? Re #447 # Rationale for this change While reviewing #448 I wanted to measure performance change # What changes are included in this PR?

[GitHub] [arrow-datafusion] edrevo commented on a change in pull request #543: Ballista: Implement map-side shuffle

2021-06-14 Thread GitBox
edrevo commented on a change in pull request #543: URL: https://github.com/apache/arrow-datafusion/pull/543#discussion_r650881738 ## File path: ballista/rust/core/src/execution_plans/query_stage.rs ## @@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {

[GitHub] [arrow-datafusion] edrevo commented on a change in pull request #543: Ballista: Implement map-side shuffle

2021-06-14 Thread GitBox
edrevo commented on a change in pull request #543: URL: https://github.com/apache/arrow-datafusion/pull/543#discussion_r650883972 ## File path: ballista/rust/core/src/execution_plans/query_stage.rs ## @@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {

[GitHub] [arrow] github-actions[bot] commented on pull request #10527: ARROW-6870: [C#] Add Support for Dictionary Arrays and Dictionary Encoding

2021-06-14 Thread GitBox
github-actions[bot] commented on pull request #10527: URL: https://github.com/apache/arrow/pull/10527#issuecomment-860702579 https://issues.apache.org/jira/browse/ARROW-6870

[GitHub] [arrow] ggershinsky edited a comment on pull request #10450: ARROW-9947: [Python] High-level Python API for Parquet encryption of files.

2021-06-14 Thread GitBox
ggershinsky edited a comment on pull request #10450: URL: https://github.com/apache/arrow/pull/10450#issuecomment-860738074 @pitrou @wesm This is the core PR for bringing Parquet encryption to PyArrow and pandas. Due to possible threading differences between the two frameworks, this PR

[GitHub] [arrow-rs] ritchie46 opened a new issue #458: Arrow 4.3.0 does not compile for feature gates `["simd", "avx512"], ["simd"]`

2021-06-14 Thread GitBox
ritchie46 opened a new issue #458: URL: https://github.com/apache/arrow-rs/issues/458 The released arrow version 4.3.0 does not compile with SIMD feature flags: ``` # compiles =4.2: features = ["simd", "avx512"] # does not compile =4.2: features = ["simd"] =4.3: features =

[GitHub] [arrow-rs] kszucs commented on pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-14 Thread GitBox
kszucs commented on pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#issuecomment-860523380 Thanks for the review, I'm going to add more tests.

[GitHub] [arrow-rs] alamb commented on issue #456: Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API

2021-06-14 Thread GitBox
alamb commented on issue #456: URL: https://github.com/apache/arrow-rs/issues/456#issuecomment-860585543 It turns out my project had pinned the Rust toolchain to 1.51 and arrow now requires 1.52. Updating the rust-toolchain pin solves the problem ``` diff --git a/rust-toolchain

[GitHub] [arrow-datafusion] alamb merged pull request #548: reuse code for now function expr creation

2021-06-14 Thread GitBox
alamb merged pull request #548: URL: https://github.com/apache/arrow-datafusion/pull/548

[GitHub] [arrow-rs] alamb merged pull request #414: Doctests for DecimalArray.

2021-06-14 Thread GitBox
alamb merged pull request #414: URL: https://github.com/apache/arrow-rs/pull/414

[GitHub] [arrow-rs] alamb commented on a change in pull request #414: Doctests for DecimalArray.

2021-06-14 Thread GitBox
alamb commented on a change in pull request #414: URL: https://github.com/apache/arrow-rs/pull/414#discussion_r650956897 ## File path: arrow/src/array/array_binary.rs ## @@ -613,6 +613,30 @@ impl Array for FixedSizeBinaryArray { } /// A type of `DecimalArray` whose

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #558: Implement window functions with `partition_by` clause

2021-06-14 Thread GitBox
codecov-commenter commented on pull request #558: URL: https://github.com/apache/arrow-datafusion/pull/558#issuecomment-860569767 #

[GitHub] [arrow-rs] alamb commented on issue #394: my contribution not marged in 4.2 release

2021-06-14 Thread GitBox
alamb commented on issue #394: URL: https://github.com/apache/arrow-rs/issues/394#issuecomment-860576942 @kazuk 4.3.0 has been released https://crates.io/crates/arrow/4.3.0

[GitHub] [arrow-rs] alamb opened a new issue #456: Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API

2021-06-14 Thread GitBox
alamb opened a new issue #456: URL: https://github.com/apache/arrow-rs/issues/456 **Describe the bug** When I upgraded my project via `cargo update` and it picked up https://crates.io/crates/arrow/4.3.0 I got the following error: ``` Compiling tracker v0.1.0

[GitHub] [arrow-rs] alamb commented on pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
alamb commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-860697522 @garyanaplan what would you say to using the reproducer from https://github.com/apache/arrow-rs/issues/349 to test this issue? I realize it probably seems unnecessary for

[GitHub] [arrow] HashidaTKS commented on pull request #10527: ARROW-6870: [C#] Add Support for Dictionary Arrays and Dictionary Encoding

2021-06-14 Thread GitBox
HashidaTKS commented on pull request #10527: URL: https://github.com/apache/arrow/pull/10527#issuecomment-860738406 cc: @eerhardt Would you please review this when you have time?

[GitHub] [arrow-rs] alamb commented on a change in pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
alamb commented on a change in pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#discussion_r651058305 ## File path: parquet/tests/boolean_writer.rs ## @@ -0,0 +1,100 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #520: Implement window functions with `order_by` clause

2021-06-14 Thread GitBox
alamb commented on a change in pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#discussion_r651068287 ## File path: datafusion/src/physical_plan/expressions/nth_value.rs ## @@ -113,54 +111,32 @@ impl BuiltInWindowFunctionExpr for NthValue {

[GitHub] [arrow] dianaclarke opened a new pull request #10528: ARROW-13073: [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'

2021-06-14 Thread GitBox
dianaclarke opened a new pull request #10528: URL: https://github.com/apache/arrow/pull/10528 ``` $ archery benchmark list Traceback (most recent call last): File "/Users/diana/envs/arrow/bin/archery", line 33, in <module> sys.exit(load_entry_point('archery', 'console_scripts',

[GitHub] [arrow] github-actions[bot] commented on pull request #10528: ARROW-13073: [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'

2021-06-14 Thread GitBox
github-actions[bot] commented on pull request #10528: URL: https://github.com/apache/arrow/pull/10528#issuecomment-860812473 https://issues.apache.org/jira/browse/ARROW-13073

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #55: Support qualified columns in queries

2021-06-14 Thread GitBox
alamb commented on a change in pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r651124509 ## File path: datafusion/src/physical_plan/expressions/nth_value.rs ## @@ -195,7 +195,7 @@ mod tests { fn first_value() -> Result<()> {

[GitHub] [arrow] kszucs opened a new pull request #10529: ARROW-13075: [Python] Expose C data interface API for pyarrow.Field

2021-06-14 Thread GitBox
kszucs opened a new pull request #10529: URL: https://github.com/apache/arrow/pull/10529

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #561: Revert pruning on not equal predicate

2021-06-14 Thread GitBox
alamb commented on a change in pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#discussion_r651166362 ## File path: datafusion/src/physical_optimizer/pruning.rs ## @@ -1190,6 +1162,34 @@ mod tests { assert_eq!(result, expected); } +

[GitHub] [arrow-rs] alamb commented on pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
alamb commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-860783118 > I'd already written the test, just been in meetings. If we'd rather rely on the test framework to terminate hanging tests, just remove the thread/mpsc/channel stuff and do a

[GitHub] [arrow] github-actions[bot] commented on pull request #10529: ARROW-13075: [Python] Expose C data interface API for pyarrow.Field

2021-06-14 Thread GitBox
github-actions[bot] commented on pull request #10529: URL: https://github.com/apache/arrow/pull/10529#issuecomment-860868416 https://issues.apache.org/jira/browse/ARROW-13075

[GitHub] [arrow] BryanCutler edited a comment on pull request #10513: ARROW-13044: [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector

2021-06-14 Thread GitBox
BryanCutler edited a comment on pull request #10513: URL: https://github.com/apache/arrow/pull/10513#issuecomment-860884676 Thanks @lidavidm @liyafan82 , I made https://issues.apache.org/jira/browse/ARROW-13076 for ExtensionTypeVector to use ValueVector. If this PR looks ok for

[GitHub] [arrow] BryanCutler commented on pull request #10513: ARROW-13044: [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector

2021-06-14 Thread GitBox
BryanCutler commented on pull request #10513: URL: https://github.com/apache/arrow/pull/10513#issuecomment-860884676 Thanks @lidavidm , I made https://issues.apache.org/jira/browse/ARROW-13076 for ExtensionTypeVector to use ValueVector. If this PR looks ok for union vectors, I'll

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #55: Support qualified columns in queries

2021-06-14 Thread GitBox
alamb commented on a change in pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r651112853 ## File path: datafusion/src/logical_plan/dfschema.rs ## @@ -208,6 +297,28 @@ impl Into for DFSchema { } } +impl Into for { +/// Convert

[GitHub] [arrow-datafusion] alamb opened a new issue #560: Pruning on `!=` predicate results in incorrect results

2021-06-14 Thread GitBox
alamb opened a new issue #560: URL: https://github.com/apache/arrow-datafusion/issues/560 **Describe the bug** The logic for pruning on `!=` predicates introduced in https://github.com/apache/arrow-datafusion/pull/544 is incorrect **To Reproduce** Use the pruning logic with a

[GitHub] [arrow-datafusion] alamb edited a comment on pull request #544: Not equal predicate in physical_planning pruning

2021-06-14 Thread GitBox
alamb edited a comment on pull request #544: URL: https://github.com/apache/arrow-datafusion/pull/544#issuecomment-860882163 I think we got the logic slightly backwards here -- see https://github.com/apache/arrow-datafusion/issues/560. FYI @jgoday

[GitHub] [arrow-datafusion] alamb commented on pull request #544: Not equal predicate in physical_planning pruning

2021-06-14 Thread GitBox
alamb commented on pull request #544: URL: https://github.com/apache/arrow-datafusion/pull/544#issuecomment-860882163 I think we got the logic slightly backwards here -- see https://github.com/apache/arrow-datafusion/issues/560

[GitHub] [arrow-datafusion] jgoday commented on pull request #561: Fix pruning on not equal predicate

2021-06-14 Thread GitBox
jgoday commented on pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#issuecomment-860937695 > fyi @jgoday @alamb, Sorry for introducing the bug, prune_not_eq_data test is much clearer to me now. Thank you!

[GitHub] [arrow-datafusion] alamb commented on pull request #510: Support modulus op

2021-06-14 Thread GitBox
alamb commented on pull request #510: URL: https://github.com/apache/arrow-datafusion/pull/510#issuecomment-860786176 Arrow 4.3.0 has been released so if you rebase this PR it will likely be ready to go

[GitHub] [arrow-datafusion] alamb commented on pull request #55: Support qualified columns in queries

2021-06-14 Thread GitBox
alamb commented on pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-860846672 I suggest we get this PR rebased and merged asap to minimize conflicts

[GitHub] [arrow] bkietz commented on pull request #10511: ARROW-13025: [C++][Python] Add FunctionOptions::Equals/ToString/Serialize

2021-06-14 Thread GitBox
bkietz commented on pull request #10511: URL: https://github.com/apache/arrow/pull/10511#issuecomment-860852122 @lidavidm thanks for working on this! >I don't like adding protected methods to a struct, and it's inconsistent with how equality is implemented for other structs (via

[GitHub] [arrow-datafusion] alamb commented on pull request #342: Left join could use bitmap for left join instead of Vec

2021-06-14 Thread GitBox
alamb commented on pull request #342: URL: https://github.com/apache/arrow-datafusion/pull/342#issuecomment-860786614 FYI arrow 4.3.0 has been released with the code in BooleanBufferBuilder

[GitHub] [arrow-datafusion] Dandandan closed issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

2021-06-14 Thread GitBox
Dandandan closed issue #512: URL: https://github.com/apache/arrow-datafusion/issues/512

[GitHub] [arrow-datafusion] Dandandan merged pull request #556: hash float arrays using primitive usigned integer type

2021-06-14 Thread GitBox
Dandandan merged pull request #556: URL: https://github.com/apache/arrow-datafusion/pull/556

[GitHub] [arrow-datafusion] Dandandan commented on pull request #556: hash float arrays using primitive usigned integer type

2021-06-14 Thread GitBox
Dandandan commented on pull request #556: URL: https://github.com/apache/arrow-datafusion/pull/556#issuecomment-860795232 Thanks @houqp

[GitHub] [arrow-datafusion] alamb commented on pull request #546: parallelize window function evaluations

2021-06-14 Thread GitBox
alamb commented on pull request #546: URL: https://github.com/apache/arrow-datafusion/pull/546#issuecomment-860810784 Just to be super clear, I am not suggesting we add a task scheduler as part of adding window functions -- I was trying to say that I felt following the existing pattern of

[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-14 Thread GitBox
jorisvandenbossche commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-860765541 > I'll wait for the consensus to build on the timezone handling discussions before closing the PR and moving the python tests to a new PR. I think there was no

[GitHub] [arrow-rs] codecov-commenter commented on pull request #457: Add sort boolean benchmark

2021-06-14 Thread GitBox
codecov-commenter commented on pull request #457: URL: https://github.com/apache/arrow-rs/pull/457#issuecomment-860769544 #

[GitHub] [arrow] lidavidm commented on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
lidavidm commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860785077 Or perhaps `element_wise_binary_join` since it's also `element_wise_min`?

[GitHub] [arrow] jorisvandenbossche commented on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
jorisvandenbossche commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860792611 That would indeed be more consistent. Personally, searching for the function / "tab-completion" in mind, I think having the name start with "binary_join" or

[GitHub] [arrow] thisisnic commented on pull request #10519: ARROW-12867: [R] Bindings for abs()

2021-06-14 Thread GitBox
thisisnic commented on pull request #10519: URL: https://github.com/apache/arrow/pull/10519#issuecomment-860792424 Awesome, thanks! Another really minor change to suggest: the approach to your unit tests is great; however, there's a helper function in the Arrow package called

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #543: Ballista: Implement map-side shuffle

2021-06-14 Thread GitBox
Dandandan commented on a change in pull request #543: URL: https://github.com/apache/arrow-datafusion/pull/543#discussion_r651076922 ## File path: ballista/rust/core/src/execution_plans/query_stage.rs ## @@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #543: Ballista: Implement map-side shuffle

2021-06-14 Thread GitBox
alamb commented on a change in pull request #543: URL: https://github.com/apache/arrow-datafusion/pull/543#discussion_r651085152 ## File path: ballista/rust/core/src/execution_plans/query_stage.rs ## @@ -150,32 +159,150 @@ impl ExecutionPlan for QueryStageExec {

[GitHub] [arrow] kiszk closed pull request #10525: ARROW-13026: [CI] Use LLVM 10 for s390x

2021-06-14 Thread GitBox
kiszk closed pull request #10525: URL: https://github.com/apache/arrow/pull/10525

[GitHub] [arrow] lidavidm commented on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
lidavidm commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860807360 Ah, maybe then we should rename `element_wise_min` to `min_element_wise`.

[GitHub] [arrow-datafusion] alamb opened a new pull request #561: Revert pruning on not equal predicate

2021-06-14 Thread GitBox
alamb opened a new pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561 Closes #560 # Rationale for this change Logic is incorrect # What changes are included in this PR? 1. Revert

[GitHub] [arrow-rs] garyanaplan commented on pull request #443: parquet: improve BOOLEAN writing logic and report error on encoding fail

2021-06-14 Thread GitBox
garyanaplan commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-860768278 I'd already written the test, just been in meetings. If we'd rather rely on the test framework to terminate hanging tests, just remove the thread/mpsc/channel stuff and do a

[GitHub] [arrow] jorisvandenbossche commented on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
jorisvandenbossche commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860784051 Some naming nitpicks ;) I think "var_args_join" is not super clear. Having a notion about it being for string data would be good, and the scalar list of string

[GitHub] [arrow-datafusion] Dandandan closed issue #557: Filters aren't passed down to table scans in a union

2021-06-14 Thread GitBox
Dandandan closed issue #557: URL: https://github.com/apache/arrow-datafusion/issues/557

[GitHub] [arrow-datafusion] Dandandan merged pull request #559: Filter push down for Union

2021-06-14 Thread GitBox
Dandandan merged pull request #559: URL: https://github.com/apache/arrow-datafusion/pull/559

[GitHub] [arrow-datafusion] alamb commented on pull request #561: Revert pruning on not equal predicate

2021-06-14 Thread GitBox
alamb commented on pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#issuecomment-860886806 I actually think I can still support not equals, I just need to make it a bit more restricted
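As a rough illustration of why pruning on `!=` has to be restricted, here is a sketch of the general min/max-statistics technique. It is an assumption for illustration, not the logic in this PR; the function name and the statistics model are hypothetical, and nulls are ignored. Knowing only a container's minimum and maximum for a column, `x != v` is certain to be false for every row only when both bounds equal `v`.

```python
def can_prune_not_eq(col_min, col_max, value):
    """Return True if no row in the container can satisfy `x != value`.

    Hypothetical helper: with only min/max statistics, that is certain
    only when every row equals `value`, i.e. min == max == value.
    """
    return col_min == value and col_max == value


# Every row is 5, so `x != 5` matches nothing and the container can be skipped.
assert can_prune_not_eq(5, 5, 5)

# The range 1..10 contains values other than 5, so the container must be kept,
# even though 5 lies inside the range.
assert not can_prune_not_eq(1, 10, 5)
```

Pruning whenever `value` merely falls inside `[min, max]` would drop containers that still hold matching rows, which is one way such logic can return incorrect results.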

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #561: Revert pruning on not equal predicate

2021-06-14 Thread GitBox
alamb commented on a change in pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#discussion_r651166601 ## File path: datafusion/src/physical_optimizer/pruning.rs ## @@ -1190,6 +1162,34 @@ mod tests { assert_eq!(result, expected); } +

[GitHub] [arrow-datafusion] alamb edited a comment on pull request #544: Not equal predicate in physical_planning pruning

2021-06-14 Thread GitBox
alamb edited a comment on pull request #544: URL: https://github.com/apache/arrow-datafusion/pull/544#issuecomment-860882163 I think we made the logic a bit too aggressive; see https://github.com/apache/arrow-datafusion/issues/560. FYI @jgoday

[GitHub] [arrow] ianmcook commented on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
ianmcook commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860779304 Thanks for working on this @lidavidm! I will add the relevant functions to the R bindings after this is merged (ARROW-11514).

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
jorisvandenbossche edited a comment on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860784051 Some naming nitpicks ;) I think "var_args_join" is not super clear. Having a notion about it being for string data would be good, and the scalar list of

[GitHub] [arrow] ianmcook commented on pull request #10520: ARROW-12709: [C++] Add var_args_join

2021-06-14 Thread GitBox
ianmcook commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-860817694 The use of "binary" in the names of these string join kernels is unfortunate; it's not clear at first glance whether "binary" is a reference to the arity or to the input type.

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #55: Support qualified columns in queries

2021-06-14 Thread GitBox
alamb commented on a change in pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r651125970 ## File path: datafusion/src/physical_plan/planner.rs ## @@ -56,6 +56,121 @@ use expressions::col; use log::debug; use std::sync::Arc; +fn

[GitHub] [arrow] AlenkaF commented on pull request #10519: ARROW-12867: [R] Bindings for abs()

2021-06-14 Thread GitBox
AlenkaF commented on pull request #10519: URL: https://github.com/apache/arrow/pull/10519#issuecomment-860853969 Sure, I am happy to do it. Thank you so much @thisisnic! Will let you know if I get stuck ;)

[GitHub] [arrow] pitrou closed pull request #10529: ARROW-13075: [Python] Expose C data interface API for pyarrow.Field

2021-06-14 Thread GitBox
pitrou closed pull request #10529: URL: https://github.com/apache/arrow/pull/10529

[GitHub] [arrow-datafusion] alamb commented on pull request #561: Revert pruning on not equal predicate

2021-06-14 Thread GitBox
alamb commented on pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#issuecomment-860896807 fyi @jgoday

[GitHub] [arrow-datafusion] alamb removed a comment on pull request #561: Revert pruning on not equal predicate

2021-06-14 Thread GitBox
alamb removed a comment on pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#issuecomment-860886806 I actually think I can still support not equals, I just need to make it a bit more restricted

[GitHub] [arrow-datafusion] alamb commented on pull request #561: Fix pruning on not equal predicate

2021-06-14 Thread GitBox
alamb commented on pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#issuecomment-860947331 > @alamb, Sorry for introducing the bug, prune_not_eq_data test is much clearer to me now. Thank you! No worries @jgoday -- both @Dandandan and I missed it on the

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #561: Fix pruning on not equal predicate

2021-06-14 Thread GitBox
Dandandan commented on a change in pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#discussion_r651243733 ## File path: datafusion/src/physical_optimizer/pruning.rs ## @@ -553,12 +553,14 @@ fn build_predicate_expression( let corrected_op =

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #520: Implement window functions with `order_by` clause

2021-06-14 Thread GitBox
Jimexist commented on a change in pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#discussion_r651356971 ## File path: datafusion/src/physical_plan/expressions/nth_value.rs ## @@ -113,54 +111,32 @@ impl BuiltInWindowFunctionExpr for NthValue {

[GitHub] [arrow-datafusion] Jimexist commented on pull request #429: implement lead and lag built-in window function

2021-06-14 Thread GitBox
Jimexist commented on pull request #429: URL: https://github.com/apache/arrow-datafusion/pull/429#issuecomment-861103557 putting this back to draft as this relies on https://github.com/apache/arrow-rs/pull/388 which is not yet in arrow 4.3

[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-06-14 Thread GitBox
lidavidm commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-860961527 @bkietz @jorisvandenbossche I know y'all are busy, but any other comments? Once this is in, @nirandaperera can get started on ARROW-9431 on top of this

[GitHub] [arrow] ianmcook commented on pull request #10520: ARROW-12709: [C++] Add binary_join_element_wise

2021-06-14 Thread GitBox
ianmcook commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-861034008 I'm unclear on what the intended behavior is when you choose the {{SKIP}} null handling behavior and the separator is an array. Could you describe that please? Thanks

[GitHub] [arrow] ianmcook edited a comment on pull request #10520: ARROW-12709: [C++] Add binary_join_element_wise

2021-06-14 Thread GitBox
ianmcook edited a comment on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-861034008 I'm unclear on what the intended behavior is when you choose the `SKIP` null handling behavior and the separator is an array. Could you describe that please? Thanks

[GitHub] [arrow] lidavidm edited a comment on pull request #10520: ARROW-12709: [C++] Add binary_join_element_wise

2021-06-14 Thread GitBox
lidavidm edited a comment on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-861035926 The null handling behavior never affects the separator: it only describes how to handle the other values. It's intended to let you mimic libcudf. ```

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #520: Implement window functions with `order_by` clause

2021-06-14 Thread GitBox
Jimexist commented on a change in pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#discussion_r651358846 ## File path: datafusion/src/physical_plan/windows.rs ## @@ -156,31 +162,72 @@ impl WindowExpr for BuiltInWindowExpr {

[GitHub] [arrow-datafusion] alamb closed issue #560: Pruning on `!=` predicate results in incorrect results

2021-06-14 Thread GitBox
alamb closed issue #560: URL: https://github.com/apache/arrow-datafusion/issues/560

[GitHub] [arrow-datafusion] alamb merged pull request #561: Fix pruning on not equal predicate

2021-06-14 Thread GitBox
alamb merged pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561

[GitHub] [arrow-rs] Jimexist commented on pull request #457: Add sort boolean benchmark

2021-06-14 Thread GitBox
Jimexist commented on pull request #457: URL: https://github.com/apache/arrow-rs/pull/457#issuecomment-861078612 was wondering what if you increase it to say 2^16?

[GitHub] [arrow] lidavidm opened a new pull request #10530: ARROW-13072: [C++] Add bit-wise arithmetic kernels

2021-06-14 Thread GitBox
lidavidm opened a new pull request #10530: URL: https://github.com/apache/arrow/pull/10530

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #561: Fix pruning on not equal predicate

2021-06-14 Thread GitBox
Dandandan commented on a change in pull request #561: URL: https://github.com/apache/arrow-datafusion/pull/561#discussion_r651241366 ## File path: datafusion/src/physical_optimizer/pruning.rs ## @@ -1190,6 +1192,34 @@ mod tests { assert_eq!(result, expected); }
