[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r446266184 ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test { }; template -class TestFloatingMinMaxKernel : public TestNumericMinMaxKernel {}; +class TestBooleanMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +template +class TestIntegerMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +template +class TestFloatingMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +TYPED_TEST_SUITE(TestBooleanMinMaxKernel, BooleanArrowType); Review comment: Right. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r446264863 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator { ArrayType arr(batch[0].array()); -local.has_nulls = arr.null_count() > 0; +const auto null_count = arr.null_count(); +local.has_nulls = null_count > 0; +local.has_values = (arr.length() - null_count) > 0; + if (local.has_nulls && options.null_handling == MinMaxOptions::OUTPUT_NULL) { this->state = local; return; } -const auto values = arr.raw_values(); -if (arr.null_count() > 0) { +if (local.has_nulls) { BitmapReader reader(arr.null_bitmap_data(), arr.offset(), arr.length()); for (int64_t i = 0; i < arr.length(); i++) { if (reader.IsSet()) { - local.MergeOne(values[i]); + local.MergeOne(arr.Value(i)); } reader.Next(); } } else { for (int64_t i = 0; i < arr.length(); i++) { -local.MergeOne(values[i]); +local.MergeOne(arr.Value(i)); } Review comment: We could optimize it further by early returning on the first occurrence of true/false values. Going to create a jira once it's merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r446219965 ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test { }; template -class TestFloatingMinMaxKernel : public TestNumericMinMaxKernel {}; +class TestBooleanMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +template +class TestIntegerMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +template +class TestFloatingMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +TYPED_TEST_SUITE(TestBooleanMinMaxKernel, BooleanArrowType); Review comment: I chose it in order to reuse `AssertMinMaxIsNull` and `AssertMinMaxIs` methods. @bkietz can I instantiate `TestPrimitiveMinMaxKernel` without TYPED_TEST macro? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r445007543 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator { ArrayType arr(batch[0].array()); -local.has_nulls = arr.null_count() > 0; +const auto null_count = arr.null_count(); +local.has_nulls = null_count > 0; +local.has_values = (arr.length() - null_count) > 0; + if (local.has_nulls && options.null_handling == MinMaxOptions::OUTPUT_NULL) { this->state = local; return; } -const auto values = arr.raw_values(); -if (arr.null_count() > 0) { +if (local.has_nulls) { BitmapReader reader(arr.null_bitmap_data(), arr.offset(), arr.length()); for (int64_t i = 0; i < arr.length(); i++) { if (reader.IsSet()) { - local.MergeOne(values[i]); + local.MergeOne(arr.Value(i)); } reader.Next(); } } else { for (int64_t i = 0; i < arr.length(); i++) { -local.MergeOne(values[i]); +local.MergeOne(arr.Value(i)); } Review comment: I was thinking of that, but the current false_count imlementation triggers true_count, so we would't iterate over the values at least twice to get both true and false count - it might be more efficient though. I'm going to write a benchmark for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r445006111 ## File path: cpp/src/arrow/testing/gtest_util.h ## @@ -137,6 +137,8 @@ namespace arrow { // -- // Useful testing::Types declarations +typedef ::testing::Types BooleanArrowType; Review comment: Updating. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r442864766 ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test { }; template -class TestFloatingMinMaxKernel : public TestNumericMinMaxKernel {}; +class TestBooleanMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +template +class TestIntegerMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +template +class TestFloatingMinMaxKernel : public TestPrimitiveMinMaxKernel {}; + +TYPED_TEST_SUITE(TestBooleanMinMaxKernel, BooleanArrowType); +TYPED_TEST(TestBooleanMinMaxKernel, Basics) { + MinMaxOptions options; + std::vector chunked_input0 = {"[]", "[]"}; + std::vector chunked_input1 = {"[true, true, null]", "[true, null]"}; + std::vector chunked_input2 = {"[false, false, false]", "[false]"}; + std::vector chunked_input3 = {"[true, null]", "[null, false]"}; + + // SKIP nulls by default + this->AssertMinMaxIsNull("[]", options); + this->AssertMinMaxIsNull("[null, null, null]", options); + this->AssertMinMaxIs("[false, false, false]", false, false, options); + this->AssertMinMaxIs("[false, false, false, null]", false, false, options); + this->AssertMinMaxIs("[true, null, true, true]", true, true, options); + this->AssertMinMaxIs("[true, null, true, true]", true, true, options); + this->AssertMinMaxIs("[true, null, false, true]", false, true, options); + this->AssertMinMaxIsNull(chunked_input0, options); + this->AssertMinMaxIs(chunked_input1, true, true, options); + this->AssertMinMaxIs(chunked_input2, false, false, options); + this->AssertMinMaxIs(chunked_input3, false, true, options); + + options = MinMaxOptions(MinMaxOptions::OUTPUT_NULL); + this->AssertMinMaxIsNull("[]", options); + this->AssertMinMaxIsNull("[null, null, null]", options); + this->AssertMinMaxIsNull("[false, null, false]", options); + this->AssertMinMaxIsNull("[true, null]", options); + this->AssertMinMaxIs("[true, true, true]", true, true, options); + this->AssertMinMaxIs("[false, false]", false, false, options); + this->AssertMinMaxIs("[false, true]", false, true, options); + this->AssertMinMaxIsNull(chunked_input0, options); + this->AssertMinMaxIsNull(chunked_input1, options); + this->AssertMinMaxIs(chunked_input2, false, false, options); + this->AssertMinMaxIsNull(chunked_input3, options); +} -TYPED_TEST_SUITE(TestNumericMinMaxKernel, IntegralArrowTypes); -TYPED_TEST(TestNumericMinMaxKernel, Basics) { +TYPED_TEST_SUITE(TestIntegerMinMaxKernel, IntegralArrowTypes); +TYPED_TEST(TestIntegerMinMaxKernel, Basics) { MinMaxOptions options; std::vector chunked_input1 = {"[5, 1, 2, 3, 4]", "[9, 1, null, 3, 4]"}; std::vector chunked_input2 = {"[5, null, 2, 3, 4]", "[9, 1, 2, 3, 4]"}; std::vector chunked_input3 = {"[5, 1, 2, 3, null]", "[9, 1, null, 3, 4]"}; + // SKIP nulls by default + this->AssertMinMaxIsNull("[]", options); + this->AssertMinMaxIsNull("[null, null, null]", options); Review comment: So these were returning the types' extreme values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r442864422 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -339,13 +366,38 @@ std::unique_ptr MeanInit(KernelContext* ctx, const KernelInitArgs& template struct MinMaxState {}; +template +struct MinMaxState> { + using ThisType = MinMaxState; + using T = typename ArrowType::c_type; + + ThisType& operator+=(const ThisType& rhs) { +this->has_nulls |= rhs.has_nulls; +this->has_values |= rhs.has_values; Review comment: @wesm I had to introduce a flag to check whether any of the arrays have valid values, otherwise the default inverse extreme values were returned instead of null. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r442863137 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator { ArrayType arr(batch[0].array()); -local.has_nulls = arr.null_count() > 0; +const auto null_count = arr.null_count(); +local.has_nulls = null_count > 0; +local.has_values = (arr.length() - null_count) > 0; + if (local.has_nulls && options.null_handling == MinMaxOptions::OUTPUT_NULL) { this->state = local; return; } -const auto values = arr.raw_values(); Review comment: BooleanArray has no `raw_values()` method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org