[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-26 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r446266184



##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test {
 };
 
 template 
-class TestFloatingMinMaxKernel : public TestNumericMinMaxKernel {};
+class TestBooleanMinMaxKernel : public TestPrimitiveMinMaxKernel {};
+
+template 
+class TestIntegerMinMaxKernel : public TestPrimitiveMinMaxKernel {};
+
+template 
+class TestFloatingMinMaxKernel : public TestPrimitiveMinMaxKernel 
{};
+
+TYPED_TEST_SUITE(TestBooleanMinMaxKernel, BooleanArrowType);

Review comment:
   Right.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-26 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r446264863



##
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##
@@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator {
 
 ArrayType arr(batch[0].array());
 
-local.has_nulls = arr.null_count() > 0;
+const auto null_count = arr.null_count();
+local.has_nulls = null_count > 0;
+local.has_values = (arr.length() - null_count) > 0;
+
 if (local.has_nulls && options.null_handling == 
MinMaxOptions::OUTPUT_NULL) {
   this->state = local;
   return;
 }
 
-const auto values = arr.raw_values();
-if (arr.null_count() > 0) {
+if (local.has_nulls) {
   BitmapReader reader(arr.null_bitmap_data(), arr.offset(), arr.length());
   for (int64_t i = 0; i < arr.length(); i++) {
 if (reader.IsSet()) {
-  local.MergeOne(values[i]);
+  local.MergeOne(arr.Value(i));
 }
 reader.Next();
   }
 } else {
   for (int64_t i = 0; i < arr.length(); i++) {
-local.MergeOne(values[i]);
+local.MergeOne(arr.Value(i));
   }

Review comment:
   We could optimize it further by early returning on the first occurrence 
of true/false values. Going to create a jira once it's merged.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-26 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r446219965



##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test {
 };
 
 template 
-class TestFloatingMinMaxKernel : public TestNumericMinMaxKernel {};
+class TestBooleanMinMaxKernel : public TestPrimitiveMinMaxKernel {};
+
+template 
+class TestIntegerMinMaxKernel : public TestPrimitiveMinMaxKernel {};
+
+template 
+class TestFloatingMinMaxKernel : public TestPrimitiveMinMaxKernel 
{};
+
+TYPED_TEST_SUITE(TestBooleanMinMaxKernel, BooleanArrowType);

Review comment:
   I chose it in order to reuse `AssertMinMaxIsNull` and `AssertMinMaxIs` 
methods. 
   @bkietz can I instantiate `TestPrimitiveMinMaxKernel` without TYPED_TEST 
macro?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r445007543



##
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##
@@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator {
 
 ArrayType arr(batch[0].array());
 
-local.has_nulls = arr.null_count() > 0;
+const auto null_count = arr.null_count();
+local.has_nulls = null_count > 0;
+local.has_values = (arr.length() - null_count) > 0;
+
 if (local.has_nulls && options.null_handling == 
MinMaxOptions::OUTPUT_NULL) {
   this->state = local;
   return;
 }
 
-const auto values = arr.raw_values();
-if (arr.null_count() > 0) {
+if (local.has_nulls) {
   BitmapReader reader(arr.null_bitmap_data(), arr.offset(), arr.length());
   for (int64_t i = 0; i < arr.length(); i++) {
 if (reader.IsSet()) {
-  local.MergeOne(values[i]);
+  local.MergeOne(arr.Value(i));
 }
 reader.Next();
   }
 } else {
   for (int64_t i = 0; i < arr.length(); i++) {
-local.MergeOne(values[i]);
+local.MergeOne(arr.Value(i));
   }

Review comment:
   I was thinking of that, but the current false_count imlementation 
triggers true_count, so we would't iterate over the values at least twice to 
get both true and false count - it might be more efficient though.
   
   I'm going to write a benchmark for this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r445006111



##
File path: cpp/src/arrow/testing/gtest_util.h
##
@@ -137,6 +137,8 @@ namespace arrow {
 // --
 // Useful testing::Types declarations
 
+typedef ::testing::Types BooleanArrowType;

Review comment:
   Updating.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-19 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r442864766



##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test {
 };
 
 template 
-class TestFloatingMinMaxKernel : public TestNumericMinMaxKernel {};
+class TestBooleanMinMaxKernel : public TestPrimitiveMinMaxKernel {};
+
+template 
+class TestIntegerMinMaxKernel : public TestPrimitiveMinMaxKernel {};
+
+template 
+class TestFloatingMinMaxKernel : public TestPrimitiveMinMaxKernel 
{};
+
+TYPED_TEST_SUITE(TestBooleanMinMaxKernel, BooleanArrowType);
+TYPED_TEST(TestBooleanMinMaxKernel, Basics) {
+  MinMaxOptions options;
+  std::vector chunked_input0 = {"[]", "[]"};
+  std::vector chunked_input1 = {"[true, true, null]", "[true, 
null]"};
+  std::vector chunked_input2 = {"[false, false, false]", 
"[false]"};
+  std::vector chunked_input3 = {"[true, null]", "[null, false]"};
+
+  // SKIP nulls by default
+  this->AssertMinMaxIsNull("[]", options);
+  this->AssertMinMaxIsNull("[null, null, null]", options);
+  this->AssertMinMaxIs("[false, false, false]", false, false, options);
+  this->AssertMinMaxIs("[false, false, false, null]", false, false, options);
+  this->AssertMinMaxIs("[true, null, true, true]", true, true, options);
+  this->AssertMinMaxIs("[true, null, true, true]", true, true, options);
+  this->AssertMinMaxIs("[true, null, false, true]", false, true, options);
+  this->AssertMinMaxIsNull(chunked_input0, options);
+  this->AssertMinMaxIs(chunked_input1, true, true, options);
+  this->AssertMinMaxIs(chunked_input2, false, false, options);
+  this->AssertMinMaxIs(chunked_input3, false, true, options);
+
+  options = MinMaxOptions(MinMaxOptions::OUTPUT_NULL);
+  this->AssertMinMaxIsNull("[]", options);
+  this->AssertMinMaxIsNull("[null, null, null]", options);
+  this->AssertMinMaxIsNull("[false, null, false]", options);
+  this->AssertMinMaxIsNull("[true, null]", options);
+  this->AssertMinMaxIs("[true, true, true]", true, true, options);
+  this->AssertMinMaxIs("[false, false]", false, false, options);
+  this->AssertMinMaxIs("[false, true]", false, true, options);
+  this->AssertMinMaxIsNull(chunked_input0, options);
+  this->AssertMinMaxIsNull(chunked_input1, options);
+  this->AssertMinMaxIs(chunked_input2, false, false, options);
+  this->AssertMinMaxIsNull(chunked_input3, options);
+}
 
-TYPED_TEST_SUITE(TestNumericMinMaxKernel, IntegralArrowTypes);
-TYPED_TEST(TestNumericMinMaxKernel, Basics) {
+TYPED_TEST_SUITE(TestIntegerMinMaxKernel, IntegralArrowTypes);
+TYPED_TEST(TestIntegerMinMaxKernel, Basics) {
   MinMaxOptions options;
   std::vector chunked_input1 = {"[5, 1, 2, 3, 4]", "[9, 1, null, 
3, 4]"};
   std::vector chunked_input2 = {"[5, null, 2, 3, 4]", "[9, 1, 2, 
3, 4]"};
   std::vector chunked_input3 = {"[5, 1, 2, 3, null]", "[9, 1, 
null, 3, 4]"};
 
+  // SKIP nulls by default
+  this->AssertMinMaxIsNull("[]", options);
+  this->AssertMinMaxIsNull("[null, null, null]", options);

Review comment:
   So these were returning the types' extreme values.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-19 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r442864422



##
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##
@@ -339,13 +366,38 @@ std::unique_ptr MeanInit(KernelContext* ctx, 
const KernelInitArgs&
 template 
 struct MinMaxState {};
 
+template 
+struct MinMaxState> {
+  using ThisType = MinMaxState;
+  using T = typename ArrowType::c_type;
+
+  ThisType& operator+=(const ThisType& rhs) {
+this->has_nulls |= rhs.has_nulls;
+this->has_values |= rhs.has_values;

Review comment:
   @wesm I had to introduce a flag to check whether any of the arrays have 
valid values, otherwise the default inverse extreme values were returned 
instead of null.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-19 Thread GitBox


kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r442863137



##
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##
@@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator {
 
 ArrayType arr(batch[0].array());
 
-local.has_nulls = arr.null_count() > 0;
+const auto null_count = arr.null_count();
+local.has_nulls = null_count > 0;
+local.has_values = (arr.length() - null_count) > 0;
+
 if (local.has_nulls && options.null_handling == 
MinMaxOptions::OUTPUT_NULL) {
   this->state = local;
   return;
 }
 
-const auto values = arr.raw_values();

Review comment:
   BooleanArray has no `raw_values()` method.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org