[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-17 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r441588468



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,107 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+
+using internal::CopyBitmap;
+using internal::InvertBitmap;
+
+namespace compute {
+namespace {
+
+struct IsValidOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+DCHECK_EQ(out->offset, 0);
+DCHECK_LE(out->length, arr.length);
+if (arr.buffers[0] != nullptr) {
+  out->buffers[1] = arr.offset == 0
+? arr.buffers[0]
+: SliceBuffer(arr.buffers[0], arr.offset / 8, 
arr.length / 8);
+  out->offset = arr.offset % 8;

Review comment:
   It's not worth thinking too hard right now because kernel pipelining 
(temporary elision) is not implemented yet so until that happens it would be 
just speculation. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r441024436



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+
+using internal::CopyBitmap;
+using internal::InvertBitmap;
+
+namespace compute {
+namespace {
+
+struct IsValidOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+out->length == arr.length) {

Review comment:
   Ah yes, indeed zero copy is not possible here in general. What do you 
think about doing the zero copy when `arr.offset % 8 == 0`? Then you can slice 
the bitmap. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440952996



##
File path: cpp/src/arrow/testing/random.h
##
@@ -250,6 +250,21 @@ class ARROW_EXPORT RandomArrayGenerator {
int32_t min_length, int32_t 
max_length,
double null_probability = 0);
 
+  /// \brief Randomly generate an Array of the specified type, size, and 
null_probability.
+  ///
+  /// Generation parameters other than size and null_probability are 
determined based on
+  /// the type of Array to be generated.
+  /// If boolean the probabilities of true,false values are 0.25,0.75 
respectively.
+  /// If numeric min,max will be the least and greatest representable values.
+  /// If string min_length,max_length will be 0,sqrt(size) respectively.
+  ///
+  /// \param[in] type the type of Array to generate
+  /// \param[in] size the size of the Array to generate
+  /// \param[in] null_probability the probability of a slot being null
+  /// \return a generated Array
+  std::shared_ptr ArrayOf(std::shared_ptr type, int64_t size,
+ double null_probability);

Review comment:
   Perhaps as a follow up JIRA, can you introduce an `ArrayOfOptions` 
struct that allows some of the type-specific configurables to be set up 
en-masse and then passed to this function, such as
   
   * The boolean true probability
   * string min/max lengths

##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+
+using internal::CopyBitmap;
+using internal::InvertBitmap;
+
+namespace compute {
+namespace {
+
+struct IsValidOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+out->length == arr.length) {
+  out->buffers[1] = arr.buffers[0];
+  return;
+}
+
+KERNEL_RETURN_IF_ERROR(ctx, 
ctx->AllocateBitmap(out->length).Value(>buffers[1]));
+
+if (arr.null_count == 0 || arr.buffers[0] == nullptr) {
+  BitUtil::SetBitsTo(out->buffers[1]->mutable_data(), out->offset, 
out->length, true);
+  return;
+}
+
+CopyBitmap(arr.buffers[0]->data(), arr.offset, arr.length,
+   out->buffers[1]->mutable_data(), out->offset);
+  }
+};
+
+struct IsNullOperator {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = !in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+if (arr.null_count == 0 || arr.buffers[0] == nullptr) {
+  BitUtil::SetBitsTo(out->buffers[1]->mutable_data(), out->offset, 
out->length,
+ false);
+  return;
+}
+
+InvertBitmap(arr.buffers[0]->data(), arr.offset, arr.length,
+ out->buffers[1]->mutable_data(), out->offset);
+  }
+};
+}  // namespace
+
+namespace codegen {
+namespace {
+
+using arrow::compute::codegen::SimpleUnary;

Review comment:
   This using not needed

##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or 

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r44024



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr type_singleton() {
+return TypeTraits::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+ASSERT_EQ(actual.kind(), expected.kind());
+if (actual.kind() == Datum::ARRAY) {
+  ASSERT_OK(actual.make_array()->ValidateFull());
+  AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+} else {
+  AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+}
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& 
expected_json) {
+AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function(const Datum&, 
ExecContext*)>;
+  UnaryFunction func_;
+};
+
+TEST_F(TestValidityKernels, ArrayIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[false]");
+  this->AssertUnary("[1]", "[true]");
+  this->AssertUnary("[null, 1, 0, null]", "[false, true, true, false]");
+}
+
+TEST_F(TestValidityKernels, ArrayIsValidBufferPassthruOptimization) {
+  Datum arg = ArrayFromJSON(boolean(), "[null, 1, 0, null]");
+  ASSERT_OK_AND_ASSIGN(auto validity, arrow::compute::IsValid(arg));
+  ASSERT_EQ(validity.array()->buffers[1], arg.array()->buffers[0]);
+}
+
+TEST_F(TestValidityKernels, ScalarIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  AssertUnary(Datum(19.7), Datum(true));
+  AssertUnary(MakeNullScalar(float64()), Datum(false));
+}
+
+TEST_F(TestValidityKernels, ArrayIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[true]");
+  this->AssertUnary("[1]", "[false]");
+  this->AssertUnary("[null, 1, 0, null]", "[true, false, false, true]");
+}
+
+TEST_F(TestValidityKernels, DISABLED_ScalarIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  AssertUnary(Datum(19.7), Datum(false));
+  AssertUnary(MakeNullScalar(float64()), Datum(true));
+}
+
+class IsValidProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr GetFunction() override {
+return internal::checked_pointer_cast(
+*GetFunctionRegistry()->GetFunction("is_valid"));
+  }
+
+  Result> Contract(const ScalarVector& args,
+   const FunctionOptions*) override {
+return std::make_shared(args[0]->is_valid);
+  }
+};
+
+TEST_P(IsValidProperty, TestIsValidProperty) { Validate(); }
+
+INSTANTIATE_TEST_SUITE_P(IsValidPropertyTest, IsValidProperty,
+ ScalarFunctionPropertyTestParam::Values({
+ {0, 0.0},
+ {1, 0.0},
+ {2, 0.0},
+ {1024, 0.25},
+ }));
+
+class IsNullProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr GetFunction() override {
+return internal::checked_pointer_cast(
+*GetFunctionRegistry()->GetFunction("is_null"));
+  }
+
+  Result> Contract(const ScalarVector& args,
+   const FunctionOptions*) override {
+return std::make_shared(!args[0]->is_valid);
+  }
+};

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440225706



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr type_singleton() {
+return TypeTraits::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+ASSERT_EQ(actual.kind(), expected.kind());
+if (actual.kind() == Datum::ARRAY) {
+  ASSERT_OK(actual.make_array()->ValidateFull());
+  AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+} else {
+  AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+}
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& 
expected_json) {
+AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function(const Datum&, 
ExecContext*)>;
+  UnaryFunction func_;
+};
+
+TEST_F(TestValidityKernels, ArrayIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[false]");
+  this->AssertUnary("[1]", "[true]");
+  this->AssertUnary("[null, 1, 0, null]", "[false, true, true, false]");
+}
+
+TEST_F(TestValidityKernels, ArrayIsValidBufferPassthruOptimization) {
+  Datum arg = ArrayFromJSON(boolean(), "[null, 1, 0, null]");
+  ASSERT_OK_AND_ASSIGN(auto validity, arrow::compute::IsValid(arg));
+  ASSERT_EQ(validity.array()->buffers[1], arg.array()->buffers[0]);
+}
+
+TEST_F(TestValidityKernels, ScalarIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  AssertUnary(Datum(19.7), Datum(true));
+  AssertUnary(MakeNullScalar(float64()), Datum(false));
+}
+
+TEST_F(TestValidityKernels, ArrayIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[true]");
+  this->AssertUnary("[1]", "[false]");
+  this->AssertUnary("[null, 1, 0, null]", "[true, false, false, true]");
+}
+
+TEST_F(TestValidityKernels, DISABLED_ScalarIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  AssertUnary(Datum(19.7), Datum(false));
+  AssertUnary(MakeNullScalar(float64()), Datum(true));
+}
+
+class IsValidProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr GetFunction() override {
+return internal::checked_pointer_cast(
+*GetFunctionRegistry()->GetFunction("is_valid"));
+  }
+
+  Result> Contract(const ScalarVector& args,
+   const FunctionOptions*) override {
+return std::make_shared(args[0]->is_valid);
+  }
+};
+
+TEST_P(IsValidProperty, TestIsValidProperty) { Validate(); }
+
+INSTANTIATE_TEST_SUITE_P(IsValidPropertyTest, IsValidProperty,
+ ScalarFunctionPropertyTestParam::Values({
+ {0, 0.0},
+ {1, 0.0},
+ {2, 0.0},
+ {1024, 0.25},
+ }));
+
+class IsNullProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr GetFunction() override {
+return internal::checked_pointer_cast(
+*GetFunctionRegistry()->GetFunction("is_null"));
+  }
+
+  Result> Contract(const ScalarVector& args,
+   const FunctionOptions*) override {
+return std::make_shared(!args[0]->is_valid);
+  }
+};

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-13 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r439739768



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+namespace compute {
+
+struct IsValid {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+out->length == arr.length) {
+  out->buffers[1] = arr.buffers[0];
+  return;
+}

Review comment:
   If you want to do a zero-copy optimization, you need to configure the 
kernel to use  `MemAllocation::COMPUTED_NO_PREALLOCATE`. This also makes it so 
you don't have to check the offset and length. You will however have to 
allocate memory when the zero copy path is not possible





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-12 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r439704017



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr type_singleton() {
+return TypeTraits::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+ASSERT_EQ(actual.kind(), expected.kind());
+if (actual.kind() == Datum::ARRAY) {
+  ASSERT_OK(actual.make_array()->ValidateFull());
+  AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+} else {
+  AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+}
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& 
expected_json) {
+AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function(const Datum&, 
ExecContext*)>;
+  UnaryFunction func_;
+};
+
+TEST_F(TestValidityKernels, ArrayIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[false]");
+  this->AssertUnary("[1]", "[true]");
+  this->AssertUnary("[null, 1, 0, null]", "[false, true, true, false]");
+}
+
+TEST_F(TestValidityKernels, ArrayIsValidBufferPassthruOptimization) {
+  Datum arg = ArrayFromJSON(boolean(), "[null, 1, 0, null]");
+  ASSERT_OK_AND_ASSIGN(auto validity, arrow::compute::IsValid(arg));
+  ASSERT_EQ(validity.array()->buffers[1], arg.array()->buffers[0]);
+}
+
+TEST_F(TestValidityKernels, ScalarIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  AssertUnary(Datum(19.7), Datum(true));
+  AssertUnary(MakeNullScalar(float64()), Datum(false));
+}
+
+TEST_F(TestValidityKernels, ArrayIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[true]");
+  this->AssertUnary("[1]", "[false]");
+  this->AssertUnary("[null, 1, 0, null]", "[true, false, false, true]");
+}
+
+TEST_F(TestValidityKernels, DISABLED_ScalarIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  AssertUnary(Datum(19.7), Datum(false));
+  AssertUnary(MakeNullScalar(float64()), Datum(true));
+}
+
+class IsValidProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr GetFunction() override {
+return internal::checked_pointer_cast(
+*GetFunctionRegistry()->GetFunction("is_valid"));
+  }
+
+  Result> Contract(const ScalarVector& args,
+   const FunctionOptions*) override {
+return std::make_shared(args[0]->is_valid);
+  }
+};
+
+TEST_P(IsValidProperty, TestIsValidProperty) { Validate(); }
+
+INSTANTIATE_TEST_SUITE_P(IsValidPropertyTest, IsValidProperty,
+ ScalarFunctionPropertyTestParam::Values({
+ {0, 0.0},
+ {1, 0.0},
+ {2, 0.0},
+ {1024, 0.25},
+ }));
+
+class IsNullProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr GetFunction() override {
+return internal::checked_pointer_cast(
+*GetFunctionRegistry()->GetFunction("is_null"));
+  }
+
+  Result> Contract(const ScalarVector& args,
+   const FunctionOptions*) override {
+return std::make_shared(!args[0]->is_valid);
+  }
+};

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-12 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r439703255



##
File path: cpp/src/arrow/testing/random.h
##
@@ -250,6 +250,9 @@ class ARROW_EXPORT RandomArrayGenerator {
int32_t min_length, int32_t 
max_length,
double null_probability = 0);
 
+  std::shared_ptr Of(std::shared_ptr type, int64_t size,

Review comment:
   docstring

##
File path: cpp/src/arrow/array/builder_base.cc
##
@@ -111,4 +115,43 @@ void ArrayBuilder::UnsafeSetNull(int64_t length) {
   null_bitmap_builder_.UnsafeAppend(length, false);
 }
 
+struct ArrayBuilderAppendScalarImpl {
+  template ::BuilderType>
+  B& builder() {
+return *internal::checked_cast(builder_);
+  }
+
+  template 
+  auto Visit(const S& scalar) -> decltype(builder().Append(scalar.value)) {
+return builder().Append(scalar.value);
+  }
+
+  template 
+  enable_if_base_binary Visit(const S& scalar) {
+return builder().Append(scalar.value->data(), scalar.value->size());
+  }
+
+  Status Visit(const Scalar& scalar) {
+return Status::NotImplemented("Appending scalars to builders of type ", 
*scalar.type);
+  }
+
+  Status Finish() && { return VisitScalarInline(scalar_, this); }
+  ArrayBuilder* builder_;
+  const Scalar& scalar_;
+};

Review comment:
   If this is just to enable the test harness created below I'm not sure 
it's worth it

##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr type_singleton() {
+return TypeTraits::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+ASSERT_EQ(actual.kind(), expected.kind());
+if (actual.kind() == Datum::ARRAY) {
+  ASSERT_OK(actual.make_array()->ValidateFull());
+  AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+} else {
+  AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+}
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& 
expected_json) {
+AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function(const Datum&, 
ExecContext*)>;
+  UnaryFunction func_;

Review comment:
   Is this meaningfully different from
   
   
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/test_util.cc#L33?

##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and 

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-12 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r439703006



##
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##
@@ -181,8 +180,7 @@ struct GetOutputType> {
 };
 
 template 
-struct GetOutputType<
-Type, enable_if_t::value>> {
+struct GetOutputType::value>> {
   using T = std::string;
 };

Review comment:
   Fixed. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-11 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r439108275



##
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##
@@ -181,8 +180,7 @@ struct GetOutputType> {
 };
 
 template 
-struct GetOutputType<
-Type, enable_if_t::value>> {
+struct GetOutputType::value>> {
   using T = std::string;
 };

Review comment:
   What is going on with all the code reformatting in this patch?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org