[GitHub] [arrow] wesm commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-25 Thread GitBox


wesm commented on a change in pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#discussion_r445574693



##
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##
@@ -68,6 +76,64 @@ TYPED_TEST(TestStringKernels, AsciiLower) {
this->string_type(), "[\"aaazzæÆ&\", null, \"\", \"bbb\"]");
 }
 
+TEST(TestStringKernels, Utf8Upper32bitGrowth) {

Review comment:
   I think it's fine to have this test but it definitely should be marked 
as LARGE_MEMORY_TEST





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox


wesm commented on a change in pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#discussion_r441146931



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -30,6 +31,21 @@ namespace internal {
 
 namespace {
 
+// Code units in the range [a-z] can only be an encoding of an ascii
+// character/codepoint, not the 2nd, 3rd or 4th code unit (byte) of an 
different
+// codepoint. This guaranteed by non-overlap design of the unicode standard. 
(see
+// section 2.5 of Unicode Standard Core Specification v13.0)
+
+uint8_t ascii_tolower(uint8_t utf8_code_unit) {

Review comment:
   I think you will want this to be `static inline` (not sure all compilers 
will inline it otherwise)

##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -159,11 +282,23 @@ void MakeUnaryStringBatchKernel(std::string name, 
ArrayKernelExec exec,
   DCHECK_OK(registry->AddFunction(std::move(func)));
 }
 
+template  typename Transformer>

Review comment:
   I think some compilers demand that "class" be used for the last 
"typename"





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org