Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14353 )
Change subject: KUDU-1938 Make UTF-8 truncation faster pt 1 ...................................................................... Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/14353/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14353/7//COMMIT_MSG@33 PS7, Line 33: After: I presume these measurements were taken from char_util.cc compiled in RELEASE mode? http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc File src/kudu/util/char_util-test.cc: http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc@37 PS7, Line 37: const int kNumCycles = 100000; This can be declared private. http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc@44 PS7, Line 44: void ReadFile(const std::string& file_name, Slice* slice, std::string* string) { You can reuse ReadFileToString (defined in env.h). http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc@68 PS7, Line 68: ASSERT_EQ(10549, result.size()); If this fails, the delete won't be called. Instead of explicitly deleting, set up a unique_ptr around the memory you want freed right after the call to UTF8Truncate. Same goes for the other tests here. Probably best to wrap the call to UTF8Truncate in something that calls it and also returns a unique_ptr. http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc File src/kudu/util/char_util.cc: http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc@32 PS7, Line 32: while (num_bytes < size) { Should the two > conditions below be >=? I'd also doc the net effect of the four conditions: if the next 8 bytes (8-byte aligned) are all ASCII characters, we can fast path them. http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc@33 PS7, Line 33: if (num_bytes % 8 == 0 : && size - num_bytes > 8 : && max_utf8_length - num_utf8_chars > 8 : && (*(reinterpret_cast<const int64_t*>(str)) & 0x8080808080808080) == 0) { Nit: move && to the end of the previous line. http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc@43 PS7, Line 43: num_bytes++; Nit: add FALLTHROUGH_INTENDED to the end of each non-empty case where we fall through to the next one. -- To view, visit http://gerrit.cloudera.org:8080/14353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iebb98e18a3619029d9b0bc224c7dead89a3d7374 Gerrit-Change-Number: 14353 Gerrit-PatchSet: 7 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 05 Oct 2019 00:55:39 +0000 Gerrit-HasComments: Yes
