Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14353 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 1
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14353/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14353/7//COMMIT_MSG@33
PS7, Line 33: After:
I presume these measurements were taken from char_util.cc compiled in RELEASE 
mode?


http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc@37
PS7, Line 37:   const int kNumCycles = 100000;
This can be declared private.


http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc@44
PS7, Line 44:   void ReadFile(const std::string& file_name, Slice* slice, 
std::string* string) {
You can reuse ReadFileToString (defined in env.h).


http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util-test.cc@68
PS7, Line 68:   ASSERT_EQ(10549, result.size());
If this fails, the delete won't be called. Instead of explicitly deleting, set 
up a unique_ptr around the memory you want freed right after the call to 
UTF8Truncate.

Same goes for the other tests here. Probably best to wrap the call to 
UTF8Truncate in something that calls it and also returns a unique_ptr.


http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc@32
PS7, Line 32:   while (num_bytes < size) {
Should the two > conditions below be >=?

I'd also doc the net effect of the four conditions: if the next 8 bytes (8-byte 
aligned) are all ASCII characters, we can fast path them.


http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc@33
PS7, Line 33:     if (num_bytes % 8 == 0
            :         && size - num_bytes > 8
            :         && max_utf8_length - num_utf8_chars > 8
            :         && (*(reinterpret_cast<const int64_t*>(str)) & 
0x8080808080808080) == 0) {
Nit: move && to the end of the previous line.


http://gerrit.cloudera.org:8080/#/c/14353/7/src/kudu/util/char_util.cc@43
PS7, Line 43:           num_bytes++;
Nit: add FALLTHROUGH_INTENDED to the end of each non-empty case where we fall 
through to the next one.



--
To view, visit http://gerrit.cloudera.org:8080/14353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebb98e18a3619029d9b0bc224c7dead89a3d7374
Gerrit-Change-Number: 14353
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 05 Oct 2019 00:55:39 +0000
Gerrit-HasComments: Yes

Reply via email to