Attila Bukor has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14353 )
Change subject: KUDU-1938 Make UTF-8 truncation faster pt 1
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 1

This commit adds a fast path for ASCII strings where if the MSB is a
0-bit on each byte in a chunk of string it advances the counter and the
iterator by the chunk size. This way if a chunk contains only ASCII
characters there's no need to count each individual character.

Thanks to Todd Lipcon for the initial idea and Zoltan Chovan and Istvan
Farmosi for the brainstorming and the help in figuring out how this
should be done.

Before:
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (6698 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (6161 ms)

After:
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

Change-Id: Iebb98e18a3619029d9b0bc224c7dead89a3d7374
Reviewed-on: http://gerrit.cloudera.org:8080/14353
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
A src/kudu/util/testdata/char_truncate_ascii.txt
A src/kudu/util/testdata/char_truncate_utf8.txt
5 files changed, 421 insertions(+), 11 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/14353

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iebb98e18a3619029d9b0bc224c7dead89a3d7374
Gerrit-Change-Number: 14353
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
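For readers without the diff at hand, the ASCII fast path described in the
commit message can be sketched roughly as below. This is an illustration
only, not the code merged into src/kudu/util/char_util.cc: the function
name Utf8TruncatedLength, the 8-byte chunk size, and the byte-at-a-time
slow path are all assumptions made for the example. It also assumes the
input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Kudu's actual API): returns how many bytes the
// first `max_chars` UTF-8 characters of `data` occupy, or `len` if the
// string holds fewer characters than that.
size_t Utf8TruncatedLength(const char* data, size_t len, size_t max_chars) {
  // One MSB per byte of an 8-byte chunk (chunk size is an assumption).
  constexpr uint64_t kHighBits = 0x8080808080808080ULL;
  size_t pos = 0;    // byte offset of the next character boundary
  size_t chars = 0;  // characters counted so far

  while (pos < len && chars < max_chars) {
    // Fast path: if at least 8 bytes and 8 characters of budget remain,
    // test all 8 MSBs at once. All zero means 8 ASCII characters.
    if (len - pos >= 8 && max_chars - chars >= 8) {
      uint64_t chunk;
      std::memcpy(&chunk, data + pos, sizeof(chunk));  // alignment-safe load
      if ((chunk & kHighBits) == 0) {
        pos += 8;
        chars += 8;
        continue;
      }
    }
    // Slow path: consume one lead byte, then any continuation bytes
    // (those of the form 10xxxxxx), and count that as one character.
    ++pos;
    while (pos < len &&
           (static_cast<unsigned char>(data[pos]) & 0xC0) == 0x80) {
      ++pos;
    }
    ++chars;
  }
  return pos;
}
```

In this sketch a chunk containing any non-ASCII byte falls through to the
per-character loop, so pure-ASCII input benefits most from the fast path
while multi-byte input pays for the extra check on each iteration.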
