Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Grant Henke,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14353
to look at the new patch set (#9).
Change subject: KUDU-1938 Make UTF-8 truncation faster pt 1
......................................................................
KUDU-1938 Make UTF-8 truncation faster pt 1
This commit adds a fast path for ASCII strings: if the most significant
bit (MSB) of every byte in a chunk of the string is 0, the character
counter and the iterator both advance by the chunk size. This way, a
chunk that contains only ASCII characters does not need to be counted
one character at a time.
Thanks to Todd Lipcon for the initial idea, and to Zoltan Chovan and
Istvan Farmosi for brainstorming and for helping to figure out how this
should be done.
Before:
[ RUN ] CharUtilTest.StressTestUtf8
[ OK ] CharUtilTest.StressTestUtf8 (6698 ms)
[ RUN ] CharUtilTest.StressTestAscii
[ OK ] CharUtilTest.StressTestAscii (6161 ms)
After:
[ RUN ] CharUtilTest.StressTestUtf8
[ OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN ] CharUtilTest.StressTestAscii
[ OK ] CharUtilTest.StressTestAscii (1028 ms)
Change-Id: Iebb98e18a3619029d9b0bc224c7dead89a3d7374
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
A src/kudu/util/testdata/char_truncate_ascii.txt
A src/kudu/util/testdata/char_truncate_utf8.txt
5 files changed, 420 insertions(+), 11 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/14353/9
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebb98e18a3619029d9b0bc224c7dead89a3d7374
Gerrit-Change-Number: 14353
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)