Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN ] CharUtilTest.StressTestUtf8
             : [ OK ] CharUtilTest.StressTestUtf8 (10599 ms)
> Looks like this got slower? Why?

I think it's because this way we can only fast-path runs of 16 ASCII characters instead of runs of 8. I was wondering if I should make the ASCII optimization optional via a flag. What do you think?


http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc@97
PS11, Line 97:   Slice data;
             :
             :   data = "ááááááááááááááááááááááááááááááááaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
> Combine?

I'm not sure what you mean.


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9:
> Think you missed this.

I'm still thinking about how to do this exactly. This will be used by the clients rather than on the server side, and AFAIK we don't have any CPU restrictions there: if someone wants to write to Kudu through the C++ client from a Raspberry Pi or some embedded system, we should support it as best as we reasonably can.

So I was thinking of putting the whole thing behind #if __SSE4_1__ so that we don't even compile that part if the target -march doesn't support it, then doing a has_sse41 check at runtime just in case and throwing an exception if the CPU doesn't support it.

Also, if this is not part of the Kudu server targets/binaries and it can easily be disabled by setting the appropriate -march, do we still need to limit ourselves to SSE4.2, or can we use AVX* behind #if?

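To make the #if __SSE4_1__ proposal concrete, here is a minimal sketch, not the code in the patch. It assumes GCC or Clang on x86 for the SSE4.1 branch. AsciiPrefixLen and AsciiPrefixLenScalar are hypothetical names, and the __builtin_cpu_supports call stands in for whatever has_sse41 check would actually be used. When the target -march lacks SSE4.1, the vector branch is not compiled at all and only the scalar loop remains.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

#if defined(__SSE4_1__)
#include <smmintrin.h>  // SSE4.1 intrinsics such as _mm_testz_si128
#endif

// Hypothetical helper (not from char_util.cc): length of the leading run of
// ASCII bytes, checked one byte at a time.
inline size_t AsciiPrefixLenScalar(const uint8_t* data, size_t len) {
  size_t i = 0;
  while (i < len && data[i] < 0x80) ++i;
  return i;
}

// Hypothetical helper: same result, but skips 16 ASCII bytes per step when
// the build targets SSE4.1.
inline size_t AsciiPrefixLen(const uint8_t* data, size_t len) {
  size_t i = 0;
#if defined(__SSE4_1__)
  // Assumption: GCC/Clang builtin used in place of a has_sse41 check.
  // Guards against running an SSE4.1 build on a CPU without SSE4.1.
  if (!__builtin_cpu_supports("sse4.1")) {
    throw std::runtime_error("built with SSE4.1, but the CPU lacks it");
  }
  const __m128i high_bits = _mm_set1_epi8(static_cast<char>(0x80));
  while (len - i >= 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    // _mm_testz_si128 returns 1 only if (chunk & high_bits) is all zeros,
    // i.e. none of the 16 bytes has its top bit set.
    if (!_mm_testz_si128(chunk, high_bits)) break;
    i += 16;
  }
#endif
  // Finish the tail (or the whole input on non-SSE4.1 builds) byte by byte.
  return i + AsciiPrefixLenScalar(data + i, len - i);
}
```

A run shorter than 16 bytes never takes the vector step, which is consistent with the guess above about why inputs with short ASCII runs benefit less than they did with 8-byte steps.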
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 08:31:03 +0000
Gerrit-HasComments: Yes
