Attila Bukor has posted comments on this change. (http://gerrit.cloudera.org:8080/14354)

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
> Looks like this got slower? Why?
I think it's because this way we can only fast-path runs of 16 ASCII
characters instead of 8, so ASCII runs shorter than 16 characters no longer
benefit from the optimization. I was wondering if I should make the ASCII
optimization optional via a flag. What do you think?
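To make the granularity point concrete, here is a minimal scalar sketch. `SkippableAsciiBytes` and its `ascii_fast_path_enabled` parameter are hypothetical (the parameter stands in for the flag mentioned above) and this is not the code in the patch. The fast path only skips whole chunks of `chunk` bytes that are entirely ASCII, so with 16-byte chunks a run of 12 ASCII characters followed by "áá" skips nothing, while 8-byte chunks skip the first 8 bytes.

```cpp
#include <cstddef>

// Hypothetical illustration: how many leading bytes an ASCII fast path can
// skip if it only handles whole chunks of `chunk` bytes that are all ASCII.
// Everything after the returned offset goes through the slower per-character
// UTF-8 path.
size_t SkippableAsciiBytes(const unsigned char* p, size_t len, size_t chunk,
                           bool ascii_fast_path_enabled) {
  if (!ascii_fast_path_enabled || chunk == 0) return 0;
  size_t i = 0;
  while (i + chunk <= len) {
    for (size_t j = 0; j < chunk; ++j) {
      // A set high bit means a non-ASCII (multi-byte UTF-8) byte:
      // stop at the start of this chunk.
      if (p[i + j] & 0x80) return i;
    }
    i += chunk;
  }
  return i;
}
```

With the 16-byte input "aaaaaaaaaaaa" + "áá", chunk size 8 returns 8 and chunk size 16 returns 0, which is the kind of input where a wider chunk can make the fast path less effective.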


http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc@97
PS11, Line 97:   Slice data;
             :
             :   data = "ááááááááááááááááááááááááááááááááaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
> Combine?
I'm not sure what you mean.


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9:
> Think you missed this.
I'm still thinking about how exactly to do this. This will be used by the
clients rather than on the server side, and AFAIK we don't have any CPU
restrictions there: if someone wants to write to Kudu through the C++ client
from a Raspberry Pi or some embedded system, we should support it as best we
reasonably can.

So I was thinking of putting the whole thing behind #if __SSE4_1__, so that we
don't even compile that part if the target -march doesn't support it, and then
doing a has_sse41 check just in case and throwing an exception if it fails?
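A minimal sketch of that proposal, assuming a hypothetical helper `Is16BytesAscii` (not code from the patch). It uses the GCC/Clang builtin `__builtin_cpu_supports("sse4.1")` for the runtime check in place of `has_sse41`, and adds a portable scalar fallback for targets where `__SSE4_1__` is not defined.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

// Hypothetical helper: returns true if all 16 bytes at `p` are ASCII.
bool Is16BytesAscii(const unsigned char* p) {
#if defined(__SSE4_1__)
  // Compiled only when the target -march enables SSE4.1. The runtime check
  // guards against running such a binary on a CPU without SSE4.1 anyway.
  // __builtin_cpu_supports is a GCC/Clang builtin (x86 only).
  if (!__builtin_cpu_supports("sse4.1")) {
    throw std::runtime_error("SSE4.1 required but not supported by this CPU");
  }
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
  // PTEST: returns 1 iff (v & 0x80 in every byte) == 0, i.e. no byte has its
  // high bit set.
  return _mm_testz_si128(v, _mm_set1_epi8(static_cast<char>(0x80))) != 0;
#else
  // Portable fallback for targets without SSE4.1: check two 8-byte words.
  uint64_t lo, hi;
  std::memcpy(&lo, p, 8);
  std::memcpy(&hi, p + 8, 8);
  return ((lo | hi) & 0x8080808080808080ULL) == 0;
#endif
}
```

One caveat worth checking: once a translation unit is built with -msse4.1, the compiler is free to emit SSE4.1 instructions elsewhere in it as well, so the runtime check only protects this function's own intrinsics.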

Also, if this is not part of the Kudu server targets/binaries and can easily be
disabled by setting the appropriate -march, do we still need to limit ourselves
to SSE4.2, or can we use AVX* behind #if?
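One way the AVX question could look in practice is a compile-time tiered choice. This is a minimal sketch, assuming a hypothetical `AsciiPrefixChunks` helper: an AVX2 loop compiled only when `__AVX2__` is defined, followed by a portable 8-byte loop that handles the tail (or everything, on non-AVX2 targets). `_mm256_movemask_epi8` collects the high bit of each of the 32 bytes, so a zero mask means the chunk is all ASCII.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: number of leading bytes of [p, p + len) covered by
// whole all-ASCII chunks, with the chunk width picked at compile time.
size_t AsciiPrefixChunks(const unsigned char* p, size_t len) {
  size_t i = 0;
#if defined(__AVX2__)
  // AVX2 path: 32 bytes per iteration.
  for (; i + 32 <= len; i += 32) {
    const __m256i v =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p + i));
    // Non-zero mask: some byte in this chunk has its high bit set.
    if (_mm256_movemask_epi8(v) != 0) return i;
  }
#endif
  // Portable path: 8 bytes per iteration for the tail or non-AVX2 targets.
  for (; i + 8 <= len; i += 8) {
    uint64_t w;
    std::memcpy(&w, p + i, 8);
    if (w & 0x8080808080808080ULL) return i;
  }
  return i;
}
```

Note that in this sketch the AVX2 loop gives up at 32-byte granularity, so it has the same trade-off as the 16-vs-8 slowdown discussed above: wider chunks help long ASCII runs but skip less when non-ASCII characters are dense.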



--
To view, visit http://gerrit.cloudera.org:8080/14354

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 08:31:03 +0000
Gerrit-HasComments: Yes
