Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )
Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@14
PS3, Line 14: to already be truncated (which it is in Impala's case) and only check
Is this a safe assumption? Last I was aware, Impala's treatment of "string" is actually not UTF-8, so their CHAR(8) is 8 bytes, not 8 Unicode characters. Based on the rest of this commit message, it sounds like we treat CHAR(8) as 8 Unicode characters, which might be more than 8 bytes.

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@17
PS3, Line 17: to avoid having to count each character manually.
Is the Unicode character counting not already fast-pathed for the ASCII subset of UTF-8? It seems like that should be a pretty easy optimization. It's still O(n), but it can probably process several bytes per cycle (e.g. load 8 bytes and & with 0x8080808080808080 to check for high bits).

--
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 26 Jul 2019 22:52:42 +0000
Gerrit-HasComments: Yes
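
For illustration, the ASCII fast path suggested in the comment on line 17 of the commit message could look roughly like the sketch below. This is not Kudu's actual code: the names CountUtf8CodePoints and IsLeadByte are hypothetical, and the sketch assumes the input is already valid UTF-8. It loads 8 bytes at a time, masks them with 0x8080808080808080, and falls back to a per-byte count only for chunks that contain a non-ASCII byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Kudu): true for any byte that starts a
// code point, i.e. any byte that is not a UTF-8 continuation byte (10xxxxxx).
inline bool IsLeadByte(char c) {
  return (static_cast<unsigned char>(c) & 0xC0) != 0x80;
}

// Hypothetical helper (not part of Kudu): counts code points in
// [data, data + len), assuming the bytes are valid UTF-8.
size_t CountUtf8CodePoints(const char* data, size_t len) {
  assert(data != nullptr || len == 0);
  constexpr uint64_t kHighBits = 0x8080808080808080ULL;
  size_t count = 0;
  size_t i = 0;

  // Fast path: process 8 bytes per iteration.
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));  // safe unaligned load
    if ((word & kHighBits) == 0) {
      // No high bit set: all 8 bytes are ASCII, so 8 code points.
      count += 8;
    } else {
      // Mixed chunk: count the non-continuation bytes one by one.
      for (size_t j = i; j < i + 8; ++j) {
        count += IsLeadByte(data[j]);
      }
    }
  }

  // Tail: fewer than 8 bytes remain.
  for (; i < len; ++i) {
    count += IsLeadByte(data[i]);
  }
  return count;
}
```

The work is still O(n), as the comment notes, but pure-ASCII input costs one load, one mask, and one compare per 8 bytes. A stricter version would also validate the encoding rather than assume it.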
