Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )
Change subject: KUDU-1938 Add non-copy setters to partial row pt 3 ...................................................................... Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@14 PS3, Line 14: to already be truncated (which it is in Impala's case) and only check > More context on bytes vs. characters (from part 1 of this series): https:// We do treat it as UTF8 characters, I didn't know Impala treated them differently. For some additional context, I did some testing, Oracle treats lengths based as either bytes or characters based on the NLS_LENGTH_SEMANTICS parameter, but it can be explicitly set in table creation time (e.g. CHAR(10 CHAR) or CHAR(10 BYTE)). MySQL is supposed to treat them as characters in recent versions, but in my version at least I couldn't make it work. Postgres treats them as characters. Anyway, if Impala treats them as bytes, it won't fail here as the length will be always <= than the allowed max length. http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@17 PS3, Line 17: to avoid having to count each character manually. > is the unicode character counting not already fast-pathed for the ASCII sub We could do this, but & 0x8080808080808080 would only tell us whether it has UTF8 characters and it would still need to check the whole string. In this case I decided to sacrifice absolute correctness for performance, let me know if I should still check properly. The & 0x8080808080808080 is a good idea to optimize where I actually count the characters and thinking about it gave me an idea to optimize it a bit better even, I'll do some perf tests and submit another patch with the optimizations later if it makes sense. -- To view, visit http://gerrit.cloudera.org:8080/13928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766 Gerrit-Change-Number: 13928 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Mon, 29 Jul 2019 16:02:14 +0000 Gerrit-HasComments: Yes
