Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@14
PS3, Line 14: to already be truncated (which it is in Impala's case) and only 
check
> More context on bytes vs. characters (from part 1 of this series): https://
We do treat it as UTF8 characters, I didn't know Impala treated them 
differently. For some additional context, I did some testing, Oracle treats 
lengths based as either bytes or characters based on the NLS_LENGTH_SEMANTICS 
parameter, but it can be explicitly set in table creation time (e.g. CHAR(10 
CHAR) or CHAR(10 BYTE)). MySQL is supposed to treat them as characters in 
recent versions, but in my version at least I couldn't make it work. Postgres 
treats them as characters.

Anyway, if Impala treats them as bytes, it won't fail here as the length will 
be always <= than the allowed max length.


http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@17
PS3, Line 17: to avoid having to count each character manually.
> is the unicode character counting not already fast-pathed for the ASCII sub
We could do this, but & 0x8080808080808080 would only tell us whether it has 
UTF8 characters and it would still need to check the whole string. In this case 
I decided to sacrifice absolute correctness for performance, let me know if I 
should still check properly.

The & 0x8080808080808080 is a good idea to optimize where I actually count the 
characters and thinking about it gave me an idea to optimize it a bit better 
even, I'll do some perf tests and submit another patch with the optimizations 
later if it makes sense.



--
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Mon, 29 Jul 2019 16:02:14 +0000
Gerrit-HasComments: Yes

Reply via email to