Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/17580 )
Change subject: IMPALA-2019(Part-2): Provide UTF-8 support in instr() and
locate()
......................................................................
Patch Set 9:
The following loop can be further optimized because the 2nd, 3rd, and 4th bytes of a
UTF-8 character always start with the bits 10 (0b10xxxxxx, i.e. (byte & 0xC0) == 0x80).
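int find_one_CHAR_backwards(int pos) {
  int last_pos = pos;
  // A UTF-8 character is at most 4 bytes long, so look back at most 4 bytes.
  for (int i = 0; i < 4; i++) {
    if (BitUtil::IsUtf8StartByte(ptr[pos])) return pos;
    pos--;
    if (pos < 0) break;
  }
  // Not a valid UTF-8 character: no start byte found within 4 bytes.
  return last_pos;
}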
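
One possible reading of this suggestion, as a sketch only and not the code in
the patch: since continuation bytes are exactly those matching 0b10xxxxxx, the
start-byte test reduces to a single mask-and-compare, and the lower bound of
the backward scan can be computed once instead of re-checking pos on every
iteration. IsUtf8StartByte below is a hypothetical stand-in for
BitUtil::IsUtf8StartByte (its real definition is assumed, not quoted), and
FindCharStartBackwards and its explicit ptr parameter are names made up for
illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for BitUtil::IsUtf8StartByte (assumed behavior):
// a byte starts a UTF-8 character unless its top two bits are 10,
// which marks a continuation byte.
inline bool IsUtf8StartByte(uint8_t b) { return (b & 0xC0) != 0x80; }

// Returns the index of the start byte of the character that contains
// ptr[pos], scanning back over at most 4 bytes (pos, pos-1, pos-2, pos-3).
// If no start byte is found in that window, the input is not valid UTF-8
// and pos is returned unchanged, as in the loop quoted above.
inline int FindCharStartBackwards(const uint8_t* ptr, int pos) {
  // Compute the scan limit once so the loop body has a single test.
  const int lowest = pos >= 3 ? pos - 3 : 0;
  for (int i = pos; i >= lowest; --i) {
    if (IsUtf8StartByte(ptr[i])) return i;
  }
  return pos;
}
```

For the bytes 0x61 0xC3 0xA9 0x7A ("a", a two-byte character, "z"), calling
FindCharStartBackwards with pos = 2 lands on the continuation byte 0xA9 and
steps back to index 1, where the start byte 0xC3 sits.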
--
To view, visit http://gerrit.cloudera.org:8080/17580
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Gerrit-Change-Number: 17580
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 15 Jul 2021 15:06:59 +0000
Gerrit-HasComments: No