Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17580 )

Change subject: IMPALA-2019(Part-2): Provide UTF-8 support in instr() and 
locate()
......................................................................


Patch Set 9:

The following loop can be further optimized because the 2nd, 3rd, or 4th byte 
of a UTF-8 character always starts with the bits 10 (0b10xxxxxx, not 0x10).

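int find_one_CHAR_backwards(int pos) {
  int last_pos = pos;
  // A UTF-8 character is at most 4 bytes long, so inspect at most 4 bytes.
  for (int i = 0; i < 4; i++) {
    if (BitUtil::IsUtf8StartByte(ptr[pos])) return pos;
    pos--;
    if (pos < 0) break;
  }
  // No start byte found: not a valid UTF-8 character.
  return last_pos;
}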
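
One way the suggestion could look, as a rough sketch only: test for 
continuation bytes directly with a bit mask and step back at most three 
times, since a UTF-8 character has at most three continuation bytes. The 
names IsUtf8ContinuationByte and FindCharStartBackwards and the std::string 
parameter are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the patch): true for UTF-8 continuation
// bytes, i.e. bytes of the form 0b10xxxxxx.
inline bool IsUtf8ContinuationByte(uint8_t b) { return (b & 0xC0) == 0x80; }

// Hypothetical sketch: returns the index of the first byte of the character
// that contains the byte at `pos`. Returns `pos` unchanged when no start byte
// is found within the last 4 bytes (malformed input).
inline int FindCharStartBackwards(const std::string& s, int pos) {
  assert(pos >= 0 && pos < static_cast<int>(s.size()));
  int i = pos;
  // Skip at most 3 continuation bytes while moving towards the start.
  for (int steps = 0;
       steps < 3 && i > 0 && IsUtf8ContinuationByte(static_cast<uint8_t>(s[i]));
       ++steps) {
    --i;
  }
  // Still on a continuation byte: treat the input as non-UTF-8.
  return IsUtf8ContinuationByte(static_cast<uint8_t>(s[i])) ? pos : i;
}
```

For "a\xC3\xA9" and pos = 2 this returns 1, the index of the lead byte 0xC3. 
The sketch does not check that the lead byte's declared length matches the 
number of continuation bytes skipped.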


--
To view, visit http://gerrit.cloudera.org:8080/17580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Gerrit-Change-Number: 17580
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 15 Jul 2021 15:06:59 +0000
Gerrit-HasComments: No
