Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17580 )

Change subject: IMPALA-2019(Part-2): Provide UTF-8 support in instr() and 
locate()
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17580/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17580/9/be/src/exprs/string-functions-ir.cc@273
PS9, Line 273: if (BitUtil::IsUtf8StartByte(ptr[i])) ++cnt
> nit. Performance-wise, I wonder if we can skip all its #bytes when the firs
As we discussed in 
https://gerrit.cloudera.org/c/17580/5/be/src/exprs/string-functions-ir.cc#269
skipping ahead by the expected byte count can't handle malformed characters 
that have fewer bytes than expected. BTW, for performance, the current version 
can be sped up with SIMD, i.e. check the first (most significant) bit of all 
bytes; if all of them are 0, the string is pure ASCII and we can just return 
the length (we can do this in another JIRA).
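
A rough sketch of the idea, not the patch itself. It assumes that 
IsUtf8StartByte means "not a continuation byte (10xxxxxx)"; the helper below is 
a local stand-in, not the real BitUtil code. IsAllAscii and CountUtf8Chars are 
hypothetical names, and a plain 64-bit word test stands in for real SIMD 
instructions:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local stand-in (assumption) for BitUtil::IsUtf8StartByte: true for any byte
// that is not a UTF-8 continuation byte of the form 10xxxxxx.
static inline bool IsUtf8StartByte(uint8_t b) { return (b & 0xC0) != 0x80; }

// Hypothetical helper: true if no byte in [ptr, ptr + len) has its high bit
// set. Checks 8 bytes at a time; a real SIMD version would use wider registers.
static inline bool IsAllAscii(const uint8_t* ptr, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    std::memcpy(&word, ptr + i, sizeof(word));  // avoids unaligned reads
    if (word & 0x8080808080808080ULL) return false;
  }
  for (; i < len; ++i) {
    if (ptr[i] & 0x80) return false;
  }
  return true;
}

// Hypothetical wrapper: counts UTF-8 characters by counting start bytes.
// Every byte is inspected, so a truncated multi-byte sequence cannot cause
// the following character to be skipped.
static inline size_t CountUtf8Chars(const uint8_t* ptr, size_t len) {
  if (IsAllAscii(ptr, len)) return len;  // fast path: one byte per character
  size_t cnt = 0;
  for (size_t i = 0; i < len; ++i) {
    if (IsUtf8StartByte(ptr[i])) ++cnt;
  }
  return cnt;
}
```

For the malformed input "\xE4\xBD" followed by "a" (a 3-byte lead byte with 
only one continuation byte), this counts 2 characters, whereas jumping ahead 
by the 3 bytes announced by 0xE4 would swallow the "a".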



--
To view, visit http://gerrit.cloudera.org:8080/17580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Gerrit-Change-Number: 17580
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 15 Jul 2021 23:09:57 +0000
Gerrit-HasComments: Yes
