Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17580 )

Change subject: IMPALA-2019(Part-2): Provide UTF-8 support in instr() and 
locate()
......................................................................


Patch Set 10: Code-Review+2

Sure, resolving illegal UTF-8 sequences can be postponed to IMPALA-10761, 
where I hope we can handle them better.

My concern is the complexity that handling such characters adds across the 
entire UTF-8 feature. On paper, that complexity can be reduced or managed as 
follows.

1. A common place to check the validity of UTF-8 characters and raise an error 
if necessary;
2. New UTF-8 functions that only deal with UTF-8 strings.
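
To make the two points above concrete, here is a minimal sketch in Python 
rather than Impala's actual C++/Java code. The names check_utf8() and 
utf8_instr() are hypothetical and do not exist in Impala; the sketch only 
illustrates the split between one validating entry point and functions that 
may assume valid input.

```python
def check_utf8(data: bytes) -> str:
    """Hypothetical central validity check (point 1): decode or raise."""
    try:
        # Strict decoding rejects malformed sequences instead of replacing them.
        return data.decode("utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(f"invalid UTF-8 at byte offset {e.start}") from e


def utf8_instr(s: str, substr: str) -> int:
    """Hypothetical UTF-8-only instr() (point 2).

    Assumes `s` has already been validated, so it needs no error handling.
    Returns the 1-based character position of `substr`, or 0 if absent.
    """
    return s.find(substr) + 1
```

With this split, only check_utf8() needs to know what to do with illegal 
bytes; utf8_instr() works purely on characters.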

It may be possible to do this in the FE: a non-trusted source S passed as 
input to a UTF-8 function F() is translated to F(CHECK(S)), where CHECK() 
implements step 1).
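
The F(S) to F(CHECK(S)) translation could look roughly like the following 
sketch, written in Python for brevity rather than as real FE code. The 
expression classes, the UTF8_FUNCS set, and the rule that literals and 
columns flagged as trusted need no check are all assumptions made for 
illustration, not Impala's analyzer API.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Col:
    name: str
    trusted: bool = False  # assumption: some sources are known to be valid


@dataclass(frozen=True)
class Lit:
    value: str


@dataclass(frozen=True)
class Call:
    fn: str
    args: Tuple["Expr", ...]


Expr = Union[Col, Lit, Call]

# Hypothetical list of functions that require valid UTF-8 input.
UTF8_FUNCS = {"instr", "locate"}


def is_trusted(e: Expr) -> bool:
    """Assumption: literals, trusted columns, and CHECK() results are valid."""
    if isinstance(e, Lit):
        return True
    if isinstance(e, Col):
        return e.trusted
    return e.fn == "CHECK"


def rewrite(e: Expr) -> Expr:
    """Wrap every non-trusted argument of a UTF-8 function in CHECK()."""
    if not isinstance(e, Call):
        return e
    args = tuple(rewrite(a) for a in e.args)
    if e.fn in UTF8_FUNCS:
        args = tuple(a if is_trusted(a) else Call("CHECK", (a,)) for a in args)
    return Call(e.fn, args)
```

For example, rewrite(Call("instr", (Col("s"), Lit("x")))) yields 
instr(CHECK(s), 'x'), and applying rewrite() a second time leaves the 
expression unchanged.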


--
To view, visit http://gerrit.cloudera.org:8080/17580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Gerrit-Change-Number: 17580
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Sun, 18 Jul 2021 12:27:38 +0000
Gerrit-HasComments: No
