Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 )
Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@8
PS6, Line 8: characters
nit: let's put the title on one line


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@13
PS6, Line 13: that accurately maps special characters to their URL-encoded forms.
It's unclear to me how the Unicode characters are handled incorrectly. E.g. the one mentioned in the JIRA description is "运", which is encoded into 3 bytes in UTF-8: 0xe8 0xbf 0x90. Could you mention in the commit message how this example fails before and works now?

FWIW, there are online tools to convert Unicode to UTF-8, e.g. https://onlinetools.com/unicode/convert-unicode-to-utf8


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80
PS6, Line 80: std::isalnum(ch)
It's an existing issue, but I think we should cast 'ch' to unsigned char as mentioned in https://en.cppreference.com/w/cpp/string/byte/isalnum

> the behavior of std::isalnum is undefined if the argument's value is neither
> representable as unsigned char nor equal to EOF. To use these functions
> safely with plain chars (or signed chars), the argument should first be
> converted to unsigned char:
>   std::isalnum(static_cast<unsigned char>(ch));


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85
PS6, Line 85: std::
nit: I think we can drop the explicit "std::" here, since std::uppercase is already introduced at L37. std::hex is introduced in "common/names.h":
https://github.com/apache/impala/blob/b39cd79ae84c415e0aebec2c2b4d7690d2a0cc7a/be/src/common/names.h#L82
Maybe the same applies to "std::isalnum".
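
To illustrate the cast suggested for Line 80, here is a minimal sketch. It is not the code in coding-util.cc: the function name PercentEncodeSketch is hypothetical, and the set of characters left unescaped ('-', '_', '.', '~', taken from RFC 3986) is an assumption that may differ from what UrlEncode actually does. It only shows how each byte can be converted to unsigned char before it reaches std::isalnum, so that the three UTF-8 bytes 0xe8 0xbf 0x90 are never passed as negative values.

```cpp
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for UrlEncode, for illustration only.
// Assumption: alphanumerics and '-', '_', '.', '~' are left as-is;
// every other byte is written as %XX in uppercase hex.
std::string PercentEncodeSketch(const std::string& in) {
  std::ostringstream out;
  out << std::hex << std::uppercase << std::setfill('0');
  for (char ch : in) {
    // Convert to unsigned char first: on platforms where plain char is
    // signed, bytes >= 0x80 would otherwise be negative, and passing a
    // negative value other than EOF to std::isalnum is undefined behavior.
    const unsigned char uch = static_cast<unsigned char>(ch);
    if (std::isalnum(uch) || uch == '-' || uch == '_' || uch == '.' || uch == '~') {
      out << ch;
    } else {
      // Widen to int so the stream prints a number, not a character.
      out << '%' << std::setw(2) << static_cast<int>(uch);
    }
  }
  return out.str();
}
```

Under these assumptions, PercentEncodeSketch("a b\xe8\xbf\x90") returns "a%20b%E8%BF%90", i.e. each of the three UTF-8 bytes is escaped on its own.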
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Fri, 26 Apr 2024 09:03:05 +0000
Gerrit-HasComments: Yes
