[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 )
Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters ...................................................................... Patch Set 3: (4 comments) > Uploaded patch set 3. http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@47 PS1, Line 47: // Mapping of special characters to their URL-encoded forms > With this change it doesn't look like we need this list or ShouldNotEscape Done http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@75 PS1, Line 75: boost::replace_all(input_str, "%", "%25"); > line too long (94 > 90) Done http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@84 PS1, Line 84: // Encode forward slashes to prevent misinterpretation in paths or URLs > So this works because it only replaces specific chars - that won't appear i Yes, that's correct. The UrlEncode function in the provided code is designed to work with UTF-8 encoded strings, where each character can be represented by one to four bytes. The key is that it replaces specific characters with their URL-encoded forms, and these replacements are carefully chosen to avoid interfering with multi-byte characters.The replacements are safe because they don't disrupt the structure of multi-byte characters in UTF-8 encoding. The % sign followed by two hexadecimal digits, used in URL encoding, ensures that the replacements won't be confused with the actual encoding of multi-byte characters. On the other hand, using a generic isalnum check to determine whether a character should be replaced might not be suitable for UTF-8 encoded strings. It could inadvertently affect multi-byte characters, potentially leading to incorrect or corrupted results. http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@95 PS1, Line 95: void UrlEncode(const vector<uint8_t>& in, string* out, bool hive_compat) { > We could return unused capacity with https://en.cppreference.com/w/cpp/stri Done -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Tue, 12 Mar 2024 05:56:40 +0000 Gerrit-HasComments: Yes
