[email protected] has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
......................................................................


Patch Set 3:

(4 comments)

> Uploaded patch set 3.

http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@47
PS1, Line 47: // Mapping of special characters to their URL-encoded forms
> With this change it doesn't look like we need this list or ShouldNotEscape
Done


http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@75
PS1, Line 75:   boost::replace_all(input_str, "%", "%25");
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@84
PS1, Line 84:     // Encode forward slashes to prevent misinterpretation in 
paths or URLs
> So this works because it only replaces specific chars - that won't appear i
Yes, that's correct. The UrlEncode function in the provided code is designed to 
work with UTF-8 encoded strings, where each character can be represented by one 
to four bytes. The key is that it replaces specific characters with their 
URL-encoded forms, and these replacements are carefully chosen to avoid 
interfering with multi-byte characters.The replacements are safe because they 
don't disrupt the structure of multi-byte characters in UTF-8 encoding. The % 
sign followed by two hexadecimal digits, used in URL encoding, ensures that the 
replacements won't be confused with the actual encoding of multi-byte 
characters.

On the other hand, using a generic isalnum check to determine whether a 
character should be replaced might not be suitable for UTF-8 encoded strings. 
It could inadvertently affect multi-byte characters, potentially leading to 
incorrect or corrupted results.


http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@95
PS1, Line 95: void UrlEncode(const vector<uint8_t>& in, string* out, bool 
hive_compat) {
> We could return unused capacity with https://en.cppreference.com/w/cpp/stri
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Tue, 12 Mar 2024 05:56:40 +0000
Gerrit-HasComments: Yes

Reply via email to