Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
......................................................................


Patch Set 3:

(9 comments)

Thanks, Pranav!

http://gerrit.cloudera.org:8080/#/c/21131/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/1//COMMIT_MSG@9
PS1, Line 9: In this commit, a comprehensive mapping of special characters, 
including
You should explain what the error in the previous version was and what failures 
it lead to.


http://gerrit.cloudera.org:8080/#/c/21131/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/3//COMMIT_MSG@9
PS3, Line 9: In this commit, a comprehensive mapping of special characters, 
including
The commit message should explain what errors the code before the change 
caused, what was wrong with the code and how it is fixed by this change.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@48
PS3, Line 48: std::unordered_map<char, std::string> specialCharacterMap = {
Should be const.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@71
PS3, Line 71:   (*out).reserve(in_len * 3);  // Maximum expansion per character 
is 3
I'm not sure why we have to reserve this. 'input_str' is modified (grown) in 
place. At the end of the function we could move 'input_str' into '*out' like 
this:

(*out) = std::move(input_str);

You should include <utility> for this overload of std::move.

Or, alternatively, you could discard 'input_str' completely and start by 
copying the input into '*out' and use that variable in the string replacements.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@78
PS3, Line 78:   for (const auto& entry : specialCharacterMap) {
Won't this replace existing % signs twice? First on L75, then in the loop 
because '%' is present as a key in specialCharacterMap.

We could either remove '%' from the map or
 1. Replace the map with an ordered std::array<std::pair<char, std::string>>.
 2. Put '%' first in that std::array.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83
PS3, Line 83:   if (!hive_compat) {
Are these additional encodings the same that were done before this change? It 
doesn't seem like that. For example, I think "$" was encoded before but not now.
If not, won't it cause problems? The commit message should at least mention how 
the behaviour changes.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@85
PS3, Line 85:     boost::replace_all(input_str, "/", "%2F");
Forward slashes have already been encoded because '/' is contained in 
specialCharacterMap. If it are not to be encoded in Hive-compat mode, it should 
be removed from the map.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@90
PS3, Line 90: (*out)
This only makes sense after assigning 'input_str' to 'out'.
Also, could use out->shrink_to_fit().


http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:

http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@298
PS3, Line 298: my_part_tbl
A more descriptive name would be better, for example "unicode_partition_values".



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Tue, 12 Mar 2024 16:53:05 +0000
Gerrit-HasComments: Yes

Reply via email to