Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 )
Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters ...................................................................... Patch Set 3: (9 comments) Thanks, Pranav! http://gerrit.cloudera.org:8080/#/c/21131/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/1//COMMIT_MSG@9 PS1, Line 9: In this commit, a comprehensive mapping of special characters, including You should explain what the error in the previous version was and what failures it lead to. http://gerrit.cloudera.org:8080/#/c/21131/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/3//COMMIT_MSG@9 PS3, Line 9: In this commit, a comprehensive mapping of special characters, including The commit message should explain what errors the code before the change caused, what was wrong with the code and how it is fixed by this change. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@48 PS3, Line 48: std::unordered_map<char, std::string> specialCharacterMap = { Should be const. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@71 PS3, Line 71: (*out).reserve(in_len * 3); // Maximum expansion per character is 3 I'm not sure why we have to reserve this. 'input_str' is modified (grown) in place. At the end of the function we could move 'input_str' into '*out' like this: (*out) = std::move(input_str); You should include <utility> for this overload of std::move. Or, alternatively, you could discard 'input_str' completely and start by copying the input into '*out' and use that variable in the string replacements. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@78 PS3, Line 78: for (const auto& entry : specialCharacterMap) { Won't this replace existing % signs twice? First on L75, then in the loop because '%' is present as a key in specialCharacterMap. We could either remove '%' from the map or 1. Replace the map with an ordered std::array<std::pair<char, std::string>>. 2. Put '%' first in that std::array. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83 PS3, Line 83: if (!hive_compat) { Are these additional encodings the same that were done before this change? It doesn't seem like that. For example, I think "$" was encoded before but not now. If not, won't it cause problems? The commit message should at least mention how the behaviour changes. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@85 PS3, Line 85: boost::replace_all(input_str, "/", "%2F"); Forward slashes have already been encoded because '/' is contained in specialCharacterMap. If it are not to be encoded in Hive-compat mode, it should be removed from the map. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@90 PS3, Line 90: (*out) This only makes sense after assigning 'input_str' to 'out'. Also, could use out->shrink_to_fit(). http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test File testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test: http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@298 PS3, Line 298: my_part_tbl A more descriptive name would be better, for example "unicode_partition_values". -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Tue, 12 Mar 2024 16:53:05 +0000 Gerrit-HasComments: Yes
