Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 )
Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters ...................................................................... Patch Set 15: (2 comments) Thanks Pranav! In the meantime I discovered something else, but that shouldn't be too hard to fix. http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66 PS15, Line 66: ss << '%' << static_cast<uint32_t>(ch); I played with it a bit and it doesn't work with for example '\t' and '\n' (for example insert into my_part_tbl partition(p='a\tt') values (0); ). Let's take '\t'. Its value is 9, which only has a single hexadecimal digit. Now we'll escape it as "%9" but Hive has "%09". To match Hive's behaviour, do this: 1. Add this include in the appropriate include group: #include <iomanip> 2. Add "<< setfill('0')" after "hex" on L58. 3. Add "<< setw(2)" after inserting the % sign on L66. This will pad hex values so that they are always at least 2 chars wide. Note that setw is reset after insertion, so it has to be within the loop. This also reveals a test deficiency: we should ideally test every character that is in 'SpecialCharacters'. I don't know if we can insert arbitrary byte values in Impala strings, maybe Quanlong can help there. But even if that's not possible, we should cover the ones for which an escape sequence exists (i.e. '\t', '\n', '\v' etc.). http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66 PS15, Line 66: ch I think it would be good if we first cast it to unsigned char because of what happens if we encounter a character > 127. This is possible if the input contains one, and we are in non-Hive-compat mode. If char is a signed type (that is implementation dependent), then the value > 127 will be interpreted as a negative number. When we convert it to uint32_t, it will be sign-extended. For example, instead of "%FF" we will have "%FFFFFFFF". If we convert it first to unsigned char, it will not be sign-extended. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Tue, 30 Apr 2024 14:01:31 +0000 Gerrit-HasComments: Yes
