[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc File be/src/util/coding-util-test.cc: http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc@94 PS10, Line 94: TestUrl("/home/impala/directory/", "%2Fhome%2Fimpala%2Fdirectory%2F", false); : TestUrl("/home/impala/directory/", "%2Fhome%2Fimpala%2Fdirectory%2F", true); Please add some tests like these for newly added marks like '\n', '\r', '^', etc. EDIT: please include all the escaped characters. http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@a40 PS10, Line 40: : : : Can you keep this comment? We should still mention common/src/java/org/apache/hadoop/hive/common/FileUtils.java since the list would change there. Also mention here that "Impala creates the partition directories using these encoded strings. So it's crucial to keep it consistent with Hive." http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@45 PS10, Line 45: static const std::unordered_set SpecialCharacters = {'"', '#', '\\', '*', '/', : ':', '=', '?', '%', '^', '[', ']', '{', '\x7F'}; We are still missing several characters here, e.g. single quote, '\n' and '\r'. I think we should use the same list as Hive's. The list in Hive consists of control characters, i.e. ASCII characters from 0 to 31 and 127 (0x7F), and marks like '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^'. Code snipper of Hive: static BitSet charToEscape = new BitSet(128); static { for (char c = 0; c < ' '; c++) { charToEscape.set(c); } /** * ASCII 01-1F are HTTP control characters that need to be escaped. * \u000A and \u000D are \n and \r, respectively. */ char[] clist = new char[] {'\u0001', '\u0002', '\u0003', '\u0004', '\u0005', '\u0006', '\u0007', '\u0008', '\u0009', '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013', '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C', '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F', '{', '[', ']', '^'}; for (char c : clist) { charToEscape.set(c); } Note that Hive adds some characters twice. ASCII code of ' ' is 32. So in the first loop, 0-31 are already added. In Java, char is a "16-bit integer" that represents a single 16-bit Unicode character. So Hive code uses unicodes like '\u007F'. But in C++, such unicodes are represented by UTF-8 bytes. And char in C++ is just 8 bits. So in the original code "\u00FF" is actually two chars. http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test File testdata/workloads/functional-query/queries/QueryTest/insert.test: http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test@452 PS10, Line 452: INSERT INTO TABLE insert_string_partitioned PARTITION(s2="_.~ +") : SELECT "value" FROM functional.alltypessmall LIMIT 1; : RESULTS : s2=_.~ +: 1 : : QUERY : # select with unencoded partition key : SELECT * FROM insert_string_partitioned; : RESULTS : 'value','_.~ +' : TYPES : string, string e2e tests for ASCII characters can be added here, e.g. INSERT INTO TABLE insert_string_partitioned PARTITION(s2="O'Reilly") values ('0'); Or use a string with all escaped marks. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Sun, 28 Apr 2024 02:12:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10408: Support build using Apache components
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18977 ) Change subject: IMPALA-10408: Support build using Apache components .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16044/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 Gerrit-Change-Number: 18977 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 27 Apr 2024 23:51:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10408: Support build using Apache components
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18977 to look at the new patch set (#17). Change subject: IMPALA-10408: Support build using Apache components .. IMPALA-10408: Support build using Apache components Apache Impala uses many apache components to build it. This patch provides a way to support building Apache Impala using Apache components. Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 --- M README-build.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java M fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M java/TableFlattener/pom.xml M java/pom.xml 10 files changed, 179 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/18977/17 -- To view, visit http://gerrit.cloudera.org:8080/18977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 Gerrit-Change-Number: 18977 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10408: Support build using Apache components
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18977 ) Change subject: IMPALA-10408: Support build using Apache components .. Patch Set 16: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/16043/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/18977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 Gerrit-Change-Number: 18977 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 27 Apr 2024 23:06:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10408: Support build using Apache components
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18977 to look at the new patch set (#16). Change subject: IMPALA-10408: Support build using Apache components .. IMPALA-10408: Support build using Apache components Apache Impala uses many apache components to build it. This patch provides a way to support building Apache Impala using Apache components. Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 --- M README-build.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java M fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M java/TableFlattener/pom.xml M java/pom.xml 10 files changed, 179 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/18977/16 -- To view, visit http://gerrit.cloudera.org:8080/18977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399 Gerrit-Change-Number: 18977 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Sat, 27 Apr 2024 09:13:04 + Gerrit-HasComments: No