[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-27 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc@94
PS10, Line 94:   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", false);
 :   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", true);
Please add some tests like these for newly added marks like '\n', '\r', '^', 
etc.

EDIT: please include all the escaped characters.


http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@a40
PS10, Line 40:
 :
 :
 :
Can you keep this comment? We should still mention 
common/src/java/org/apache/hadoop/hive/common/FileUtils.java since the list 
would change there.

Also mention here that "Impala creates the partition directories using these 
encoded strings. So it's crucial to keep it consistent with Hive."


http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@45
PS10, Line 45: static const std::unordered_set SpecialCharacters = {'"', 
'#', '\\', '*', '/',
 : ':', '=', '?', '%', '^', '[', ']', '{', '\x7F'};
We are still missing several characters here, e.g. single quote, '\n' and '\r'. 
I think we should use the same list as Hive's. The list in Hive consists of 
control characters, i.e. ASCII characters from 0 to 31 and 127 (0x7F), and 
marks like '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', 
'^'.

Code snipper of Hive:

  static BitSet charToEscape = new BitSet(128);
  static {
for (char c = 0; c < ' '; c++) {
  charToEscape.set(c);
}

/**
 * ASCII 01-1F are HTTP control characters that need to be escaped.
 * \u000A and \u000D are \n and \r, respectively.
 */
char[] clist = new char[] {'\u0001', '\u0002', '\u0003', '\u0004',
'\u0005', '\u0006', '\u0007', '\u0008', '\u0009', '\n', '\u000B',
'\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012',
'\u0013', '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019',
'\u001A', '\u001B', '\u001C', '\u001D', '\u001E', '\u001F',
'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F', '{',
'[', ']', '^'};

for (char c : clist) {
  charToEscape.set(c);
}

Note that Hive adds some characters twice. ASCII code of ' ' is 32. So in the 
first loop, 0-31 are already added.
In Java, char is a "16-bit integer" that represents a single 16-bit Unicode 
character. So Hive code uses unicodes like '\u007F'. But in C++, such unicodes 
are represented by UTF-8 bytes. And char in C++ is just 8 bits. So in the 
original code "\u00FF" is actually two chars.


http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test@452
PS10, Line 452: INSERT INTO TABLE insert_string_partitioned PARTITION(s2="_.~ 
+")
  : SELECT "value" FROM functional.alltypessmall LIMIT 1;
  :  RESULTS
  : s2=_.~ +: 1
  : 
  :  QUERY
  : # select with unencoded partition key
  : SELECT * FROM insert_string_partitioned;
  :  RESULTS
  : 'value','_.~ +'
  :  TYPES
  : string, string
e2e tests for ASCII characters can be added here, e.g.

INSERT INTO TABLE insert_string_partitioned PARTITION(s2="O'Reilly") values 
('0');

Or use a string with all escaped marks.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sun, 28 Apr 2024 02:12:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10408: Support build using Apache components

2024-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18977 )

Change subject: IMPALA-10408: Support build using Apache components
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16044/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399
Gerrit-Change-Number: 18977
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 27 Apr 2024 23:51:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10408: Support build using Apache components

2024-04-27 Thread Anonymous Coward (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18977

to look at the new patch set (#17).

Change subject: IMPALA-10408: Support build using Apache components
..

IMPALA-10408: Support build using Apache components

Apache Impala uses many apache components to build it.
This patch provides a way to support building Apache Impala using Apache 
components.

Change-Id: I8730dd182b367c9daa94303937ad249db72b1399
---
M README-build.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M fe/pom.xml
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M java/TableFlattener/pom.xml
M java/pom.xml
10 files changed, 179 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/18977/17
--
To view, visit http://gerrit.cloudera.org:8080/18977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399
Gerrit-Change-Number: 18977
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10408: Support build using Apache components

2024-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18977 )

Change subject: IMPALA-10408: Support build using Apache components
..


Patch Set 16:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/16043/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/18977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399
Gerrit-Change-Number: 18977
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 27 Apr 2024 23:06:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10408: Support build using Apache components

2024-04-27 Thread Anonymous Coward (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18977

to look at the new patch set (#16).

Change subject: IMPALA-10408: Support build using Apache components
..

IMPALA-10408: Support build using Apache components

Apache Impala uses many apache components to build it.
This patch provides a way to support building Apache Impala using Apache 
components.

Change-Id: I8730dd182b367c9daa94303937ad249db72b1399
---
M README-build.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M fe/pom.xml
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M java/TableFlattener/pom.xml
M java/pom.xml
10 files changed, 179 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/18977/16
--
To view, visit http://gerrit.cloudera.org:8080/18977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8730dd182b367c9daa94303937ad249db72b1399
Gerrit-Change-Number: 18977
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sat, 27 Apr 2024 09:13:04 +
Gerrit-HasComments: No