[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

2024-05-28 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/21429 )

Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in 
Iceberg metadata tables
..

IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

DECIMAL values are not supported in Iceberg metadata tables and Impala
runs on a DCHECK and crashes if it encounters one.

Until this issue is properly fixed (see IMPALA-13080), this commit
introduces a temporary solution: DECIMAL values coming from Iceberg
metadata tables are NULLed out and a warning is issued.

Testing:
 - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table,
   so querying the `files` metadata table will include a DECIMAL in the
   'readable_metrics' struct.

Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
4 files changed, 31 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21429/5
--
To view, visit http://gerrit.cloudera.org:8080/21429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
Gerrit-Change-Number: 21429
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13091: query test.test iceberg.TestIcebergV2Table.test metadata tables fails on an expected constant

2024-05-21 Thread Daniel Becker (Code Review)
Daniel Becker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21440 )

Change subject: IMPALA-13091: 
query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an 
expected constant
..

IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables 
fails on an expected constant

IMPALA-13079 added a test in iceberg-metadata-tables.test that included
assertions about values that can change across builds, e.g. file sizes,
which caused test failures.

This commit fixes it by doing two things:
1. narrowing down the result set of the query to the column that the
   test is really about - this removes some of the problematic values
2. using regexes for the remaining problematic values.

Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620
Reviewed-on: http://gerrit.cloudera.org:8080/21440
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
---
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
1 file changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Riza Suminto: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620
Gerrit-Change-Number: 21440
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-13091: query test.test iceberg.TestIcebergV2Table.test metadata tables fails on an expected constant

2024-05-17 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21440


Change subject: IMPALA-13091: 
query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an 
expected constant
..

IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables 
fails on an expected constant

IMPALA-13079 added a test in iceberg-metadata-tables.test that included
assertions about values that can change across builds, e.g. file sizes,
which caused test failures.

This commit fixes it by doing two things:
1. narrowing down the result set of the query to the column that the
   test is really about - this removes some of the problematic values
2. using regexes for the remaining problematic values.

Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620
---
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
1 file changed, 3 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/21440/1
--
To view, visit http://gerrit.cloudera.org:8080/21440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620
Gerrit-Change-Number: 21440
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

2024-05-16 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21429 )

Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in 
Iceberg metadata tables
..

IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

DECIMAL values are not supported in Iceberg metadata tables and Impala
runs on a DCHECK and crashes if it encounters one.

Until this issue is properly fixed (see IMPALA-13080), this commit
introduces a temporary solution: DECIMAL values coming from Iceberg
metadata tables are NULLed out and a warning is issued.

Testing:
 - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table,
   so querying the `files` metadata table will include a DECIMAL in the
   'readable_metrics' struct.

Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
4 files changed, 31 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21429/3
--
To view, visit http://gerrit.cloudera.org:8080/21429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
Gerrit-Change-Number: 21429
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

2024-05-16 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21429 )

Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in 
Iceberg metadata tables
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125
PS1, Line 125:
> represented as NULL, or displayed as NULL. NULLed out sounds a bit unusual.
Done


http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125
PS1, Line 125:   RETURN_IF_ERROR(WriteDecimalSlot(slot_desc, tuple, state));
> The warning string could be extracted as a constant.
I made it a constexpr char array. I don't think it would make sense to extract 
it out of this function, it is only used here and makes the function more 
readable.


http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@126
PS1, Line 126:   break;
> With this approach, the warning will be posted as many times the query enco
I introduced a member that tracks whether the warning has already been emitted. 
Therefore one warning will be emitted for each scanner that encounters a 
DECIMAL from a metadata table.

Also changed the type of the error from TErrorCode::GENERAL to 
TErrorCode::NOT_IMPLEMENTED_ERROR so these errors will be aggregated and 
displayed only once even if there are multiple scanners reading DECIMALs from 
metadata tables.



--
To view, visit http://gerrit.cloudera.org:8080/21429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
Gerrit-Change-Number: 21429
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 16 May 2024 11:01:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

2024-05-15 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21429


Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in 
Iceberg metadata tables
..

IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

DECIMAL values are not supported in Iceberg metadata tables and Impala
runs on a DCHECK and crashes if it encounters one.

Until this issue is properly fixed (see IMPALA-13080), this commit
introduces a temporary solution: DECIMAL values coming from Iceberg
metadata tables are NULLed out and a warning is issued.

Testing:
 - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table,
   so querying the `files` metadata table will include a DECIMAL in the
   'readable_metrics' struct.

Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
3 files changed, 13 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21429/1
--
To view, visit http://gerrit.cloudera.org:8080/21429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
Gerrit-Change-Number: 21429
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

2024-05-15 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21425 )

Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata 
tables
..

IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

Until now, the float and double data types were not supported in Iceberg
metadata tables. This commit adds support for them.

Testing:
 - added a test table that contains all primitive types (except for
   decimal, which is still not supported), a struct, an array and a map
 - added a test query that queries the `files` metadata table of the
   above table - the 'readable_metrics' struct contains lower and upper
   bounds for all columns in the original table, with the original type

Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 122 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21425/3
--
To view, visit http://gerrit.cloudera.org:8080/21425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
Gerrit-Change-Number: 21425
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

2024-05-15 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/21425 )

Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata 
tables
..

IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

Until now, the float and double data types were not supported in Iceberg
metadata tables. This commit adds support for them.

Testing:
 - added a test table that contains all primitive types (except for
   decimal, which is still not supported), a struct, an array and a map
 - added a test query that queries the `files` metadata table of the
   above table - the 'readable_metrics' struct contains lower and upper
   bounds for all columns in the original table, with the original type

Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 105 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21425/2
--
To view, visit http://gerrit.cloudera.org:8080/21425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
Gerrit-Change-Number: 21425
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

2024-05-15 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21425 )

Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata 
tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21425/1/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/21425/1/testdata/datasets/functional/functional_schema_template.sql@3916
PS1, Line 3916: INSERT INTO {db_name}{db_suffix}.{table_name} VALUES (
> nit: More rows with different values could give better insight into how upp
You're right, added a new row.



--
To view, visit http://gerrit.cloudera.org:8080/21425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
Gerrit-Change-Number: 21425
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Wed, 15 May 2024 12:18:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

2024-05-14 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21425


Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata 
tables
..

IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

Until now, the float and double data types were not supported in Iceberg
metadata tables. This commit adds support for them.

Testing:
 - added a test table that contains all primitive types (except for
   decimal, which is still not supported), a struct, an array and a map
 - added a test query that queries the `files` metadata table of the
   above table - the 'readable_metrics' struct contains lower and upper
   bounds for all columns in the original table, with the original type

Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 92 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21425/1
--
To view, visit http://gerrit.cloudera.org:8080/21425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
Gerrit-Change-Number: 21425
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables

2024-05-10 Thread Daniel Becker (Code Review)
Daniel Becker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21387 )

Change subject: IMPALA-13036: Document Iceberg metadata tables
..

IMPALA-13036: Document Iceberg metadata tables

This change adds documentation on how Iceberg metadata tables can be
used.

Testing:
 - built docs locally

Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Reviewed-on: http://gerrit.cloudera.org:8080/21387
Tested-by: Impala Public Jenkins 
Reviewed-by: Peter Rozsa 
---
M docs/topics/impala_iceberg.xml
1 file changed, 72 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Peter Rozsa: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Gerrit-Change-Number: 21387
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13055: Some Iceberg metadata table tests don't assert

2024-05-09 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21394 )

Change subject: IMPALA-13055: Some Iceberg metadata table tests don't assert
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad
Gerrit-Change-Number: 21394
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 May 2024 11:53:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-09 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 27: Code-Review+2

Thanks Pranav!


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 09 May 2024 10:06:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@301
PS23, Line 301: "create table {}(i int) partitioned by (p string) 
stored as parquet".format(tbl))
One more thing: you should run this test with an Iceberg table too. You could 
extract the common parts (mostly everything except for the create and insert 
statements) into a helper function.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 14:19:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320
PS23, Line 320: result line
"the result line..."


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@321
PS23, Line 321: the values present inside it are just verified.
"we only verify that the value is present in string representation of the whole 
row"



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 12:46:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(2 comments)

Just a few nits.

http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc@94
PS23, Line 94:   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", false);
These strings could also be extracted into variables.


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320
PS23, Line 320: are
"is a"



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 12:24:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13047: Support restarting/killing specified impalads in start-impala-cluster.py

2024-05-06 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21376 )

Change subject: IMPALA-13047: Support restarting/killing specified impalads in 
start-impala-cluster.py
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@1118
PS1, Line 1118:   existing_cluster_size = 
cluster_ops.get_cluster().get_max_impalad_id() + 1
> Yeah, that's an issue. Removed this check when using --kill. Also improved
It's still not good if you get the existing cluster size from the max impalad 
id: then killing the last and the first impalas will cause different behaviour 
in the future. For example start 3, kill number 2, then restart number 0, it 
runs on the assertion again.

Isn't it possible to use the length of cluster_ops.get_cluster().__impalads?



--
To view, visit http://gerrit.cloudera.org:8080/21376
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4a863853341959eb6870b662d82188e7d570971
Gerrit-Change-Number: 21376
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 May 2024 15:21:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13055: Some Iceberg metadata table tests don't assert

2024-05-06 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21394 )

Change subject: IMPALA-13055: Some Iceberg metadata table tests don't assert
..


Patch Set 1:

(5 comments)

Thanks Gábor!

http://gerrit.cloudera.org:8080/#/c/21394/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21394/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-13055: Some Iceberg metadata table tests don't assert
Do you think it is an error in the test framework? If so, could you open a 
separate Jira for it with repro instructions?


http://gerrit.cloudera.org:8080/#/c/21394/1//COMMIT_MSG@11
PS1, Line 11: unconditionally passes
We should make it clear the it is not the actual result from Impala that can be 
anything but the "expected" string.


http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@49
PS1, Line 49: : VERIFY_IS_SUBSET
Is adding VERIFY_IS_SUBSET needed? I ran the test locally without it and it 
passed.


http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@365
PS1, Line 365: # Currently not supported, complex type slots are set to NULL 
(IMPALA-12205)
This comment is stale, we should change it.


http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@371
PS1, Line 371: '{.*}'
As this test section deals with complex types, I think we shouldn't put a 
wildcard for all the result. We should include details and use regexes for 
easily changing values (file size etc.).



--
To view, visit http://gerrit.cloudera.org:8080/21394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad
Gerrit-Change-Number: 21394
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 06 May 2024 14:15:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException

2024-05-03 Thread Daniel Becker (Code Review)
Daniel Becker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21361 )

Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables 
throws IllegalArgumentException
..

IMPALA-13035: Querying metadata tables from non-Iceberg tables throws 
IllegalArgumentException

When attempting to query a metadata table of a non-Iceberg table the
analyzer throws 'IllegalArgumentException'.

The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()'
doesn't actually check whether the given path belongs to a valid
metadata table, it only checks whether the path could syntactically
refer to one. This is because it is called in
'Path.getCandidateTables()', at which point analysis has not been done
yet.

However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called
in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true,
'getTable()' tries to instantiate an 'IcebergMetadataTable' object with
the table ref of the base table. If that table is not an Iceberg table,
a precondition check fails.

This change renames 'isIcebergMetadataTable()' to
'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()'
function, which also takes an 'Analyzer' as a parameter. With the help
of the 'Analyzer' it is possible to determine whether the base table is
an Iceberg table. 'Analyzer.getTable()' then uses this new
'isIcebergMetadataTable()' function instead of
canBeIcebergMetadataTable().

The constructor of 'IcebergMetadataTable' is also modified to take an
'FeIcebergTable' as the parameter for the base table instead of a
general 'FeTable'.

Testing:
 - Added a test query in iceberg-metadata-tables.test.

Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
Reviewed-on: http://gerrit.cloudera.org:8080/21361
Tested-by: Impala Public Jenkins 
Reviewed-by: Peter Rozsa 
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 52 insertions(+), 16 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Peter Rozsa: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
Gerrit-Change-Number: 21361
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables

2024-05-03 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21387 )

Change subject: IMPALA-13036: Document Iceberg metadata tables
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@729
PS1, Line 729:
> nit:,
I don't think a comma is needed before "and".


http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@730
PS1, Line 730: fr
> I'm not sure: from?
Yes, neither is perfect, I changed it to "from".


http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@730
PS1, Line 730:
> nit:,
I don't think a comma is needed here.


http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@730
PS1, Line 730:
> *direclty
Added a sentence at the end to make it clearer.


http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@770
PS1, Line 770:
> *are
The subject is "unnesting", which is singular, that's why it's "is".



--
To view, visit http://gerrit.cloudera.org:8080/21387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Gerrit-Change-Number: 21387
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Fri, 03 May 2024 09:45:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables

2024-05-03 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/21387 )

Change subject: IMPALA-13036: Document Iceberg metadata tables
..

IMPALA-13036: Document Iceberg metadata tables

This change adds documentation on how Iceberg metadata tables can be
used.

Testing:
 - built docs locally

Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
---
M docs/topics/impala_iceberg.xml
1 file changed, 72 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/21387/2
--
To view, visit http://gerrit.cloudera.org:8080/21387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Gerrit-Change-Number: 21387
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables

2024-05-02 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21387


Change subject: IMPALA-13036: Document Iceberg metadata tables
..

IMPALA-13036: Document Iceberg metadata tables

This change adds documentation on how Iceberg metadata tables can be
used.

Testing:
 - built docs locally

Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
---
M docs/topics/impala_iceberg.xml
1 file changed, 70 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/21387/1
--
To view, visit http://gerrit.cloudera.org:8080/21387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Gerrit-Change-Number: 21387
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 15:

(2 comments)

Thanks Pranav! In the meantime I discovered something else, but that shouldn't 
be too hard to fix.

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66:   ss << '%' << static_cast(ch);
I played with it a bit and it doesn't work with for example '\t' and '\n' (for 
example
  insert into my_part_tbl partition(p='a\tt') values (0);
).
Let's take '\t'. Its value is 9, which only has a single hexadecimal digit. Now 
we'll escape it as "%9" but Hive has "%09". To match Hive's behaviour, do this:
1. Add this include in the appropriate include group: #include 
2. Add "<< setfill('0')" after "hex" on L58.
3. Add "<< setw(2)" after inserting the % sign on L66.

This will pad hex values so that they are always at least 2 chars wide.
Note that setw is reset after insertion, so it has to be within the loop.

This also reveals a test deficiency: we should ideally test every character 
that is in 'SpecialCharacters'. I don't know if we can insert arbitrary byte 
values in Impala strings, maybe Quanlong can help there. But even if that's not 
possible, we should cover the ones for which  an escape sequence exists (i.e. 
'\t', '\n', '\v' etc.).


http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66: ch
I think it would be good if we first cast it to unsigned char because of what 
happens if we encounter a character > 127. This is possible if the input 
contains one, and we are in non-Hive-compat mode.

If char is a signed type (that is implementation dependent), then the value > 
127 will be interpreted as a negative number. When we convert it to uint32_t, 
it will be sign-extended. For example, instead of "%FF" we will have 
"%". If we convert it first to unsigned char, it will not be 
sign-extended.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 14:01:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: addresses
makes


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: two
I've almost forgotten the third: the list of characters to be escaped was 
extended to match the current list in Hive.


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@19
PS13, Line 19: there was a "set"
I didn't mean it as a literal text to be included. Instead, it could be:
"Before the change, the set of characters that need to be escaped was stored as 
a string. The current patch uses an unordered_set instead."


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@25
PS13, Line 25:
Nit: "... byte and whose inclusion ...".



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:31:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13047: Support restarting/killing specified impalads in start-impala-cluster.py

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21376 )

Change subject: IMPALA-13047: Support restarting/killing specified impalads in 
start-impala-cluster.py
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@95
PS1, Line 95: Start
Nit: Starts.


http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@275
PS1, Line 275:   kill_processes([t for t in find_user_processes(binary_names)], 
force)
Could use "list(find_user_processes(binary_names))" instead of list 
comprehension.


http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@833
PS1, Line 833: node_ids[i]
Optional: could extract into a variable, also used on L840.


http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@1118
PS1, Line 1118: assert existing_cluster_size == expected_cluster_size,\
If we kill some impalads, doesn't that change the cluster size?

If I try to kill impalad1 first, then impalad2, this assertion is triggered.

$ bin/start-impala-cluster.py
$ bin/start-impala-cluster.py --kill --impalad_ids 1
13:13:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
13:13:50 MainThread: Killed impalad (pid 369056)
$ bin/start-impala-cluster.py --kill --impalad_ids 2
13:13:53 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
Traceback (most recent call last):
  File "bin/start-impala-cluster.py", line 1119, in 
"Commands on specified impalads shouldn't change the cluster size"
AssertionError: Commands on specified impalads shouldn't change the cluster size



--
To view, visit http://gerrit.cloudera.org:8080/21376
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4a863853341959eb6870b662d82188e7d570971
Gerrit-Change-Number: 21376
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:16:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 12:

(6 comments)

Thanks Pranav, we're close.

http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@15
PS12, Line 15: '
Use double quotes here ("\u00FF"), this is not a valid character literal but it 
is a valid string literal.
Same also later on this line.


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: the special characters
"the special characters that need to be escaped" would be better, it's not all 
special chars.


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: ,
Nit: no need for the comma here.


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: included
I think it could be formulated better. There are two things that have changed:
1. '\xFF' was corrected to '\x7F' and
2. before this change we also had a "set" of characters that needed to be 
encoded, but it used to be stored as a string and now it's stored as an 
unordered_set (which is much better in my opinion). I think we should mention 
these separately.


http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@19
PS12, Line 19: #include 
Let's fix include groups: there should be an empty line after #include 
"util/coding-util.h": after it come the standard library includes. Also, leave 
a line after  because the boost and sasl includes are includes 
from other libraries than std.

Usually we group includes like this:

1. header belonging to the current .cc file
2. standard library includes
3. other (third party) library includes
4. impala includes

Each group is ordered alphabetically and there is an empty line between groups.


http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@50
PS12, Line 50: '\x0C'
This could be '\f' (form feed), see https://www.compart.com/en/unicode/U+000C.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 10:36:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13046: Update Iceberg mixed format deletes test

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21373 )

Change subject: IMPALA-13046: Update Iceberg mixed format deletes test
..


Patch Set 1: Code-Review+2

Thanks for fixing this, Michael.


--
To view, visit http://gerrit.cloudera.org:8080/21373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I87c23cc541983223c6b766372f4e582c33ae6836
Gerrit-Change-Number: 21373
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 30 Apr 2024 09:48:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 11:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@9
PS11, Line 9: The
An error - we're introducing it now.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@14
PS11, Line 14: was matching with
matched one of the ...


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@15
PS11, Line 15: '\u00FF'
'\u00FF' is not a valid character literal in C++. "\u00FF" was included in a 
string, it was part of a string literal. At last I found out that it can 
actually be used in C++ to specify arbitrary unicode values, but of course it 
may be encoded on multiple bytes, see 
https://en.cppreference.com/w/cpp/language/escape.

Also include in the commit message that including this character was probably a 
mistake from the beginning, it should have been '\x7F'.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@18
PS11, Line 18: '\xFF'
See the comment above, we should explain that it was a mistake from the 
beginning.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22
PS11, Line 22: Some
It would be good to include in which file the new tests are.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22
PS11, Line 22: for both parquet and iceberg
Parquet and Iceberg are not exclusive things, we also use Parquet in our 
Iceberg tables. Parquet is a file format and Iceberg is a table format. We 
could say "tests on both traditional Hive tables and Iceberg tables ...".


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@25
PS11, Line 25: #include 
The includes within a group should be in alphabetical order, so this should go 
before L23.


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@31
PS11, Line 31: #include "sasl/saslutil.h"
Includes from the standard library should be the first group after the header, 
i.e. this should come after #include .


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51
PS11, Line 51: '\x20'
> Represents space character
For those characters that have a readable literal representation, e.g. " ", 
"\n" etc., we should use that even if Hive doesn't.


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@57
PS11, Line 57: Used
It would be clearer like this: "uppercase" and "hex" only affect the insertion 
of integers, not that of char values.


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@62
PS11, Line 62: // Hive-compat mode, and the character is not alphanumeric 
or is one
This doesn't match the code. Try this:
"... b) we are not in Hive-compat mode and the character is not alphanumeric 
and it is not one of the
characters specifically excluded from escaping (see ShouldNotEscape())."


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:

http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@302
PS11, Line 302: show tables;
I don't think it's needed. Also, if another table is added later before this 
query, this will have to be changed as well.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@309
PS11, Line 309: insert into unicode_partition_values partition(p='运营业务数据') 
values (0);
You don't need to put these inserts into separate QUERY segments. It could also 
go together with the select on L318.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@327
PS11, Line 327: drop table unicode_partition_values;
You don't need to drop the table, the tables are created in a unique database, 
see test_column_unicode.py::TestColumnUnicode::test_create_unicode_table.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@337
PS11, Line 337: show tables;
I don't think this is needed.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@344
PS11, Line 344: insert into unicode_partition_values_iceberg values (0, '运');
You should do the same insertions into the Iceberg table as into the other one.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@354
PS11, Line 354: drop table 

[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException

2024-04-29 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/21361 )

Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables 
throws IllegalArgumentException
..

IMPALA-13035: Querying metadata tables from non-Iceberg tables throws 
IllegalArgumentException

When attempting to query a metadata table of a non-Iceberg table the
analyzer throws 'IllegalArgumentException'.

The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()'
doesn't actually check whether the given path belongs to a valid
metadata table, it only checks whether the path could syntactically
refer to one. This is because it is called in
'Path.getCandidateTables()', at which point analysis has not been done
yet.

However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called
in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true,
'getTable()' tries to instantiate an 'IcebergMetadataTable' object with
the table ref of the base table. If that table is not an Iceberg table,
a precondition check fails.

This change renames 'isIcebergMetadataTable()' to
'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()'
function, which also takes an 'Analyzer' as a parameter. With the help
of the 'Analyzer' it is possible to determine whether the base table is
an Iceberg table. 'Analyzer.getTable()' then uses this new
'isIcebergMetadataTable()' function instead of
canBeIcebergMetadataTable().

The constructor of 'IcebergMetadataTable' is also modified to take an
'FeIcebergTable' as the parameter for the base table instead of a
general 'FeTable'.

Testing:
 - Added a test query in iceberg-metadata-tables.test.

Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 52 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/21361/4
--
To view, visit http://gerrit.cloudera.org:8080/21361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
Gerrit-Change-Number: 21361
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException

2024-04-29 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21361 )

Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables 
throws IllegalArgumentException
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21361/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java:

http://gerrit.cloudera.org:8080/#/c/21361/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@148
PS2, Line 148: // If the metadata table has already been analyzed in the 
query, the table cache will
> If a metadata table is analyzed before and it's referenced in a different p
Thanks, I added a comment for it.



--
To view, visit http://gerrit.cloudera.org:8080/21361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
Gerrit-Change-Number: 21361
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Mon, 29 Apr 2024 11:23:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException

2024-04-29 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21361


Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables 
throws IllegalArgumentException
..

IMPALA-13035: Querying metadata tables from non-Iceberg tables throws 
IllegalArgumentException

When attempting to query a metadata table of a non-Iceberg table the
analyzer throws 'IllegalArgumentException'.

The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()'
doesn't actually check whether the given path belongs to a valid
metadata table, it only checks whether the path could syntactically
refer to one. This is because it is called in
'Path.getCandidateTables()', at which point analysis has not been done
yet.

However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called
in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true,
'getTable()' tries to instantiate an 'IcebergMetadataTable' object with
the table ref of the base table. If that table is not an Iceberg table,
a precondition check fails.

This change renames 'isIcebergMetadataTable()' to
'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()'
function, which also takes an 'Analyzer' as a parameter. With the help
of the 'Analyzer' it is possible to determine whether the base table is
an Iceberg table. 'Analyzer.getTable()' then uses this new
'isIcebergMetadataTable()' function instead of
canBeIcebergMetadataTable().

The constructor of 'IcebergMetadataTable' is also modified to take an
'FeIcebergTable' as the parameter for the base table instead of a
general 'FeTable'.

Testing:
 - Added a test query in iceberg-metadata-tables.test.

Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 50 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/21361/3
--
To view, visit http://gerrit.cloudera.org:8080/21361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
Gerrit-Change-Number: 21361
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-26 Thread Daniel Becker (Code Review)
Daniel Becker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Reviewed-on: http://gerrit.cloudera.org:8080/21269
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 441 insertions(+), 157 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Csaba Ringhofer: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-11499: Refactor UrlEncode function to handle special
We don't usually break the title into two lines, and it is not too long either.


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@12
PS6, Line 12: specialCharacterMap
Nit: surround with quotes ('specialCharacterMap').


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@18
PS6, Line 18: Additionally, the function clears the output string
'*out' was assigned to also before this change, so its old contents were 
discarded. This sentence could be removed.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@48
PS3, Line 48: {'#', "%23"},
> Should be const.
This comment has not been addressed.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@23
PS6, Line 23: #include 
I don't think we need iostream.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@46
PS6, Line 46: std::unordered_map
Could also be static. Alternatively, we could put all these static functions 
and variables into an anonymous namespace, which is a more modern way of 
ensuring that these are not visible outside this translation unit.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@55
PS6, Line 55: '\xFF
The problem in the old code was that "\u00FF" translates to two bytes: 
[194,191]. Note that the notation '\u' is Java-specific. The process is 
broken because the code takes the input bytes one by one.

The hanzi character '?', in UTF-8, is [232, 191, 144]. Its second character, 
191, matched with the second character of "\u00FF" and that caused the problem.

I don't know why '\xFF' was included in the first place. If we'd like to handle 
UTF-8 only, then '\xFF' is not a valid byte. If we'd like to handle non-UTF-8 
bytes, then we should handle all other bytes > 127.

After some digging in Hive, I found this commit that expanded the list of 
escaped characters: 
https://github.com/apache/hive/commit/32b046bf93e3d041b2bb07a1c9b4e1ef5d977ddf
The list before that commit is very similar to what we have here in the old 
code, except they have '\u007F' instead of '\u00FF'. That makes more sense as 
that is a control character for DELETE, see 
https://www.compart.com/en/unicode/U+007F.

If we remove '\xFF', we can see that the escaped values in the map are the same 
as how we HEX encode values on L85, so there is no need for these values to be 
included in a map. Instead, we could have a set (std::unordered_set) with the 
keys in the current map (without '\xFF' but including '\x7F'), and we would 
encode input characters if they are in the set (taking into account hive_compat 
of course). That would also simplify the code.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@66
PS6, Line 66:   (*out).clear();
No need to clear it if it is assigned to at the end of the function.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@69
PS6, Line 69:   for (int i = 0; i < in_len; ++i) {
Could use a range-based for-loop: for (char ch : in) {...}.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@74
PS6, Line 74: is not
: // alphanumeric or one of the commonly excluded characters
Could be misleading: does "not" also apply to "one of the commonly..."? If not, 
we could write for example "is not alphanumeric or is one of the commonly 
excluded ...".


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80
PS6, Line 80: !(std::isalnum(ch) || ShouldNotEscape(ch))
This is easier to understand if converted to a form without parentheses:

!std::isalnum(ch) && !ShouldNotEscape(ch)


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85
PS6, Line 85: std::uppercase << std::hex
We could put std::uppercase and std::hex after L68, before the loop, as these 
settings don't get reset automatically, so no need to set them every time. On 
the other hand, I would include a comment stating that this only applies to 
when integer values are inserted and not when char values, therefore it doesn't 
cause a problem when not escaping characters (like on L88).


http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-25 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 441 insertions(+), 157 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/11
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-04-24 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Apr 2024 15:32:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-24 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 441 insertions(+), 157 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/10
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12950: Improve error message in case of out-of-range numeric conversions

2024-04-23 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21331 )

Change subject: IMPALA-12950: Improve error message in case of out-of-range 
numeric conversions
..

IMPALA-12950: Improve error message in case of out-of-range numeric conversions

IMPALA-12035 introduced checks for numeric conversions that are unsafe
and can fail (if the target type cannot store the value, the behaviour
is undefined):
 - from floating-point types to integer types
 - from double to float

However, it can be difficult to trace which part of the query caused
this based on the error message. This change adds the source type, the
destination type and the value to be converted to the error message.
Unfortunately, at this point in the BE, the original SQL is not
available, so we cannot reference that.

Testing:
 - extended existing tests in expr-test.cc.

Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
---
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/udf/udf.h
3 files changed, 85 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21331/3
--
To view, visit http://gerrit.cloudera.org:8080/21331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
Gerrit-Change-Number: 21331
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12950: Improve error message in case of out-of-range numeric conversions

2024-04-23 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21331 )

Change subject: IMPALA-12950: Improve error message in case of out-of-range 
numeric conversions
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21331/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21331/2//COMMIT_MSG@7
PS2, Line 7: :
> nit: missing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/21331/2//COMMIT_MSG@12
PS2, Line 12: floating-point
> nit: floating-point
Done


http://gerrit.cloudera.org:8080/#/c/21331/2/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21331/2/be/src/exprs/cast-functions-ir.cc@76
PS2, Line 76:   } else {
> The default case is missing, it could be added as "UNKNOWN TYPE" or somethi
It's a good point. These are the only types we'd like to cover here, so 
originally I wanted to add
  static_assert(false);
but it doesn't compile. I could write
  static_assert(!std::is_same_v), which is always false, however this 
link suggests even that may be ill-formed: 
https://stackoverflow.com/questions/38304847/constexpr-if-and-static-assert

On the other hand, this latter approach seems to work in practice, but I'm not 
sure we should do that if it's not guaranteed. Leaving out the default case 
deterministically leads to a warning about no return statement so I did it like 
this.

Maybe adding a DCHECK would be best, but I don't like that there doesn't seem 
to be a clean and concise way of doing it compile time without repeating the 
types (e.g. static_assert that T is one of the types, or some SFINAE magic).

In the new patch set I went with static_assert(!std::is_same_v), it 
correctly fires when I try to instantiate the template with a different type, 
but I don't know if it's guaranteed or not. What do you think?


http://gerrit.cloudera.org:8080/#/c/21331/2/be/src/exprs/cast-functions-ir.cc@182
PS2, Line 182: constexpr const char* FROM_TYPE_NAME = 
TypeToName();
> Could you please add test cases to cover each condition?
Extended existing tests and added some new ones in expr-test.cc.



--
To view, visit http://gerrit.cloudera.org:8080/21331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
Gerrit-Change-Number: 21331
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Tue, 23 Apr 2024 13:03:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-22 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 439 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/8
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-22 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..


Patch Set 7:

(2 comments)

Thanks for the review, Gábor!

http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@227
PS7, Line 227:   RETURN_IF_ERROR(GuardType::create(env, jbuffer, 
_guard));
> if create() returned an error here we'd leak memory because of 'jbuffer', r
No, the type of 'jbuffer' is jbyteArray or jstring, and they are local JNI 
object references which will be freed by the JVM. Only the buffer returned by 
env->GetByteArrayElements() or env->GetStringUTFChars() (called within 
create()) has to be freed manually, that's what 'jbuffer_guard' takes care of.


http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@804
PS7, Line 804: select data_file from 
functional_parquet.iceberg_query_metadata.entries;
> The query should go into the QUERY section
Thanks for catching it.



--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 22 Apr 2024 14:26:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-22 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Apr 2024 10:40:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12950:Improve error message in case of out-of-range numeric conversions

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21331


Change subject: IMPALA-12950:Improve error message in case of out-of-range 
numeric conversions
..

IMPALA-12950:Improve error message in case of out-of-range numeric conversions

IMPALA-12035 introduced checks for numeric conversions that are unsafe
and can fail (if the target type cannot store the value, the behaviour
is undefined):
 - from floating point types to integer types
 - from double to float

However, it can be difficult to trace which part of the query caused
this based on the error message. This change adds the source type, the
destination type and the value to be converted to the error message.
Unfortunately, at this point in the BE, the original SQL is not
available, so we cannot reference that.

Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
---
M be/src/exprs/cast-functions-ir.cc
M be/src/udf/udf.h
2 files changed, 35 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21331/2
--
To view, visit http://gerrit.cloudera.org:8080/21331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
Gerrit-Change-Number: 21331
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:53:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556
PS1, Line 556: able_na
> [] is quite standard notation, and we are using it extensively in the Impal
I'm also okay with leaving [db_name]. I think a separate top-level page or even 
just a paragraph showing the proper syntax would be even better.


http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@566
PS2, Line 566: using
If you want to make it even clearer that all files are rewritten (not just the 
ones with the latest schema), you could write "rewrite all files, converting 
them (if necessary) to the latest table schema".
I'm not sure it's needed, I'm also okay with the current wording.



--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:20:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 439 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/7
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 2: Code-Review+1

(2 comments)

Thanks, just some nits.

http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h
File be/src/exprs/ai-functions.inline.h:

http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108
PS2, Line 108: move
'Move' is good from the context of the change, but if someone is reading the 
new code it's a bit strange. I think "place" or "put" would be better.


http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108
PS2, Line 108: loop
Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" would 
be better.



--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 12:54:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 1:

(2 comments)

Thanks Yida, it looks good.

http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions-ir.cc
File be/src/exprs/ai-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions-ir.cc@88
PS1, Line 88: string
Nit: we could keep the "std::" prefix, it is also used at L81 and L74.


http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions.inline.h
File be/src/exprs/ai-functions.inline.h:

http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions.inline.h@107
PS1, Line 107:   Document overrides;
Is it intentional that 'overrides' is taken out of the loop? Is it because 
'payload' references data that 'overrides' owns?



--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 17 Apr 2024 15:24:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 1:

(5 comments)

Thanks Noémi!

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@553
PS1, Line 553: many small,
 : delete
It doesn't feel right for me, maybe "can write many small data files and/(as 
well as) delete files.


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@554
PS1, Line 554: update
Do we have 'update files'? My understanding is that an update adds new data 
files and delete files. If this is correct, I don't think we should call them 
'update files'. Of course we should mention that updates (as well as deletes 
and inserts) contribute to the problem.


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556
PS1, Line 556: db_name
Do we always have to use the fully qualified name? If not, it may not be 
important to include the database name here.


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@577
PS1, Line 577:   Views cannot be optimized.
Maybe take it out of the list into a separate paragraph because it doesn't fit 
well with the introduction of the list ("To execute table optimization:").


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@585
PS1, Line 585: 
Do you think it would be worth to add a warning that because OPTIMIZE re-writes 
the whole table, it can take a long time, so it shouldn't be called too often?



--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Apr 2024 13:47:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13008: test metadata tables failed in Ubuntu 20 build

2024-04-17 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21317


Change subject: IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build
..

IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build

TestIcebergV2Table::test_metadata_tables failed in Ubuntu 20 build in a
release candidate because the file sizes in some queries didn't match
the expected ones. As Impala writes its version into the Parquet files
it writes, the file sizes can change with the release (especially as
SNAPSHOT or RELEASE is part of the full version, and their lengths
differ).

This change updates the failing tests to take regexes for the file sizes
instead of concrete values.

Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f
---
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
1 file changed, 7 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/21317/1
--
To view, visit http://gerrit.cloudera.org:8080/21317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f
Gerrit-Change-Number: 21317
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-16 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 16 Apr 2024 09:48:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-15 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 439 insertions(+), 154 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/6
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables

2024-04-12 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21292 )

Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables
..

IMPALA-12996: Add support for DATE in Iceberg metadata tables

DATE fields in Iceberg metadata tables were NULLed out before this
change. This change adds support for displaying their actual values.

DATE fields are stored as 32-bit integers (storing the number of days
since the epoch), so they are handled similarly to INTS, but if they are
out of the valid DATE range, their value is set to
DateValue::INVALID_DAYS_SINCE_EPOCH.

Tests:
 - added a test query and adjusted existing ones in
   iceberg-metadata-tables.test

Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
3 files changed, 52 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/21292/3
--
To view, visit http://gerrit.cloudera.org:8080/21292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5
Gerrit-Change-Number: 21292
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables

2024-04-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21292 )

Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21292/2/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21292/2/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@97
PS2, Line 97: TYPE_DATE
> This shouldn't cause issue in practice, but it would be better to validate
Thanks for pointing it out, I added the check by using the DataValue 
constructor.



--
To view, visit http://gerrit.cloudera.org:8080/21292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5
Gerrit-Change-Number: 21292
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 12 Apr 2024 13:56:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables

2024-04-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21292 )

Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21292/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21292/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@a805
PS2, Line 805:
This is no longer needed because complex types are now expanded by default for 
Iceberg metadata tables.



--
To view, visit http://gerrit.cloudera.org:8080/21292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5
Gerrit-Change-Number: 21292
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Comment-Date: Fri, 12 Apr 2024 12:48:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables

2024-04-12 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21292


Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables
..

IMPALA-12996: Add support for DATE in Iceberg metadata tables

DATE fields in Iceberg metadata tables were NULLed out before this
change. This change adds support for displaying their actual values.
DATE fields are stored as 32-bit integers so they are handled as if they
were INTs.

Tests:
 - added a test query and adjusted existing ones in
   iceberg-metadata-tables.test

Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
2 files changed, 23 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/21292/2
--
To view, visit http://gerrit.cloudera.org:8080/21292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5
Gerrit-Change-Number: 21292
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-12 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 438 insertions(+), 154 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/4
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-11 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 440 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/3
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] [TEST][WIP]IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-11 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: [TEST][WIP]IMPALA-11499: Refactor UrlEncode function to handle 
special characters
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83
PS3, Line 83:   if (!hive_compat) {
> That was previously possible. The following works for me before this change
The exception is thrown here in the catalogd:
https://github.com/apache/impala/blob/70c35425d3f8ac68b23fb8e0d08e12ee763965d7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1770
It is a Preconditions check. The partition name is not found in the map. 
'partName' is "p=12?!@%23$%25^&%2A()|%3F%2F.~`", while 'nameToPartitionMap_' 
contains "p=12?!@%23$%25%5E&%2A()|%3F%2F.~`". The difference is the caret ("^").
What I think happens is that the HMS encodes it to "%5E", and that is what is 
present in the 'nameToPartitionMap_' map. But we don't encode it, hence 
'partName' contains "^".

Indeed, if I add the mapping {'^', "%5E"} in 'specialCharacterMap' in this 
file, there is no error.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 11 Apr 2024 10:38:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 15:10:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21271 )

Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output 
parameter is passed with certain values
..

IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed 
with certain values

The Base64Encode function in coding-util.h with signature

  bool Base64Encode(const char* in, int64_t in_len, int64_t out_max,
  char* out, int64_t* out_len);

fails if '*out_len', when passed to the function, contains a value that
does not fit in a 32 bit integer.

Internally we use the

  int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned
  outmax, unsigned *outlen);

function and explicitly cast 'out_len' to 'unsigned*'.

The error is that the called sasl_encode64() function only sets the four
lower bytes of '*out_len' (assuming that 'unsigned' is a 32 bit
integer), and if the upper bytes are not all zero, the resulting value
of '*out_len' will be incorrect.

This change changes the type of 'out_len' from 'int64_t*' to 'unsigned*'
to match the type that sasl_encode64() expects.

Base64Decode() is also updated to use 'unsigned*'. Before this change it
used an intermediate 32 bit local variable to avoid this issue.

Testing:
 - added a regression test in coding-util-test.cc

Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f
---
M be/src/exec/text-converter.inline.h
M be/src/exprs/string-functions-ir.cc
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M be/src/util/coding-util.h
M be/src/util/runtime-profile.cc
6 files changed, 43 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/21271/3
--
To view, visit http://gerrit.cloudera.org:8080/21271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f
Gerrit-Change-Number: 21271
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21271 )

Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output 
parameter is passed with certain values
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21271/2/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21271/2/be/src/util/coding-util.cc@125
PS2, Line 125: bool Base64Encode(const char* in, int64_t in_len, int64_t 
out_max, char* out,
> What do you think about changing Base64* functions to use uint32_t* or size
I actually think we should use 'unsigned' because that's what sasl_encode64() 
uses. It's very ugly and error-prone to cast from an integer pointer to the 
other when the size of the original and destination types are different. Now I 
think the current error is also because of that: we cast int64_t* to unsigned* 
(32 bits in this case) and the called function will only write into the lower 4 
bytes, and if the upper 4 bytes are non-zero, the result will fail the check at 
L134 (*out_len != out_max - 1).



--
To view, visit http://gerrit.cloudera.org:8080/21271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f
Gerrit-Change-Number: 21271
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 13:58:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125
PS7, Line 1125: Or
> I can change it if you feel strong about it, but I think concise and simple
I think it's a bit unexpected/abrupt after the previous paragraphs which are 
full sentences. In isolation it would be ok but I think it's better to match 
the style.


http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README@1134
PS8, Line 1134: 1) Created the table via Impala and added some records to it.
Could you include the CREATE TABLE and INSERT statements for reproducibility?



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 13:19:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@135
PS1, Line 135:   DCHECK(false) << "Unsupported column type: " << 
slot_desc->type().type;
> Are there still unsupported types that are intentionally nulled? How is thi
There aren't any more types that we don't support but are present in metadata 
tables (TINYINT and SHORTINT are not Iceberg types). In our tests we have 
"select *" for all metadata tables so if Iceberg later adds e.g. a DATE field 
our tests will catch it. Added a DCHECK instead.


http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@187
PS1, Line 187: const jobject _value, void* slot, MemPool* 
tuple_data_pool) {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@198
PS1, Line 198: 
RETURN_IF_ERROR(metadata_scanner_->ConvertJavaByteBufferToByteArray(
> tl/dr: The binary handling seems good to me, but I have concerns about the
I agree with what you say, let's see if we have capacity to do it. Please 
create a Jira for it. Thanks.


http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@209
PS1, Line 209:
> nit: extra ;
Done


http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/rpc/jni-thrift-util.h
File be/src/rpc/jni-thrift-util.h:

http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/rpc/jni-thrift-util.h@62
PS1, Line 62: }
> With JniByteArrayGuard we could just return DeserializeThriftMsg directly
Done


http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/util/jni-util.h@115
PS1, Line 115: /// is more restricted, see 
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical
> line too long (162 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/21269/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21269/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@780
PS1, Line 780:
> nit: extra lines? (line 791)
We have these extra empty lines in this file, for example on L686 and L765 from 
before. I think they contribute to making the code easier to read and navigate.



--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Wed, 10 Apr 2024 11:25:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 441 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/2
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 7:

(2 comments)

Thanks.

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc@283
PS7, Line 283:   ErrorMsg(TErrorCode::GENERAL, "NULL found as file_path 
in delete file"));
Can't we return or continue here?


http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125
PS7, Line 1125: Or
Nit: something like "A third case is..." would be nicer.



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 10:20:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG@13
PS4, Line 13: IcebergDeleteBuilder now also tolerates NULL file paths
Is it possible to add a test for this?


http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h@79
PS4, Line 79: ///   Partitioned: Both data and delete files are hashed by the 
file path. This means
Shouldn't we mention DIRECTED mode here?


http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc@272
PS4, Line 272: state
Do we use 'state' anywhere? If not, this parameter could also be removed from 
AddBatch().



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 09 Apr 2024 14:31:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values

2024-04-09 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21271


Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output 
parameter is passed with certain values
..

IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed 
with certain values

The Base64Encode function in coding-util.h with signature

  bool Base64Encode(const char* in, int64_t in_len, int64_t out_max,
  char* out, int64_t* out_len);

fails if '*out_len', when passed to the function, contains a negative
value or a value that does not fit in a 32 bit integer.

Internally we use the

  int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned
  outmax, unsigned *outlen);

function and explicitly cast 'out_len' to 'unsigned*'.

The success of this function shouldn't depend on the value of '*out_len'
because it is an output parameter, so this change sets '*out_len' to
zero before passing it to sasl_encode64().

Testing:
 - added a regression test in coding-util-test.cc

Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M be/src/util/coding-util.h
3 files changed, 32 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/21271/2
--
To view, visit http://gerrit.cloudera.org:8080/21271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f
Gerrit-Change-Number: 21271
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5323: Support BINARY columns in Kudu tables

2024-04-09 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18868 )

Change subject: IMPALA-5323: Support BINARY columns in Kudu tables
..


Patch Set 8: Code-Review+1

Thanks Csaba, looks good.


--
To view, visit http://gerrit.cloudera.org:8080/18868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff701a4b3a09ce7b6982c5d238e65f3d4f3d1151
Gerrit-Change-Number: 18868
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Tue, 09 Apr 2024 10:37:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-09 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21269


Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::\
 test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 438 insertions(+), 150 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/1
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-5323: Support BINARY columns in Kudu tables

2024-04-04 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18868 )

Change subject: IMPALA-5323: Support BINARY columns in Kudu tables
..


Patch Set 7:

(3 comments)

Looks good, thanks.

http://gerrit.cloudera.org:8080/#/c/18868/7/be/src/exec/kudu/kudu-util-ir.cc
File be/src/exec/kudu/kudu-util-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18868/7/be/src/exec/kudu/kudu-util-ir.cc@56
PS7, Line 56:   const char* VALUE_ERROR_MSG = "Could not set Kudu row value.";
Could be constexpr:
constexpr const char* VALUE_ERROR_MSG = "...";


http://gerrit.cloudera.org:8080/#/c/18868/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/18868/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@198
PS5, Line 198:L
> I agree with Peter that it should be a separate patch.
Could you create a Jira ticket for it and reference the number here?


http://gerrit.cloudera.org:8080/#/c/18868/7/testdata/workloads/functional-query/queries/QueryTest/binary-type.test
File testdata/workloads/functional-query/queries/QueryTest/binary-type.test:

http://gerrit.cloudera.org:8080/#/c/18868/7/testdata/workloads/functional-query/queries/QueryTest/binary-type.test@170
PS7, Line 170: s
Nit: constant folding. See also on L182.



--
To view, visit http://gerrit.cloudera.org:8080/18868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff701a4b3a09ce7b6982c5d238e65f3d4f3d1151
Gerrit-Change-Number: 18868
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 04 Apr 2024 13:50:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables

2024-04-04 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21236 )

Change subject: IMPALA-12612: SELECT * queries expand complex type columns from 
Iceberg metadata tables
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Apr 2024 09:15:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables

2024-04-04 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21236 )

Change subject: IMPALA-12612: SELECT * queries expand complex type columns from 
Iceberg metadata tables
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1110
PS1, Line 1110: select readable_metrics.i.* from 
functional_parquet.iceberg_query_metadata.`files`;
> I changed this to expand readable_metrics.i
Sorry, I didn't notice that readable_metrics.i doesn't contain complex types, 
so expanding it is not relevant for this patch. We could try "select 
data_file.* from functional_parquet.iceberg_query_metadata.entries" instead, 
that contains some maps.



--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Apr 2024 07:54:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12612: Expand complex type columns from Iceberg metadata tables

2024-04-03 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21236 )

Change subject: IMPALA-12612: Expand complex type columns from Iceberg metadata 
tables
..


Patch Set 1:

(5 comments)

Thanks Gábor, it looks good, I have only a few remarks.

http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-12612: Expand complex type columns from Iceberg metadata 
tables
The title could include "select *", otherwise it's not clear what this refers 
to.


http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@9
PS1, Line 9: ,
Nit: should come after "behave".


http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@14
PS1, Line 14: Note, the behavior of handling nested columns from regular tables
We could mention that although this is technically a breaking change, metadata 
tables are a very recent feature so it is not problematic.


http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@25
PS1, Line 25: hdfs://localhost:20500
Is it intentional that these are changed from "$NAMENODE" to 
"hdfs://localhost:20500"? Applies also to some of the other queries.


http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1110
PS1, Line 1110: select readable_metrics.* from 
functional_parquet.iceberg_query_metadata.`files`;
This example can be confusing at first because the 'readable_metrics' struct 
itself only contains a single struct. I thought first that it failed to expand 
to the columns "column_size", "value_count" etc. We could either mention it in 
a comment and/or expand "readable_metrics.i" instead.



--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 03 Apr 2024 21:52:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12969: Release JNI array if DeserializeThriftMsg failed

2024-04-03 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21234 )

Change subject: IMPALA-12969: Release JNI array if DeserializeThriftMsg failed
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21234/1/be/src/rpc/jni-thrift-util.h
File be/src/rpc/jni-thrift-util.h:

http://gerrit.cloudera.org:8080/#/c/21234/1/be/src/rpc/jni-thrift-util.h@61
PS1, Line 61:   Status status = DeserializeThriftMsg(
We could use something like JniUtil::JniUtfCharGuard for arrays as well. I 
understand that it is urgent to correct this error so it can be done in another 
commit.



--
To view, visit http://gerrit.cloudera.org:8080/21234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2c0335b12e9289ae851d0ec050765951a8ca6c7
Gerrit-Change-Number: 21234
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 03 Apr 2024 21:39:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-03 Thread Daniel Becker (Code Review)
Daniel Becker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..

IMPALA-12899: Temporary workaround for BINARY in complex types

The BINARY type is currently not supported inside complex types and a
cross-component decision is probably needed to support it (see
IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg
metadata tables (IMPALA-12612), which requires that queries with BINARY
inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more
prioritised issue than IMPALA-11491, so we have come up with a
temporary solution.

This change NULLs out BINARY values in complex types coming from Iceberg
metadata tables and logs a warning.

BINARYs in complex types from regular tables are not affected by this
change.

Testing:
 - Added test queries in iceberg-metadata-tables.test.

Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Reviewed-on: http://gerrit.cloudera.org:8080/21219
Reviewed-by: Gabor Kaszab 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 64 insertions(+), 6 deletions(-)

Approvals:
  Gabor Kaszab: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-03 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..

IMPALA-12899: Temporary workaround for BINARY in complex types

The BINARY type is currently not supported inside complex types and a
cross-component decision is probably needed to support it (see
IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg
metadata tables (IMPALA-12612), which requires that queries with BINARY
inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more
prioritised issue than IMPALA-11491, so we have come up with a
temporary solution.

This change NULLs out BINARY values in complex types coming from Iceberg
metadata tables and logs a warning.

BINARYs in complex types from regular tables are not affected by this
change.

Testing:
 - Added test queries in iceberg-metadata-tables.test.

Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 64 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/9
--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-03 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..

IMPALA-12899: Temporary workaround for BINARY in complex types

The BINARY type is currently not supported inside complex types and a
cross-component decision is probably needed to support it (see
IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg
metadata tables (IMPALA-12612), which requires that queries with BINARY
inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more
prioritised issue than IMPALA-11491, so we have come up with a
temporary solution.

This change NULLs out BINARY values in complex types coming from Iceberg
metadata tables and logs a warning.

BINARYs in complex types from regular tables are not affected by this
change.

Testing:
 - Added test queries in iceberg-metadata-tables.test.

Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 64 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/8
--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..

IMPALA-12899: Temporary workaround for BINARY in complex types

The BINARY type is currently not supported inside complex types and a
cross-component decision is probably needed to support it (see
IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg
metadata tables (IMPALA-12612), which requires that queries with BINARY
inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more
prioritised issue than IMPALA-11491, so we have come up with a
temporary solution.

This change NULLs out BINARY values in complex types coming from Iceberg
metadata tables and logs a warning.

BINARYs in complex types from regular tables are not affected by this
change.

Testing:
 - Added test queries in iceberg-metadata-tables.test.

Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 64 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/7
--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@822
PS6, Line 822: 
1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq','PARQUET',0,1,1606,'{2147483546:215,2147483545:51}','{2147483546:1,2147483545:1}','{2147483546:0,2147483545:0}','NULL','{2147483546:null,2147483545:null}','{2147483546:null,2147483545:null}','NULL','NULL','NULL',NULL,'{"d":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"i":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"s":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null}}'
> shouldn't these also be regexes similarly as above because of the paths?
These files are copied into the table directory, not generated with 
INSERT/DELETE statements. They are also present with concrete values in the 
query at L617.

I experimented with regexes also but I couldn't get the order of the results 
correctly, although I changed the expected order and also tried adding 
VERIFY_IS_EQUAL_SORTED. Therefore I looked for a table for which the file paths 
are constant across data loads so we don't have to use regexes.



--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 14:39:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..

IMPALA-12899: Temporary workaround for BINARY in complex types

The BINARY type is currently not supported inside complex types and a
cross-component decision is probably needed to support it (see
IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg
metadata tables (IMPALA-12612), which requires that queries with BINARY
inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more
prioritised issue than IMPALA-11491, so we have come up with a
temporary solution.

This change NULLs out BINARY values in complex types coming from Iceberg
metadata tables and logs a warning.

BINARYs in complex types from regular tables are not affected by this
change.

Testing:
 - Added test queries in iceberg-metadata-tables.test.

Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
5 files changed, 64 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/6
--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@328
PS4, Line 328:   return long_cl_;
> I understand this will be needed for the permanent solution and not the NUL
Done


http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@813
PS4, Line 813: 
ntent":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_quer
> not sure about this but for dockerised builds don't the file paths start wi
Done. There are also some other parts for which a regex is needed: the exact 
file names and the size of the delete file. Let's see if this is enough or even 
more needs to be captured by regexes.


http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@819
PS4, Line 819: set EXPAND_COMPLEX_TYPES=1;
> One possible extra test case is to set the EXPAND_COMPLEX_TYPES true and do
Done



--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 14:29:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@156
PS8, Line 156:   jobject map_entry;
> Shouldn't we release 'map_entry' at the end of this function?
Done


http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270
PS8, Line 270: DeleteLocalRef
> Isn't 'collection_scanner' a GlobalRef? We call DeleteLocalRef here so I'm
I checked the code and JNI doc again and I think it is actually a local ref and 
the comment was wrong.


http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270
PS8, Line 270:   env->DeleteLocalRef(collection_scanner);
> I think we can leak memory if any of the RETURN_IF_ERROR or RETURN_IF_CANCE
I checked the code and JNI doc again and I think it is actually a local ref and 
the comment was wrong. I updated the comments in iceberg-metadata-scanner.h.

If the reference is indeed local, deleting it may not be as important. This is 
what the doc says about deleting local references 
(https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#:~:text=The%20JNI%20divides%20object%20references,until%20they%20are%20explicitly%20freed):


In most cases, the programmer should rely on the VM to free all local 
references after the native method returns. However, there are times when the 
programmer should explicitly free a local reference. Consider, for example, the 
following situations:
 - A native method accesses a large Java object, thereby creating a local 
reference to the Java object. The native method then performs additional 
computation before returning to the caller. The local reference to the large 
Java object will prevent the object from being garbage collected, even if the 
object is no longer used in the remainder of the computation.
 - A native method creates a large number of local references, although not all 
of them are used at the same time. Since the VM needs a certain amount of space 
to keep track of a local reference, creating too many local references may 
cause the system to run out of memory. For example, a native method loops 
through a large array of objects, retrieves the elements as local references, 
and operates on one element at each iteration. After each iteration, the 
programmer no longer needs the local reference to the array element.


I think Tamas added the deletes because of the second case. If an error occurs 
or the query is cancelled we won't create new (local) references, so freeing 
these local references is not important.

If you'd like to I am open to creating a wrapper for these refs though.


http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@288
PS8, Line 288:   env->DeleteLocalRef(item);
> Same comment about leaking memory
Done



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 02 Apr 2024 12:36:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..

IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

This change adds support for querying MAP types from Iceberg Metadata
tables.

The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to
'CollectionScanner' and extended to be able to handle maps. For arrays
the iteration returns the element as before, for maps it returns
'Map.Entry' objects.

Note that collections in the FROM clause are still not supported.

Testing:
- Added E2E tests in iceberg-metadata-tables.test.

Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 400 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/10
--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@763
PS7, Line 763: select
> Removed h.made_current_at and h.snapshot_id from the select list because th
It seems that in s.summary, for the 'overwrite' column, "added-files-size" and 
"total-files-size" in the map vary by builds, trying a regex instead of a 
concrete value.



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 02 Apr 2024 10:50:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..

IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

This change adds support for querying MAP types from Iceberg Metadata
tables.

The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to
'CollectionScanner' and extended to be able to handle maps. For arrays
the iteration returns the element as before, for maps it returns
'Map.Entry' objects.

Note that collections in the FROM clause are still not supported.

Testing:
- Added E2E tests in iceberg-metadata-tables.test.

Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 399 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/8
--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-04-02 Thread Daniel Becker (Code Review)
Daniel Becker has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..

IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg 
Metadata tables

After this change, the new SHOW METADATA TABLES IN statement can be used
to list all the available metadata tables of an Iceberg table.

Note that in contrast to querying the contents of Iceberg metadata tables,
this does not require fully qualified paths, e.g. both
  SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata;
and
  USE functional_parquet;
  SHOW METADATA TABLES IN iceberg_query_metadata;
work.

The available metadata tables for all Iceberg tables are the same,
corresponding to the values of the enum
"org.apache.iceberg.MetadataTableType", so there is actually no need to
pass the name of the regular table for which the metadata table list is
requested through Thrift. This change, however, does send the table name
because this way
 - if we add support for metadata tables for other table formats, the
   table name/path will be necessary to determine the correct list of
   metadata tables
 - we could later add support for different authorisation policies for
   individual tables
 - we can check also at the point of generating the list of metadata
   tables that the table is an Iceberg table

Testing:
 - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and
   AuthorizationStmtTest
 - added a custom cluster test in test_authorization.py
 - added functional tests in iceberg-metadata-tables.test

Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Reviewed-on: http://gerrit.cloudera.org:8080/21026
Tested-by: Impala Public Jenkins 
Reviewed-by: Zoltan Borok-Nagy 
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-http-handler.cc
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/authorization/test_authorization.py
18 files changed, 476 insertions(+), 59 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Zoltan Borok-Nagy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-04-01 Thread Daniel Becker (Code Review)
Daniel Becker has removed a vote on this change.

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Removed Code-Review+2 by Daniel Becker 
--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-04-01 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
File 
fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1258
PS15, Line 1258: functional_parquet
Removed the table name because "functional_parquet.iceberg_query_metadata" and 
"functional_parquet.*.*" varied with builds.



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 01 Apr 2024 10:47:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-04-01 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 15: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 01 Apr 2024 10:40:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-31 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@763
PS7, Line 763: select
Removed h.made_current_at and h.snapshot_id from the select list because the 
values change with every data load.



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 31 Mar 2024 22:53:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-31 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..

IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

This change adds support for querying MAP types from Iceberg Metadata
tables.

The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to
'CollectionScanner' and extended to be able to handle maps. For arrays
the iteration returns the element as before, for maps it returns
'Map.Entry' objects.

Note that collections in the FROM clause are still not supported.

Testing:
- Added E2E tests in iceberg-metadata-tables.test.

Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 399 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/7
--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-03-31 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..

IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg 
Metadata tables

After this change, the new SHOW METADATA TABLES IN statement can be used
to list all the available metadata tables of an Iceberg table.

Note that in contrast to querying the contents of Iceberg metadata tables,
this does not require fully qualified paths, e.g. both
  SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata;
and
  USE functional_parquet;
  SHOW METADATA TABLES IN iceberg_query_metadata;
work.

The available metadata tables for all Iceberg tables are the same,
corresponding to the values of the enum
"org.apache.iceberg.MetadataTableType", so there is actually no need to
pass the name of the regular table for which the metadata table list is
requested through Thrift. This change, however, does send the table name
because this way
 - if we add support for metadata tables for other table formats, the
   table name/path will be necessary to determine the correct list of
   metadata tables
 - we could later add support for different authorisation policies for
   individual tables
 - we can check also at the point of generating the list of metadata
   tables that the table is an Iceberg table

Testing:
 - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and
   AuthorizationStmtTest
 - added a custom cluster test in test_authorization.py
 - added functional tests in iceberg-metadata-tables.test

Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-http-handler.cc
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/authorization/test_authorization.py
18 files changed, 476 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21026/15
--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-03-30 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..

IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg 
Metadata tables

After this change, the new SHOW METADATA TABLES IN statement can be used
to list all the available metadata tables of an Iceberg table.

Note that in contrast to querying the contents of Iceberg metadata tables,
this does not require fully qualified paths, e.g. both
  SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata;
and
  USE functional_parquet;
  SHOW METADATA TABLES IN iceberg_query_metadata;
work.

The available metadata tables for all Iceberg tables are the same,
corresponding to the values of the enum
"org.apache.iceberg.MetadataTableType", so there is actually no need to
pass the name of the regular table for which the metadata table list is
requested through Thrift. This change, however, does send the table name
because this way
 - if we add support for metadata tables for other table formats, the
   table name/path will be necessary to determine the correct list of
   metadata tables
 - we could later add support for different authorisation policies for
   individual tables
 - we can check also at the point of generating the list of metadata
   tables that the table is an Iceberg table

Testing:
 - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and
   AuthorizationStmtTest
 - added a custom cluster test in test_authorization.py
 - added functional tests in iceberg-metadata-tables.test

Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-http-handler.cc
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/authorization/test_authorization.py
18 files changed, 476 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21026/13
--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..

IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg 
Metadata tables

After this change, the new SHOW METADATA TABLES IN statement can be used
to list all the available metadata tables of an Iceberg table.

Note that in contrast to querying the contents of Iceberg metadata tables,
this does not require fully qualified paths, e.g. both
  SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata;
and
  USE functional_parquet;
  SHOW METADATA TABLES IN iceberg_query_metadata;
work.

The available metadata tables for all Iceberg tables are the same,
corresponding to the values of the enum
"org.apache.iceberg.MetadataTableType", so there is actually no need to
pass the name of the regular table for which the metadata table list is
requested through Thrift. This change, however, does send the table name
because this way
 - if we add support for metadata tables for other table formats, the
   table name/path will be necessary to determine the correct list of
   metadata tables
 - we could later add support for different authorisation policies for
   individual tables
 - we can check also at the point of generating the list of metadata
   tables that the table is an Iceberg table

Testing:
 - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and
   AuthorizationStmtTest
 - added a custom cluster test in test_authorization.py
 - added functional tests in iceberg-metadata-tables.test

Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-http-handler.cc
M common/thrift/Frontend.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/authorization/test_authorization.py
18 files changed, 476 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21026/12
--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/21138 )

Change subject: IMPALA-12809: Iceberg metadata table scanner should always be 
scheduled to the coordinator
..

IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the 
coordinator

On clusters with dedicated coordinators and executors the Iceberg
metadata scanner fragment(s) can be scheduled to executors, for example
during a join. The fragment in this case will fail a precondition check,
because either the 'frontend_' object or the table will not be present.

This change forces Iceberg metadata scanner fragments to be scheduled on
the coordinator. It is not enough to set the DataPartition type to
UNPARTITIONED, because unpartitioned fragments can still be scheduled on
executors. This change introduces a new flag in the TPlanFragment thrift
struct - if it is true, the fragment is always scheduled on the
coordinator.

Testing:
 - Added a regression test in test_coordinators.py.
 - Added a new planner test with two metadata tables and a regular table
   joined together.

Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e
---
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-joined-with-regular-table.test
M tests/custom_cluster/test_coordinators.py
9 files changed, 175 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/21138/12
--
To view, visit http://gerrit.cloudera.org:8080/21138
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e
Gerrit-Change-Number: 21138
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21219


Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..

IMPALA-12899: Temporary workaround for BINARY in complex types

The BINARY type is currently not supported inside complex types and a
cross-component decision is probably needed to support it (see
IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg
metadata tables (IMPALA-12612), which requires that queries with BINARY
inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more
prioritised issue than IMPALA-11491, so we have come up with a
temporary solution.

This change NULLs out BINARY values in complex types coming from Iceberg
metadata tables and logs a warning.

BINARYs in complex types from regular tables are not affected by this
change.

Testing:
 - Added test queries in iceberg-metadata-tables.test.

Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
---
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 67 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/4
--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h:

http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@63
PS4, Line 63:   /// Note that it returns a GlobalRef, that has to be released 
explicitly.
> Probably I get this comment wrong, but I was searching for some code that r
It is used in IcebergRowReader::WriteCollectionSlot(), it is freed at the end 
of the function:
  env->DeleteLocalRef(collection_scanner);


http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@123
PS3, Line 123:   // Skip the unsupported type and set it to NULL
> Does this set Binary cols NULL now?
It shouldn't, in the backend BINARY fields have TYPE_STRING and a separete 
boolean variable that determines whether it is actually a STRING or a BINARY. 
See 
https://github.com/apache/impala/blob/73171cb7164573349bd53a996a51bb7058b778e0/be/src/runtime/types.h#L216.

On the other hand, I don't know where top-level BINARY fields are NULLed for 
metadata tables.


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125
PS4, Line 125:   VLOG(3) << "Skipping unsupported column type: " << 
slot_desc->type().type;
> I think this 'accessed_value' is allocated one level above in the call chai
Done. Added the deallocation to the other caller, WriteArrayItem(), too.


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@212
PS4, Line 212:   if constexpr (IS_ARRAY) {
> Let's consider moving these DCHECKs into the same IF-ELSE at L228-234.
If we do that and this function is erroneously called with for example an INT, 
the DCHECK that fires will be the one added for "item_tuple_desc != nullptr", 
not the one about the type. I think type info would help more with debugging 
than info about 'item_tuple_desc'. Also, this is a more basic precondition of 
the function.


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@220
PS4, Line 220:   const TupleDescriptor* item_tuple_desc = 
slot_desc->children_tuple_descriptor();
> nit: DCHECK item_tuple_desc is not null?
Done


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@236
PS4, Line 236:
> name is misleading now that this is not just for arrays. remaining_items?
Done


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@297
PS4, Line 297: jobject value;
 :   RETURN_IF_ERROR(met
> Can the key of a map be a struct?
Done, added a DCHECK instead.


http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@761
PS3, Line 761: 
> Can't we do a map_col.KEY and map_col.VALUE for maps? Would be nice to have
We can't, to access KEY and VALUE we have to cross-unnest the collection, and 
it is not supported yet. See https://issues.apache.org/jira/browse/IMPALA-12853.



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Mar 2024 15:47:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..

IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

This change adds support for querying MAP types from Iceberg Metadata
tables.

The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to
'CollectionScanner' and extended to be able to handle maps. For arrays
the iteration returns the element as before, for maps it returns
'Map.Entry' objects.

Note that collections in the FROM clause are still not supported.

Testing:
- Added E2E tests in iceberg-metadata-tables.test.

Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 401 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/6
--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java@279
PS11, Line 279: params.getSession()
> Doesn't it return null anyway if the session is not set?
I'm not sure how Thrift works with Java... If it did, would it need an 
isSet...() function? I didn't want to risk getting some garbage value.



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Mar 2024 15:45:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..

IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

This change adds support for querying MAP types from Iceberg Metadata
tables.

The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to
'CollectionScanner' and extended to be able to handle maps. For arrays
the iteration returns the element as before, for maps it returns
'Map.Entry' objects.

Note that collections in the FROM clause are still not supported.

Testing:
- Added E2E tests in iceberg-metadata-tables.test.

Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
6 files changed, 395 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/5
--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


  1   2   3   4   5   6   7   8   9   10   >