[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/21429 ) Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables DECIMAL values are not supported in Iceberg metadata tables and Impala runs on a DCHECK and crashes if it encounters one. Until this issue is properly fixed (see IMPALA-13080), this commit introduces a temporary solution: DECIMAL values coming from Iceberg metadata tables are NULLed out and a warning is issued. Testing: - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table, so querying the `files` metadata table will include a DECIMAL in the 'readable_metrics' struct. Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 4 files changed, 31 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21429/5 -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13091: query test.test iceberg.TestIcebergV2Table.test metadata tables fails on an expected constant
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21440 ) Change subject: IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant .. IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant IMPALA-13079 added a test in iceberg-metadata-tables.test that included assertions about values that can change across builds, e.g. file sizes, which caused test failures. This commit fixes it by doing two things: 1. narrowing down the result set of the query to the column that the test is really about - this removes some of the problematic values 2. using regexes for the remaining problematic values. Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Reviewed-on: http://gerrit.cloudera.org:8080/21440 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 1 file changed, 3 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Verified Riza Suminto: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Gerrit-Change-Number: 21440 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-13091: query test.test iceberg.TestIcebergV2Table.test metadata tables fails on an expected constant
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21440 Change subject: IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant .. IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant IMPALA-13079 added a test in iceberg-metadata-tables.test that included assertions about values that can change across builds, e.g. file sizes, which caused test failures. This commit fixes it by doing two things: 1. narrowing down the result set of the query to the column that the test is really about - this removes some of the problematic values 2. using regexes for the remaining problematic values. Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/21440/1 -- To view, visit http://gerrit.cloudera.org:8080/21440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Gerrit-Change-Number: 21440 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21429 ) Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables DECIMAL values are not supported in Iceberg metadata tables and Impala runs on a DCHECK and crashes if it encounters one. Until this issue is properly fixed (see IMPALA-13080), this commit introduces a temporary solution: DECIMAL values coming from Iceberg metadata tables are NULLed out and a warning is issued. Testing: - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table, so querying the `files` metadata table will include a DECIMAL in the 'readable_metrics' struct. Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 4 files changed, 31 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21429/3 -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21429 ) Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125 PS1, Line 125: > represented as NULL, or displayed as NULL. NULLed out sounds a bit unusual. Done http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125 PS1, Line 125: RETURN_IF_ERROR(WriteDecimalSlot(slot_desc, tuple, state)); > The warning string could be extracted as a constant. I made it a constexpr char array. I don't think it would make sense to extract it out of this function, it is only used here and makes the function more readable. http://gerrit.cloudera.org:8080/#/c/21429/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@126 PS1, Line 126: break; > With this approach, the warning will be posted as many times the query enco I introduced a member that tracks whether the warning has already been emitted. Therefore one warning will be emitted for each scanner that encounters a DECIMAL from a metadata table. Also changed the type of the error from TErrorCode::GENERAL to TErrorCode::NOT_IMPLEMENTED_ERROR so these errors will be aggregated and displayed only once even if there are multiple scanners reading DECIMALs from metadata tables. -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 16 May 2024 11:01:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21429 Change subject: IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables .. IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables DECIMAL values are not supported in Iceberg metadata tables and Impala runs on a DCHECK and crashes if it encounters one. Until this issue is properly fixed (see IMPALA-13080), this commit introduces a temporary solution: DECIMAL values coming from Iceberg metadata tables are NULLed out and a warning is issued. Testing: - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table, so querying the `files` metadata table will include a DECIMAL in the 'readable_metrics' struct. Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 3 files changed, 13 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21429/1 -- To view, visit http://gerrit.cloudera.org:8080/21429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Gerrit-Change-Number: 21429 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21425 ) Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables .. IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables Until now, the float and double data types were not supported in Iceberg metadata tables. This commit adds support for them. Testing: - added a test table that contains all primitive types (except for decimal, which is still not supported), a struct, an array and a map - added a test query that queries the `files` metadata table of the above table - the 'readable_metrics' struct contains lower and upper bounds for all columns in the original table, with the original type Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 122 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21425/3 -- To view, visit http://gerrit.cloudera.org:8080/21425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 Gerrit-Change-Number: 21425 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21425 ) Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables .. IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables Until now, the float and double data types were not supported in Iceberg metadata tables. This commit adds support for them. Testing: - added a test table that contains all primitive types (except for decimal, which is still not supported), a struct, an array and a map - added a test query that queries the `files` metadata table of the above table - the 'readable_metrics' struct contains lower and upper bounds for all columns in the original table, with the original type Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 105 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21425/2 -- To view, visit http://gerrit.cloudera.org:8080/21425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 Gerrit-Change-Number: 21425 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21425 ) Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21425/1/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/21425/1/testdata/datasets/functional/functional_schema_template.sql@3916 PS1, Line 3916: INSERT INTO {db_name}{db_suffix}.{table_name} VALUES ( > nit: More rows with different values could give better insight into how upp You're right, added a new row. -- To view, visit http://gerrit.cloudera.org:8080/21425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 Gerrit-Change-Number: 21425 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 15 May 2024 12:18:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21425 Change subject: IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables .. IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables Until now, the float and double data types were not supported in Iceberg metadata tables. This commit adds support for them. Testing: - added a test table that contains all primitive types (except for decimal, which is still not supported), a struct, an array and a map - added a test query that queries the `files` metadata table of the above table - the 'readable_metrics' struct contains lower and upper bounds for all columns in the original table, with the original type Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 92 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21425/1 -- To view, visit http://gerrit.cloudera.org:8080/21425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 Gerrit-Change-Number: 21425 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21387 ) Change subject: IMPALA-13036: Document Iceberg metadata tables .. IMPALA-13036: Document Iceberg metadata tables This change adds documentation on how Iceberg metadata tables can be used. Testing: - built docs locally Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Reviewed-on: http://gerrit.cloudera.org:8080/21387 Tested-by: Impala Public Jenkins Reviewed-by: Peter Rozsa --- M docs/topics/impala_iceberg.xml 1 file changed, 72 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Peter Rozsa: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Gerrit-Change-Number: 21387 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13055: Some Iceberg metadata table tests don't assert
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21394 ) Change subject: IMPALA-13055: Some Iceberg metadata table tests don't assert .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad Gerrit-Change-Number: 21394 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 May 2024 11:53:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 27: Code-Review+2 Thanks Pranav! -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 27 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 09 May 2024 10:06:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 23: (1 comment) http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py File tests/query_test/test_insert.py: http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@301 PS23, Line 301: "create table {}(i int) partitioned by (p string) stored as parquet".format(tbl)) One more thing: you should run this test with an Iceberg table too. You could extract the common parts (mostly everything except for the create and insert statements) into a helper function. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 07 May 2024 14:19:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 23: (2 comments) http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py File tests/query_test/test_insert.py: http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320 PS23, Line 320: result line "the result line..." http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@321 PS23, Line 321: the values present inside it are just verified. "we only verify that the value is present in string representation of the whole row" -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 07 May 2024 12:46:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 23: (2 comments) Just a few nits. http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc File be/src/util/coding-util-test.cc: http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc@94 PS23, Line 94: TestUrl("/home/impala/directory/", "%2Fhome%2Fimpala%2Fdirectory%2F", false); These strings could also be extracted into variables. http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py File tests/query_test/test_insert.py: http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320 PS23, Line 320: are "is a" -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 07 May 2024 12:24:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13047: Support restarting/killing specified impalads in start-impala-cluster.py
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21376 ) Change subject: IMPALA-13047: Support restarting/killing specified impalads in start-impala-cluster.py .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@1118 PS1, Line 1118: existing_cluster_size = cluster_ops.get_cluster().get_max_impalad_id() + 1 > Yeah, that's an issue. Removed this check when using --kill. Also improved It's still not good if you get the existing cluster size from the max impalad id: then killing the last and the first impalas will cause different behaviour in the future. For example start 3, kill number 2, then restart number 0, it runs on the assertion again. Isn't it possible to use the length of cluster_ops.get_cluster().__impalads? -- To view, visit http://gerrit.cloudera.org:8080/21376 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4a863853341959eb6870b662d82188e7d570971 Gerrit-Change-Number: 21376 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 May 2024 15:21:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13055: Some Iceberg metadata table tests don't assert
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21394 ) Change subject: IMPALA-13055: Some Iceberg metadata table tests don't assert .. Patch Set 1: (5 comments) Thanks Gábor! http://gerrit.cloudera.org:8080/#/c/21394/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21394/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-13055: Some Iceberg metadata table tests don't assert Do you think it is an error in the test framework? If so, could you open a separate Jira for it with repro instructions? http://gerrit.cloudera.org:8080/#/c/21394/1//COMMIT_MSG@11 PS1, Line 11: unconditionally passes We should make it clear the it is not the actual result from Impala that can be anything but the "expected" string. http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@49 PS1, Line 49: : VERIFY_IS_SUBSET Is adding VERIFY_IS_SUBSET needed? I ran the test locally without it and it passed. http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@365 PS1, Line 365: # Currently not supported, complex type slots are set to NULL (IMPALA-12205) This comment is stale, we should change it. http://gerrit.cloudera.org:8080/#/c/21394/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@371 PS1, Line 371: '{.*}' As this test section deals with complex types, I think we shouldn't put a wildcard for all the result. We should include details and use regexes for easily changing values (file size etc.). -- To view, visit http://gerrit.cloudera.org:8080/21394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad Gerrit-Change-Number: 21394 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 06 May 2024 14:15:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21361 ) Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException .. IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException When attempting to query a metadata table of a non-Iceberg table the analyzer throws 'IllegalArgumentException'. The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()' doesn't actually check whether the given path belongs to a valid metadata table, it only checks whether the path could syntactically refer to one. This is because it is called in 'Path.getCandidateTables()', at which point analysis has not been done yet. However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true, 'getTable()' tries to instantiate an 'IcebergMetadataTable' object with the table ref of the base table. If that table is not an Iceberg table, a precondition check fails. This change renames 'isIcebergMetadataTable()' to 'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()' function, which also takes an 'Analyzer' as a parameter. With the help of the 'Analyzer' it is possible to determine whether the base table is an Iceberg table. 'Analyzer.getTable()' then uses this new 'isIcebergMetadataTable()' function instead of canBeIcebergMetadataTable(). The constructor of 'IcebergMetadataTable' is also modified to take an 'FeIcebergTable' as the parameter for the base table instead of a general 'FeTable'. Testing: - Added a test query in iceberg-metadata-tables.test. Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef Reviewed-on: http://gerrit.cloudera.org:8080/21361 Tested-by: Impala Public Jenkins Reviewed-by: Peter Rozsa --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 52 insertions(+), 16 deletions(-) Approvals: Impala Public Jenkins: Verified Peter Rozsa: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21361 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef Gerrit-Change-Number: 21361 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21387 ) Change subject: IMPALA-13036: Document Iceberg metadata tables .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@729 PS1, Line 729: > nit:, I don't think a comma is needed before "and". http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@730 PS1, Line 730: fr > I'm not sure: from? Yes, neither is perfect, I changed it to "from". http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@730 PS1, Line 730: > nit:, I don't think a comma is needed here. http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@730 PS1, Line 730: > *direclty Added a sentence at the end to make it clearer. http://gerrit.cloudera.org:8080/#/c/21387/1/docs/topics/impala_iceberg.xml@770 PS1, Line 770: > *are The subject is "unnesting", which is singular, that's why it's "is". -- To view, visit http://gerrit.cloudera.org:8080/21387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Gerrit-Change-Number: 21387 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 03 May 2024 09:45:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21387 ) Change subject: IMPALA-13036: Document Iceberg metadata tables .. IMPALA-13036: Document Iceberg metadata tables This change adds documentation on how Iceberg metadata tables can be used. Testing: - built docs locally Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 --- M docs/topics/impala_iceberg.xml 1 file changed, 72 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/21387/2 -- To view, visit http://gerrit.cloudera.org:8080/21387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Gerrit-Change-Number: 21387 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13036: Document Iceberg metadata tables
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21387 Change subject: IMPALA-13036: Document Iceberg metadata tables .. IMPALA-13036: Document Iceberg metadata tables This change adds documentation on how Iceberg metadata tables can be used. Testing: - built docs locally Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 --- M docs/topics/impala_iceberg.xml 1 file changed, 70 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/21387/1 -- To view, visit http://gerrit.cloudera.org:8080/21387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Gerrit-Change-Number: 21387 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 15: (2 comments) Thanks Pranav! In the meantime I discovered something else, but that shouldn't be too hard to fix. http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66 PS15, Line 66: ss << '%' << static_cast(ch); I played with it a bit and it doesn't work with for example '\t' and '\n' (for example insert into my_part_tbl partition(p='a\tt') values (0); ). Let's take '\t'. Its value is 9, which only has a single hexadecimal digit. Now we'll escape it as "%9" but Hive has "%09". To match Hive's behaviour, do this: 1. Add this include in the appropriate include group: #include 2. Add "<< setfill('0')" after "hex" on L58. 3. Add "<< setw(2)" after inserting the % sign on L66. This will pad hex values so that they are always at least 2 chars wide. Note that setw is reset after insertion, so it has to be within the loop. This also reveals a test deficiency: we should ideally test every character that is in 'SpecialCharacters'. I don't know if we can insert arbitrary byte values in Impala strings, maybe Quanlong can help there. But even if that's not possible, we should cover the ones for which an escape sequence exists (i.e. '\t', '\n', '\v' etc.). http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66 PS15, Line 66: ch I think it would be good if we first cast it to unsigned char because of what happens if we encounter a character > 127. This is possible if the input contains one, and we are in non-Hive-compat mode. If char is a signed type (that is implementation dependent), then the value > 127 will be interpreted as a negative number. When we convert it to uint32_t, it will be sign-extended. For example, instead of "%FF" we will have "%". If we convert it first to unsigned char, it will not be sign-extended. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 30 Apr 2024 14:01:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18 PS13, Line 18: addresses makes http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18 PS13, Line 18: two I've almost forgotten the third: the list of characters to be escaped was extended to match the current list in Hive. http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@19 PS13, Line 19: there was a "set" I didn't mean it as a literal text to be included. Instead, it could be: "Before the change, the set of characters that need to be escaped was stored as a string. The current patch uses an unordered_set instead." http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@25 PS13, Line 25: Nit: "... byte and whose inclusion ...". -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 30 Apr 2024 11:31:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13047: Support restarting/killing specified impalads in start-impala-cluster.py
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21376 ) Change subject: IMPALA-13047: Support restarting/killing specified impalads in start-impala-cluster.py .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@95 PS1, Line 95: Start Nit: Starts. http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@275 PS1, Line 275: kill_processes([t for t in find_user_processes(binary_names)], force) Could use "list(find_user_processes(binary_names))" instead of list comprehension. http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@833 PS1, Line 833: node_ids[i] Optional: could extract into a variable, also used on L840. http://gerrit.cloudera.org:8080/#/c/21376/1/bin/start-impala-cluster.py@1118 PS1, Line 1118: assert existing_cluster_size == expected_cluster_size,\ If we kill some impalads, doesn't that change the cluster size? If I try to kill impalad1 first, then impalad2, this assertion is triggered. $ bin/start-impala-cluster.py $ bin/start-impala-cluster.py --kill --impalad_ids 1 13:13:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 13:13:50 MainThread: Killed impalad (pid 369056) $ bin/start-impala-cluster.py --kill --impalad_ids 2 13:13:53 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es) Traceback (most recent call last): File "bin/start-impala-cluster.py", line 1119, in "Commands on specified impalads shouldn't change the cluster size" AssertionError: Commands on specified impalads shouldn't change the cluster size -- To view, visit http://gerrit.cloudera.org:8080/21376 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4a863853341959eb6870b662d82188e7d570971 Gerrit-Change-Number: 21376 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Apr 2024 11:16:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 12: (6 comments) Thanks Pranav, we're close. http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@15 PS12, Line 15: ' Use double quotes here ("\u00FF"), this is not a valid character literal but it is a valid string literal. Same also later on this line. http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18 PS12, Line 18: the special characters "the special characters that need to be escaped" would be better, it's not all special chars. http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18 PS12, Line 18: , Nit: no need for the comma here. http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18 PS12, Line 18: included I think it could be formulated better. There are two things that have changed: 1. '\xFF' was corrected to '\x7F' and 2. before this change we also had a "set" of characters that needed to be encoded, but it used to be stored as a string and now it's stored as an unordered_set (which is much better in my opinion). I think we should mention these separately. http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@19 PS12, Line 19: #include Let's fix include groups: there should be an empty line after #include "util/coding-util.h": after it come the standard library includes. Also, leave a line after because the boost and sasl includes are includes from other libraries than std. Usually we group includes like this: 1. header belonging to the current .cc file 2. standard library includes 3. other (third party) library includes 4. impala includes Each group is ordered alphabetically and there is an empty line between groups. http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@50 PS12, Line 50: '\x0C' This could be '\f' (form feed), see https://www.compart.com/en/unicode/U+000C. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 30 Apr 2024 10:36:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13046: Update Iceberg mixed format deletes test
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21373 ) Change subject: IMPALA-13046: Update Iceberg mixed format deletes test .. Patch Set 1: Code-Review+2 Thanks for fixing this, Michael. -- To view, visit http://gerrit.cloudera.org:8080/21373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I87c23cc541983223c6b766372f4e582c33ae6836 Gerrit-Change-Number: 21373 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 30 Apr 2024 09:48:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 11: (17 comments) http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@9 PS11, Line 9: The An error - we're introducing it now. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@14 PS11, Line 14: was matching with matched one of the ... http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@15 PS11, Line 15: '\u00FF' '\u00FF' is not a valid character literal in C++. "\u00FF" was included in a string, it was part of a string literal. At last I found out that it can actually be used in C++ to specify arbitrary unicode values, but of course it may be encoded on multiple bytes, see https://en.cppreference.com/w/cpp/language/escape. Also include in the commit message that including this character was probably a mistake from the beginning, it should have been '\x7F'. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@18 PS11, Line 18: '\xFF' See the comment above, we should explain that it was a mistake from the beginning. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22 PS11, Line 22: Some It would be good to include in which file the new tests are. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22 PS11, Line 22: for both parquet and iceberg Parquet and Iceberg are not exclusive things, we also use Parquet in our Iceberg tables. Parquet is a file format and Iceberg is a table format. We could say "tests on both traditional Hive tables and Iceberg tables ...". http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@25 PS11, Line 25: #include The includes within a group should be in alphabetical order, so this should go before L23. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@31 PS11, Line 31: #include "sasl/saslutil.h" Includes from the standard library should be the first group after the header, i.e. this should come after #include . http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51 PS11, Line 51: '\x20' > Represents space character For those characters that have a readable literal representation, e.g. " ", "\n" etc., we should use that even if Hive doesn't. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@57 PS11, Line 57: Used It would be clearer like this: "uppercase" and "hex" only affect the insertion of integers, not that of char values. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@62 PS11, Line 62: // Hive-compat mode, and the character is not alphanumeric or is one This doesn't match the code. Try this: "... b) we are not in Hive-compat mode and the character is not alphanumeric and it is not one of the characters specifically excluded from escaping (see ShouldNotEscape())." http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test File testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test: http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@302 PS11, Line 302: show tables; I don't think it's needed. Also, if another table is added later before this query, this will have to be changed as well. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@309 PS11, Line 309: insert into unicode_partition_values partition(p='运营业务数据') values (0); You don't need to put these inserts into separate QUERY segments. It could also go together with the select on L318. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@327 PS11, Line 327: drop table unicode_partition_values; You don't need to drop the table, the tables are created in a unique database, see test_column_unicode.py::TestColumnUnicode::test_create_unicode_table. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@337 PS11, Line 337: show tables; I don't think this is needed. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@344 PS11, Line 344: insert into unicode_partition_values_iceberg values (0, '运'); You should do the same insertions into the Iceberg table as into the other one. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@354 PS11, Line 354: drop table
[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/21361 ) Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException .. IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException When attempting to query a metadata table of a non-Iceberg table the analyzer throws 'IllegalArgumentException'. The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()' doesn't actually check whether the given path belongs to a valid metadata table, it only checks whether the path could syntactically refer to one. This is because it is called in 'Path.getCandidateTables()', at which point analysis has not been done yet. However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true, 'getTable()' tries to instantiate an 'IcebergMetadataTable' object with the table ref of the base table. If that table is not an Iceberg table, a precondition check fails. This change renames 'isIcebergMetadataTable()' to 'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()' function, which also takes an 'Analyzer' as a parameter. With the help of the 'Analyzer' it is possible to determine whether the base table is an Iceberg table. 'Analyzer.getTable()' then uses this new 'isIcebergMetadataTable()' function instead of canBeIcebergMetadataTable(). The constructor of 'IcebergMetadataTable' is also modified to take an 'FeIcebergTable' as the parameter for the base table instead of a general 'FeTable'. Testing: - Added a test query in iceberg-metadata-tables.test. Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 52 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/21361/4 -- To view, visit http://gerrit.cloudera.org:8080/21361 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef Gerrit-Change-Number: 21361 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21361 ) Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21361/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java: http://gerrit.cloudera.org:8080/#/c/21361/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@148 PS2, Line 148: // If the metadata table has already been analyzed in the query, the table cache will > If a metadata table is analyzed before and it's referenced in a different p Thanks, I added a comment for it. -- To view, visit http://gerrit.cloudera.org:8080/21361 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef Gerrit-Change-Number: 21361 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Mon, 29 Apr 2024 11:23:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21361 Change subject: IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException .. IMPALA-13035: Querying metadata tables from non-Iceberg tables throws IllegalArgumentException When attempting to query a metadata table of a non-Iceberg table the analyzer throws 'IllegalArgumentException'. The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()' doesn't actually check whether the given path belongs to a valid metadata table, it only checks whether the path could syntactically refer to one. This is because it is called in 'Path.getCandidateTables()', at which point analysis has not been done yet. However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true, 'getTable()' tries to instantiate an 'IcebergMetadataTable' object with the table ref of the base table. If that table is not an Iceberg table, a precondition check fails. This change renames 'isIcebergMetadataTable()' to 'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()' function, which also takes an 'Analyzer' as a parameter. With the help of the 'Analyzer' it is possible to determine whether the base table is an Iceberg table. 'Analyzer.getTable()' then uses this new 'isIcebergMetadataTable()' function instead of canBeIcebergMetadataTable(). The constructor of 'IcebergMetadataTable' is also modified to take an 'FeIcebergTable' as the parameter for the base table instead of a general 'FeTable'. Testing: - Added a test query in iceberg-metadata-tables.test. Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 50 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/21361/3 -- To view, visit http://gerrit.cloudera.org:8080/21361 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef Gerrit-Change-Number: 21361 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Reviewed-on: http://gerrit.cloudera.org:8080/21269 Tested-by: Impala Public Jenkins Reviewed-by: Csaba Ringhofer --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 441 insertions(+), 157 deletions(-) Approvals: Impala Public Jenkins: Verified Csaba Ringhofer: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 12 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@7 PS6, Line 7: IMPALA-11499: Refactor UrlEncode function to handle special We don't usually break the title into two lines, and it is not too long either. http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@12 PS6, Line 12: specialCharacterMap Nit: surround with quotes ('specialCharacterMap'). http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@18 PS6, Line 18: Additionally, the function clears the output string '*out' was assigned to also before this change, so its old contents were discarded. This sentence could be removed. http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@48 PS3, Line 48: {'#', "%23"}, > Should be const. This comment has not been addressed. http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@23 PS6, Line 23: #include I don't think we need iostream. http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@46 PS6, Line 46: std::unordered_map Could also be static. Alternatively, we could put all these static functions and variables into an anonymous namespace, which is a more modern way of ensuring that these are not visible outside this translation unit. http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@55 PS6, Line 55: '\xFF The problem in the old code was that "\u00FF" translates to two bytes: [194,191]. Note that the notation '\u' is Java-specific. The process is broken because the code takes the input bytes one by one. The hanzi character '?', in UTF-8, is [232, 191, 144]. Its second character, 191, matched with the second character of "\u00FF" and that caused the problem. I don't know why '\xFF' was included in the first place. If we'd like to handle UTF-8 only, then '\xFF' is not a valid byte. If we'd like to handle non-UTF-8 bytes, then we should handle all other bytes > 127. After some digging in Hive, I found this commit that expanded the list of escaped characters: https://github.com/apache/hive/commit/32b046bf93e3d041b2bb07a1c9b4e1ef5d977ddf The list before that commit is very similar to what we have here in the old code, except they have '\u007F' instead of '\u00FF'. That makes more sense as that is a control character for DELETE, see https://www.compart.com/en/unicode/U+007F. If we remove '\xFF', we can see that the escaped values in the map are the same as how we HEX encode values on L85, so there is no need for these values to be included in a map. Instead, we could have a set (std::unordered_set) with the keys in the current map (without '\xFF' but including '\x7F'), and we would encode input characters if they are in the set (taking into account hive_compat of course). That would also simplify the code. http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@66 PS6, Line 66: (*out).clear(); No need to clear it if it is assigned to at the end of the function. http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@69 PS6, Line 69: for (int i = 0; i < in_len; ++i) { Could use a range-based for-loop: for (char ch : in) {...}. http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@74 PS6, Line 74: is not : // alphanumeric or one of the commonly excluded characters Could be misleading: does "not" also apply to "one of the commonly..."? If not, we could write for example "is not alphanumeric or is one of the commonly excluded ...". http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80 PS6, Line 80: !(std::isalnum(ch) || ShouldNotEscape(ch)) This is easier to understand if converted to a form without parentheses: !std::isalnum(ch) && !ShouldNotEscape(ch) http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85 PS6, Line 85: std::uppercase << std::hex We could put std::uppercase and std::hex after L68, before the loop, as these settings don't get reset automatically, so no need to set them every time. On the other hand, I would include a comment stating that this only applies to when integer values are inserted and not when char values, therefore it doesn't cause a problem when not escaping characters (like on L88). http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test File testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 441 insertions(+), 157 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/11 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21348 ) Change subject: IMPALA-13029: Tests for multi format equality deletes .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba Gerrit-Change-Number: 21348 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Apr 2024 15:32:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 441 insertions(+), 157 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/10 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12950: Improve error message in case of out-of-range numeric conversions
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21331 ) Change subject: IMPALA-12950: Improve error message in case of out-of-range numeric conversions .. IMPALA-12950: Improve error message in case of out-of-range numeric conversions IMPALA-12035 introduced checks for numeric conversions that are unsafe and can fail (if the target type cannot store the value, the behaviour is undefined): - from floating-point types to integer types - from double to float However, it can be difficult to trace which part of the query caused this based on the error message. This change adds the source type, the destination type and the value to be converted to the error message. Unfortunately, at this point in the BE, the original SQL is not available, so we cannot reference that. Testing: - extended existing tests in expr-test.cc. Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 --- M be/src/exprs/cast-functions-ir.cc M be/src/exprs/expr-test.cc M be/src/udf/udf.h 3 files changed, 85 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21331/3 -- To view, visit http://gerrit.cloudera.org:8080/21331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 Gerrit-Change-Number: 21331 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12950: Improve error message in case of out-of-range numeric conversions
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21331 ) Change subject: IMPALA-12950: Improve error message in case of out-of-range numeric conversions .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/21331/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21331/2//COMMIT_MSG@7 PS2, Line 7: : > nit: missing whitespace Done http://gerrit.cloudera.org:8080/#/c/21331/2//COMMIT_MSG@12 PS2, Line 12: floating-point > nit: floating-point Done http://gerrit.cloudera.org:8080/#/c/21331/2/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21331/2/be/src/exprs/cast-functions-ir.cc@76 PS2, Line 76: } else { > The default case is missing, it could be added as "UNKNOWN TYPE" or somethi It's a good point. These are the only types we'd like to cover here, so originally I wanted to add static_assert(false); but it doesn't compile. I could write static_assert(!std::is_same_v), which is always false, however this link suggests even that may be ill-formed: https://stackoverflow.com/questions/38304847/constexpr-if-and-static-assert On the other hand, this latter approach seems to work in practice, but I'm not sure we should do that if it's not guaranteed. Leaving out the default case deterministically leads to a warning about no return statement so I did it like this. Maybe adding a DCHECK would be best, but I don't like that there doesn't seem to be a clean and concise way of doing it compile time without repeating the types (e.g. static_assert that T is one of the types, or some SFINAE magic). In the new patch set I went with static_assert(!std::is_same_v), it correctly fires when I try to instantiate the template with a different type, but I don't know if it's guaranteed or not. What do you think? http://gerrit.cloudera.org:8080/#/c/21331/2/be/src/exprs/cast-functions-ir.cc@182 PS2, Line 182: constexpr const char* FROM_TYPE_NAME = TypeToName(); > Could you please add test cases to cover each condition? Extended existing tests and added some new ones in expr-test.cc. -- To view, visit http://gerrit.cloudera.org:8080/21331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 Gerrit-Change-Number: 21331 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 23 Apr 2024 13:03:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 439 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/8 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 7: (2 comments) Thanks for the review, Gábor! http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@227 PS7, Line 227: RETURN_IF_ERROR(GuardType::create(env, jbuffer, _guard)); > if create() returned an error here we'd leak memory because of 'jbuffer', r No, the type of 'jbuffer' is jbyteArray or jstring, and they are local JNI object references which will be freed by the JVM. Only the buffer returned by env->GetByteArrayElements() or env->GetStringUTFChars() (called within create()) has to be freed manually, that's what 'jbuffer_guard' takes care of. http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@804 PS7, Line 804: select data_file from functional_parquet.iceberg_query_metadata.entries; > The query should go into the QUERY section Thanks for catching it. -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Mon, 22 Apr 2024 14:26:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 3 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 22 Apr 2024 10:40:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12950:Improve error message in case of out-of-range numeric conversions
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21331 Change subject: IMPALA-12950:Improve error message in case of out-of-range numeric conversions .. IMPALA-12950:Improve error message in case of out-of-range numeric conversions IMPALA-12035 introduced checks for numeric conversions that are unsafe and can fail (if the target type cannot store the value, the behaviour is undefined): - from floating point types to integer types - from double to float However, it can be difficult to trace which part of the query caused this based on the error message. This change adds the source type, the destination type and the value to be converted to the error message. Unfortunately, at this point in the BE, the original SQL is not available, so we cannot reference that. Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 --- M be/src/exprs/cast-functions-ir.cc M be/src/udf/udf.h 2 files changed, 35 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21331/2 -- To view, visit http://gerrit.cloudera.org:8080/21331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 Gerrit-Change-Number: 21331 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:53:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556 PS1, Line 556: able_na > [] is quite standard notation, and we are using it extensively in the Impal I'm also okay with leaving [db_name]. I think a separate top-level page or even just a paragraph showing the proper syntax would be even better. http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@566 PS2, Line 566: using If you want to make it even clearer that all files are rewritten (not just the ones with the latest schema), you could write "rewrite all files, converting them (if necessary) to the latest table schema". I'm not sure it's needed, I'm also okay with the current wording. -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 13:20:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 439 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/7 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 2: Code-Review+1 (2 comments) Thanks, just some nits. http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: move 'Move' is good from the context of the change, but if someone is reading the new code it's a bit strange. I think "place" or "put" would be better. http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: loop Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" would be better. -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 12:54:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 1: (2 comments) Thanks Yida, it looks good. http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions-ir.cc File be/src/exprs/ai-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions-ir.cc@88 PS1, Line 88: string Nit: we could keep the "std::" prefix, it is also used at L81 and L74. http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions.inline.h@107 PS1, Line 107: Document overrides; Is it intentional that 'overrides' is taken out of the loop? Is it because 'payload' references data that 'overrides' owns? -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 17 Apr 2024 15:24:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 1: (5 comments) Thanks Noémi! http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@553 PS1, Line 553: many small, : delete It doesn't feel right for me, maybe "can write many small data files and/(as well as) delete files. http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@554 PS1, Line 554: update Do we have 'update files'? My understanding is that an update adds new data files and delete files. If this is correct, I don't think we should call them 'update files'. Of course we should mention that updates (as well as deletes and inserts) contribute to the problem. http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556 PS1, Line 556: db_name Do we always have to use the fully qualified name? If not, it may not be important to include the database name here. http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@577 PS1, Line 577: Views cannot be optimized. Maybe take it out of the list into a separate paragraph because it doesn't fit well with the introduction of the list ("To execute table optimization:"). http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@585 PS1, Line 585: Do you think it would be worth to add a warning that because OPTIMIZE re-writes the whole table, it can take a long time, so it shouldn't be called too often? -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 1 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Apr 2024 13:47:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13008: test metadata tables failed in Ubuntu 20 build
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21317 Change subject: IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build .. IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build TestIcebergV2Table::test_metadata_tables failed in Ubuntu 20 build in a release candidate because the file sizes in some queries didn't match the expected ones. As Impala writes its version into the Parquet files it writes, the file sizes can change with the release (especially as SNAPSHOT or RELEASE is part of the full version, and their lengths differ). This change updates the failing tests to take regexes for the file sizes instead of concrete values. Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 1 file changed, 7 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/21317/1 -- To view, visit http://gerrit.cloudera.org:8080/21317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f Gerrit-Change-Number: 21317 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 16 Apr 2024 09:48:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 439 insertions(+), 154 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/6 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21292 ) Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables .. IMPALA-12996: Add support for DATE in Iceberg metadata tables DATE fields in Iceberg metadata tables were NULLed out before this change. This change adds support for displaying their actual values. DATE fields are stored as 32-bit integers (storing the number of days since the epoch), so they are handled similarly to INTS, but if they are out of the valid DATE range, their value is set to DateValue::INVALID_DAYS_SINCE_EPOCH. Tests: - added a test query and adjusted existing ones in iceberg-metadata-tables.test Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 3 files changed, 52 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/21292/3 -- To view, visit http://gerrit.cloudera.org:8080/21292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5 Gerrit-Change-Number: 21292 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21292 ) Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21292/2/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21292/2/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@97 PS2, Line 97: TYPE_DATE > This shouldn't cause issue in practice, but it would be better to validate Thanks for pointing it out, I added the check by using the DataValue constructor. -- To view, visit http://gerrit.cloudera.org:8080/21292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5 Gerrit-Change-Number: 21292 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 12 Apr 2024 13:56:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21292 ) Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21292/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21292/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@a805 PS2, Line 805: This is no longer needed because complex types are now expanded by default for Iceberg metadata tables. -- To view, visit http://gerrit.cloudera.org:8080/21292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5 Gerrit-Change-Number: 21292 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Comment-Date: Fri, 12 Apr 2024 12:48:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12996: Add support for DATE in Iceberg metadata tables
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21292 Change subject: IMPALA-12996: Add support for DATE in Iceberg metadata tables .. IMPALA-12996: Add support for DATE in Iceberg metadata tables DATE fields in Iceberg metadata tables were NULLed out before this change. This change adds support for displaying their actual values. DATE fields are stored as 32-bit integers so they are handled as if they were INTs. Tests: - added a test query and adjusted existing ones in iceberg-metadata-tables.test Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 2 files changed, 23 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/21292/2 -- To view, visit http://gerrit.cloudera.org:8080/21292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib2223385f90555b1f9b22f3e27fa0e2489c3b9b5 Gerrit-Change-Number: 21292 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 438 insertions(+), 154 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/4 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 440 insertions(+), 153 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/3 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] [TEST][WIP]IMPALA-11499: Refactor UrlEncode function to handle special characters
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 ) Change subject: [TEST][WIP]IMPALA-11499: Refactor UrlEncode function to handle special characters .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83 PS3, Line 83: if (!hive_compat) { > That was previously possible. The following works for me before this change The exception is thrown here in the catalogd: https://github.com/apache/impala/blob/70c35425d3f8ac68b23fb8e0d08e12ee763965d7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1770 It is a Preconditions check. The partition name is not found in the map. 'partName' is "p=12?!@%23$%25^&%2A()|%3F%2F.~`", while 'nameToPartitionMap_' contains "p=12?!@%23$%25%5E&%2A()|%3F%2F.~`". The difference is the caret ("^"). What I think happens is that the HMS encodes it to "%5E", and that is what is present in the 'nameToPartitionMap_' map. But we don't encode it, hence 'partName' contains "^". Indeed, if I add the mapping {'^', "%5E"} in 'specialCharacterMap' in this file, there is no error. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 11 Apr 2024 10:38:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 15:10:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values The Base64Encode function in coding-util.h with signature bool Base64Encode(const char* in, int64_t in_len, int64_t out_max, char* out, int64_t* out_len); fails if '*out_len', when passed to the function, contains a value that does not fit in a 32 bit integer. Internally we use the int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned outmax, unsigned *outlen); function and explicitly cast 'out_len' to 'unsigned*'. The error is that the called sasl_encode64() function only sets the four lower bytes of '*out_len' (assuming that 'unsigned' is a 32 bit integer), and if the upper bytes are not all zero, the resulting value of '*out_len' will be incorrect. This change changes the type of 'out_len' from 'int64_t*' to 'unsigned*' to match the type that sasl_encode64() expects. Base64Decode() is also updated to use 'unsigned*'. Before this change it used an intermediate 32 bit local variable to avoid this issue. Testing: - added a regression test in coding-util-test.cc Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f --- M be/src/exec/text-converter.inline.h M be/src/exprs/string-functions-ir.cc M be/src/util/coding-util-test.cc M be/src/util/coding-util.cc M be/src/util/coding-util.h M be/src/util/runtime-profile.cc 6 files changed, 43 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/21271/3 -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21271 ) Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21271/2/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21271/2/be/src/util/coding-util.cc@125 PS2, Line 125: bool Base64Encode(const char* in, int64_t in_len, int64_t out_max, char* out, > What do you think about changing Base64* functions to use uint32_t* or size I actually think we should use 'unsigned' because that's what sasl_encode64() uses. It's very ugly and error-prone to cast from an integer pointer to the other when the size of the original and destination types are different. Now I think the current error is also because of that: we cast int64_t* to unsigned* (32 bits in this case) and the called function will only write into the lower 4 bytes, and if the upper 4 bytes are non-zero, the result will fail the check at L134 (*out_len != out_max - 1). -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 13:58:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125 PS7, Line 1125: Or > I can change it if you feel strong about it, but I think concise and simple I think it's a bit unexpected/abrupt after the previous paragraphs which are full sentences. In isolation it would be ok but I think it's better to match the style. http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README@1134 PS8, Line 1134: 1) Created the table via Impala and added some records to it. Could you include the CREATE TABLE and INSERT statements for reproducibility? -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 13:19:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@135 PS1, Line 135: DCHECK(false) << "Unsupported column type: " << slot_desc->type().type; > Are there still unsupported types that are intentionally nulled? How is thi There aren't any more types that we don't support but are present in metadata tables (TINYINT and SHORTINT are not Iceberg types). In our tests we have "select *" for all metadata tables so if Iceberg later adds e.g. a DATE field our tests will catch it. Added a DCHECK instead. http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@187 PS1, Line 187: const jobject _value, void* slot, MemPool* tuple_data_pool) { > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@198 PS1, Line 198: RETURN_IF_ERROR(metadata_scanner_->ConvertJavaByteBufferToByteArray( > tl/dr: The binary handling seems good to me, but I have concerns about the I agree with what you say, let's see if we have capacity to do it. Please create a Jira for it. Thanks. http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@209 PS1, Line 209: > nit: extra ; Done http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/rpc/jni-thrift-util.h File be/src/rpc/jni-thrift-util.h: http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/rpc/jni-thrift-util.h@62 PS1, Line 62: } > With JniByteArrayGuard we could just return DeserializeThriftMsg directly Done http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/util/jni-util.h File be/src/util/jni-util.h: http://gerrit.cloudera.org:8080/#/c/21269/1/be/src/util/jni-util.h@115 PS1, Line 115: /// is more restricted, see https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical > line too long (162 > 90) Done http://gerrit.cloudera.org:8080/#/c/21269/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21269/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@780 PS1, Line 780: > nit: extra lines? (line 791) We have these extra empty lines in this file, for example on L686 and L765 from before. I think they contribute to making the code easier to read and navigate. -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Wed, 10 Apr 2024 11:25:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 441 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/2 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 7: (2 comments) Thanks. http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc@283 PS7, Line 283: ErrorMsg(TErrorCode::GENERAL, "NULL found as file_path in delete file")); Can't we return or continue here? http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125 PS7, Line 1125: Or Nit: something like "A third case is..." would be nicer. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 10:20:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG@13 PS4, Line 13: IcebergDeleteBuilder now also tolerates NULL file paths Is it possible to add a test for this? http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h@79 PS4, Line 79: /// Partitioned: Both data and delete files are hashed by the file path. This means Shouldn't we mention DIRECTED mode here? http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc@272 PS4, Line 272: state Do we use 'state' anywhere? If not, this parameter could also be removed from AddBatch(). -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Apr 2024 14:31:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12986: Base64Encode fails if the 'out len' output parameter is passed with certain values
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21271 Change subject: IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values .. IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values The Base64Encode function in coding-util.h with signature bool Base64Encode(const char* in, int64_t in_len, int64_t out_max, char* out, int64_t* out_len); fails if '*out_len', when passed to the function, contains a negative value or a value that does not fit in a 32 bit integer. Internally we use the int sasl_encode64(const char *in, unsigned inlen, char *out, unsigned outmax, unsigned *outlen); function and explicitly cast 'out_len' to 'unsigned*'. The success of this function shouldn't depend on the value of '*out_len' because it is an output parameter, so this change sets '*out_len' to zero before passing it to sasl_encode64(). Testing: - added a regression test in coding-util-test.cc Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f --- M be/src/util/coding-util-test.cc M be/src/util/coding-util.cc M be/src/util/coding-util.h 3 files changed, 32 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/21271/2 -- To view, visit http://gerrit.cloudera.org:8080/21271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I35ae59fc9b3280f89ea4f7d95d27d2f21751001f Gerrit-Change-Number: 21271 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5323: Support BINARY columns in Kudu tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18868 ) Change subject: IMPALA-5323: Support BINARY columns in Kudu tables .. Patch Set 8: Code-Review+1 Thanks Csaba, looks good. -- To view, visit http://gerrit.cloudera.org:8080/18868 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff701a4b3a09ce7b6982c5d238e65f3d4f3d1151 Gerrit-Change-Number: 18868 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 09 Apr 2024 10:37:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21269 Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::\ test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 438 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/1 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-5323: Support BINARY columns in Kudu tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18868 ) Change subject: IMPALA-5323: Support BINARY columns in Kudu tables .. Patch Set 7: (3 comments) Looks good, thanks. http://gerrit.cloudera.org:8080/#/c/18868/7/be/src/exec/kudu/kudu-util-ir.cc File be/src/exec/kudu/kudu-util-ir.cc: http://gerrit.cloudera.org:8080/#/c/18868/7/be/src/exec/kudu/kudu-util-ir.cc@56 PS7, Line 56: const char* VALUE_ERROR_MSG = "Could not set Kudu row value."; Could be constexpr: constexpr const char* VALUE_ERROR_MSG = "..."; http://gerrit.cloudera.org:8080/#/c/18868/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/18868/5/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@198 PS5, Line 198:L > I agree with Peter that it should be a separate patch. Could you create a Jira ticket for it and reference the number here? http://gerrit.cloudera.org:8080/#/c/18868/7/testdata/workloads/functional-query/queries/QueryTest/binary-type.test File testdata/workloads/functional-query/queries/QueryTest/binary-type.test: http://gerrit.cloudera.org:8080/#/c/18868/7/testdata/workloads/functional-query/queries/QueryTest/binary-type.test@170 PS7, Line 170: s Nit: constant folding. See also on L182. -- To view, visit http://gerrit.cloudera.org:8080/18868 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff701a4b3a09ce7b6982c5d238e65f3d4f3d1151 Gerrit-Change-Number: 18868 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 04 Apr 2024 13:50:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21236 ) Change subject: IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 04 Apr 2024 09:15:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21236 ) Change subject: IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1110 PS1, Line 1110: select readable_metrics.i.* from functional_parquet.iceberg_query_metadata.`files`; > I changed this to expand readable_metrics.i Sorry, I didn't notice that readable_metrics.i doesn't contain complex types, so expanding it is not relevant for this patch. We could try "select data_file.* from functional_parquet.iceberg_query_metadata.entries" instead, that contains some maps. -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 04 Apr 2024 07:54:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12612: Expand complex type columns from Iceberg metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21236 ) Change subject: IMPALA-12612: Expand complex type columns from Iceberg metadata tables .. Patch Set 1: (5 comments) Thanks Gábor, it looks good, I have only a few remarks. http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-12612: Expand complex type columns from Iceberg metadata tables The title could include "select *", otherwise it's not clear what this refers to. http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@9 PS1, Line 9: , Nit: should come after "behave". http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@14 PS1, Line 14: Note, the behavior of handling nested columns from regular tables We could mention that although this is technically a breaking change, metadata tables are a very recent feature so it is not problematic. http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@25 PS1, Line 25: hdfs://localhost:20500 Is it intentional that these are changed from "$NAMENODE" to "hdfs://localhost:20500"? Applies also to some of the other queries. http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1110 PS1, Line 1110: select readable_metrics.* from functional_parquet.iceberg_query_metadata.`files`; This example can be confusing at first because the 'readable_metrics' struct itself only contains a single struct. I thought first that it failed to expand to the columns "column_size", "value_count" etc. We could either mention it in a comment and/or expand "readable_metrics.i" instead. -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 03 Apr 2024 21:52:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12969: Release JNI array if DeserializeThriftMsg failed
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21234 ) Change subject: IMPALA-12969: Release JNI array if DeserializeThriftMsg failed .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21234/1/be/src/rpc/jni-thrift-util.h File be/src/rpc/jni-thrift-util.h: http://gerrit.cloudera.org:8080/#/c/21234/1/be/src/rpc/jni-thrift-util.h@61 PS1, Line 61: Status status = DeserializeThriftMsg( We could use something like JniUtil::JniUtfCharGuard for arrays as well. I understand that it is urgent to correct this error so it can be done in another commit. -- To view, visit http://gerrit.cloudera.org:8080/21234 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2c0335b12e9289ae851d0ec050765951a8ca6c7 Gerrit-Change-Number: 21234 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 03 Apr 2024 21:39:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Reviewed-on: http://gerrit.cloudera.org:8080/21219 Reviewed-by: Gabor Kaszab Tested-by: Impala Public Jenkins --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 64 insertions(+), 6 deletions(-) Approvals: Gabor Kaszab: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 64 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/9 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 64 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/8 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 64 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/7 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@822 PS6, Line 822: 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq','PARQUET',0,1,1606,'{2147483546:215,2147483545:51}','{2147483546:1,2147483545:1}','{2147483546:0,2147483545:0}','NULL','{2147483546:null,2147483545:null}','{2147483546:null,2147483545:null}','NULL','NULL','NULL',NULL,'{"d":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"i":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"s":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null}}' > shouldn't these also be regexes similarly as above because of the paths? These files are copied into the table directory, not generated with INSERT/DELETE statements. They are also present with concrete values in the query at L617. I experimented with regexes also but I couldn't get the order of the results correctly, although I changed the expected order and also tried adding VERIFY_IS_EQUAL_SORTED. Therefore I looked for a table for which the file paths are constant across data loads so we don't have to use regexes. -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 14:39:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 5 files changed, 64 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/6 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@328 PS4, Line 328: return long_cl_; > I understand this will be needed for the permanent solution and not the NUL Done http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@813 PS4, Line 813: ntent":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_quer > not sure about this but for dockerised builds don't the file paths start wi Done. There are also some other parts for which a regex is needed: the exact file names and the size of the delete file. Let's see if this is enough or even more needs to be captured by regexes. http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@819 PS4, Line 819: set EXPAND_COMPLEX_TYPES=1; > One possible extra test case is to set the EXPAND_COMPLEX_TYPES true and do Done -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 14:29:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc: http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@156 PS8, Line 156: jobject map_entry; > Shouldn't we release 'map_entry' at the end of this function? Done http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270 PS8, Line 270: DeleteLocalRef > Isn't 'collection_scanner' a GlobalRef? We call DeleteLocalRef here so I'm I checked the code and JNI doc again and I think it is actually a local ref and the comment was wrong. http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270 PS8, Line 270: env->DeleteLocalRef(collection_scanner); > I think we can leak memory if any of the RETURN_IF_ERROR or RETURN_IF_CANCE I checked the code and JNI doc again and I think it is actually a local ref and the comment was wrong. I updated the comments in iceberg-metadata-scanner.h. If the reference is indeed local, deleting it may not be as important. This is what the doc says about deleting local references (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#:~:text=The%20JNI%20divides%20object%20references,until%20they%20are%20explicitly%20freed): In most cases, the programmer should rely on the VM to free all local references after the native method returns. However, there are times when the programmer should explicitly free a local reference. Consider, for example, the following situations: - A native method accesses a large Java object, thereby creating a local reference to the Java object. The native method then performs additional computation before returning to the caller. The local reference to the large Java object will prevent the object from being garbage collected, even if the object is no longer used in the remainder of the computation. - A native method creates a large number of local references, although not all of them are used at the same time. Since the VM needs a certain amount of space to keep track of a local reference, creating too many local references may cause the system to run out of memory. For example, a native method loops through a large array of objects, retrieves the elements as local references, and operates on one element at each iteration. After each iteration, the programmer no longer needs the local reference to the array element. I think Tamas added the deletes because of the second case. If an error occurs or the query is cancelled we won't create new (local) references, so freeing these local references is not important. If you'd like to I am open to creating a wrapper for these refs though. http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@288 PS8, Line 288: env->DeleteLocalRef(item); > Same comment about leaking memory Done -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 02 Apr 2024 12:36:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. IMPALA-12611: Add support to MAP type Iceberg Metadata table columns This change adds support for querying MAP types from Iceberg Metadata tables. The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to 'CollectionScanner' and extended to be able to handle maps. For arrays the iteration returns the element as before, for maps it returns 'Map.Entry' objects. Note that collections in the FROM clause are still not supported. Testing: - Added E2E tests in iceberg-metadata-tables.test. Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 400 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/10 -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@763 PS7, Line 763: select > Removed h.made_current_at and h.snapshot_id from the select list because th It seems that in s.summary, for the 'overwrite' column, "added-files-size" and "total-files-size" in the map vary by builds, trying a regex instead of a concrete value. -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 02 Apr 2024 10:50:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. IMPALA-12611: Add support to MAP type Iceberg Metadata table columns This change adds support for querying MAP types from Iceberg Metadata tables. The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to 'CollectionScanner' and extended to be able to handle maps. For arrays the iteration returns the element as before, for maps it returns 'Map.Entry' objects. Note that collections in the FROM clause are still not supported. Testing: - Added E2E tests in iceberg-metadata-tables.test. Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 399 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/8 -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables After this change, the new SHOW METADATA TABLES IN statement can be used to list all the available metadata tables of an Iceberg table. Note that in contrast to querying the contents of Iceberg metadata tables, this does not require fully qualified paths, e.g. both SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata; and USE functional_parquet; SHOW METADATA TABLES IN iceberg_query_metadata; work. The available metadata tables for all Iceberg tables are the same, corresponding to the values of the enum "org.apache.iceberg.MetadataTableType", so there is actually no need to pass the name of the regular table for which the metadata table list is requested through Thrift. This change, however, does send the table name because this way - if we add support for metadata tables for other table formats, the table name/path will be necessary to determine the correct list of metadata tables - we could later add support for different authorisation policies for individual tables - we can check also at the point of generating the list of metadata tables that the table is an Iceberg table Testing: - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and AuthorizationStmtTest - added a custom cluster test in test_authorization.py - added functional tests in iceberg-metadata-tables.test Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Reviewed-on: http://gerrit.cloudera.org:8080/21026 Tested-by: Impala Public Jenkins Reviewed-by: Zoltan Borok-Nagy --- M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/service/impala-http-handler.cc M common/thrift/Frontend.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M tests/authorization/test_authorization.py 18 files changed, 476 insertions(+), 59 deletions(-) Approvals: Impala Public Jenkins: Verified Zoltan Borok-Nagy: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 16 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has removed a vote on this change. Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Removed Code-Review+2 by Daniel Becker -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1258 PS15, Line 1258: functional_parquet Removed the table name because "functional_parquet.iceberg_query_metadata" and "functional_parquet.*.*" varied with builds. -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 01 Apr 2024 10:47:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 01 Apr 2024 10:40:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21125/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@763 PS7, Line 763: select Removed h.made_current_at and h.snapshot_id from the select list because the values change with every data load. -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 31 Mar 2024 22:53:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. IMPALA-12611: Add support to MAP type Iceberg Metadata table columns This change adds support for querying MAP types from Iceberg Metadata tables. The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to 'CollectionScanner' and extended to be able to handle maps. For arrays the iteration returns the element as before, for maps it returns 'Map.Entry' objects. Note that collections in the FROM clause are still not supported. Testing: - Added E2E tests in iceberg-metadata-tables.test. Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 399 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/7 -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables After this change, the new SHOW METADATA TABLES IN statement can be used to list all the available metadata tables of an Iceberg table. Note that in contrast to querying the contents of Iceberg metadata tables, this does not require fully qualified paths, e.g. both SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata; and USE functional_parquet; SHOW METADATA TABLES IN iceberg_query_metadata; work. The available metadata tables for all Iceberg tables are the same, corresponding to the values of the enum "org.apache.iceberg.MetadataTableType", so there is actually no need to pass the name of the regular table for which the metadata table list is requested through Thrift. This change, however, does send the table name because this way - if we add support for metadata tables for other table formats, the table name/path will be necessary to determine the correct list of metadata tables - we could later add support for different authorisation policies for individual tables - we can check also at the point of generating the list of metadata tables that the table is an Iceberg table Testing: - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and AuthorizationStmtTest - added a custom cluster test in test_authorization.py - added functional tests in iceberg-metadata-tables.test Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 --- M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/service/impala-http-handler.cc M common/thrift/Frontend.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M tests/authorization/test_authorization.py 18 files changed, 476 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21026/15 -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables After this change, the new SHOW METADATA TABLES IN statement can be used to list all the available metadata tables of an Iceberg table. Note that in contrast to querying the contents of Iceberg metadata tables, this does not require fully qualified paths, e.g. both SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata; and USE functional_parquet; SHOW METADATA TABLES IN iceberg_query_metadata; work. The available metadata tables for all Iceberg tables are the same, corresponding to the values of the enum "org.apache.iceberg.MetadataTableType", so there is actually no need to pass the name of the regular table for which the metadata table list is requested through Thrift. This change, however, does send the table name because this way - if we add support for metadata tables for other table formats, the table name/path will be necessary to determine the correct list of metadata tables - we could later add support for different authorisation policies for individual tables - we can check also at the point of generating the list of metadata tables that the table is an Iceberg table Testing: - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and AuthorizationStmtTest - added a custom cluster test in test_authorization.py - added functional tests in iceberg-metadata-tables.test Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 --- M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/service/impala-http-handler.cc M common/thrift/Frontend.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M tests/authorization/test_authorization.py 18 files changed, 476 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21026/13 -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 13 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables After this change, the new SHOW METADATA TABLES IN statement can be used to list all the available metadata tables of an Iceberg table. Note that in contrast to querying the contents of Iceberg metadata tables, this does not require fully qualified paths, e.g. both SHOW METADATA TABLES IN functional_parquet.iceberg_query_metadata; and USE functional_parquet; SHOW METADATA TABLES IN iceberg_query_metadata; work. The available metadata tables for all Iceberg tables are the same, corresponding to the values of the enum "org.apache.iceberg.MetadataTableType", so there is actually no need to pass the name of the regular table for which the metadata table list is requested through Thrift. This change, however, does send the table name because this way - if we add support for metadata tables for other table formats, the table name/path will be necessary to determine the correct list of metadata tables - we could later add support for different authorisation policies for individual tables - we can check also at the point of generating the list of metadata tables that the table is an Iceberg table Testing: - added and updated tests in ParserTest, AnalyzeDDLTest, ToSqlTest and AuthorizationStmtTest - added a custom cluster test in test_authorization.py - added functional tests in iceberg-metadata-tables.test Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 --- M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/service/impala-http-handler.cc M common/thrift/Frontend.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java A fe/src/main/java/org/apache/impala/analysis/ShowMetadataTablesStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowTablesStmt.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M tests/authorization/test_authorization.py 18 files changed, 476 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21026/12 -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 12 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator
Daniel Becker has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/21138 ) Change subject: IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator .. IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator On clusters with dedicated coordinators and executors the Iceberg metadata scanner fragment(s) can be scheduled to executors, for example during a join. The fragment in this case will fail a precondition check, because either the 'frontend_' object or the table will not be present. This change forces Iceberg metadata scanner fragments to be scheduled on the coordinator. It is not enough to set the DataPartition type to UNPARTITIONED, because unpartitioned fragments can still be scheduled on executors. This change introduces a new flag in the TPlanFragment thrift struct - if it is true, the fragment is always scheduled on the coordinator. Testing: - Added a regression test in test_coordinators.py. - Added a new planner test with two metadata tables and a regular table joined together. Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e --- M be/src/scheduling/schedule-state.cc M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-joined-with-regular-table.test M tests/custom_cluster/test_coordinators.py 9 files changed, 175 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/21138/12 -- To view, visit http://gerrit.cloudera.org:8080/21138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e Gerrit-Change-Number: 21138 Gerrit-PatchSet: 12 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21219 Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. IMPALA-12899: Temporary workaround for BINARY in complex types The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we have come up with a temporary solution. This change NULLs out BINARY values in complex types coming from Iceberg metadata tables and logs a warning. BINARYs in complex types from regular tables are not affected by this change. Testing: - Added test queries in iceberg-metadata-tables.test. Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 --- M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 67 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/21219/4 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h: http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@63 PS4, Line 63: /// Note that it returns a GlobalRef, that has to be released explicitly. > Probably I get this comment wrong, but I was searching for some code that r It is used in IcebergRowReader::WriteCollectionSlot(), it is freed at the end of the function: env->DeleteLocalRef(collection_scanner); http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@123 PS3, Line 123: // Skip the unsupported type and set it to NULL > Does this set Binary cols NULL now? It shouldn't, in the backend BINARY fields have TYPE_STRING and a separete boolean variable that determines whether it is actually a STRING or a BINARY. See https://github.com/apache/impala/blob/73171cb7164573349bd53a996a51bb7058b778e0/be/src/runtime/types.h#L216. On the other hand, I don't know where top-level BINARY fields are NULLed for metadata tables. http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125 PS4, Line 125: VLOG(3) << "Skipping unsupported column type: " << slot_desc->type().type; > I think this 'accessed_value' is allocated one level above in the call chai Done. Added the deallocation to the other caller, WriteArrayItem(), too. http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@212 PS4, Line 212: if constexpr (IS_ARRAY) { > Let's consider moving these DCHECKs into the same IF-ELSE at L228-234. If we do that and this function is erroneously called with for example an INT, the DCHECK that fires will be the one added for "item_tuple_desc != nullptr", not the one about the type. I think type info would help more with debugging than info about 'item_tuple_desc'. Also, this is a more basic precondition of the function. http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@220 PS4, Line 220: const TupleDescriptor* item_tuple_desc = slot_desc->children_tuple_descriptor(); > nit: DCHECK item_tuple_desc is not null? Done http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@236 PS4, Line 236: > name is misleading now that this is not just for arrays. remaining_items? Done http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@297 PS4, Line 297: jobject value; : RETURN_IF_ERROR(met > Can the key of a map be a struct? Done, added a DCHECK instead. http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@761 PS3, Line 761: > Can't we do a map_col.KEY and map_col.VALUE for maps? Would be nice to have We can't, to access KEY and VALUE we have to cross-unnest the collection, and it is not supported yet. See https://issues.apache.org/jira/browse/IMPALA-12853. -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Mar 2024 15:47:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. IMPALA-12611: Add support to MAP type Iceberg Metadata table columns This change adds support for querying MAP types from Iceberg Metadata tables. The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to 'CollectionScanner' and extended to be able to handle maps. For arrays the iteration returns the element as before, for maps it returns 'Map.Entry' objects. Note that collections in the FROM clause are still not supported. Testing: - Added E2E tests in iceberg-metadata-tables.test. Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 401 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/6 -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java@279 PS11, Line 279: params.getSession() > Doesn't it return null anyway if the session is not set? I'm not sure how Thrift works with Java... If it did, would it need an isSet...() function? I didn't want to risk getting some garbage value. -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Mar 2024 15:45:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. IMPALA-12611: Add support to MAP type Iceberg Metadata table columns This change adds support for querying MAP types from Iceberg Metadata tables. The 'IcebergMetadataScanner.ArrayScanner' java class is renamed to 'CollectionScanner' and extended to be able to handle maps. For arrays the iteration returns the element as before, for maps it returns 'Map.Entry' objects. Note that collections in the FROM clause are still not supported. Testing: - Added E2E tests in iceberg-metadata-tables.test. Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 6 files changed, 395 insertions(+), 123 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21125/5 -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy