[Impala-ASF-CR] IMPALA-13108: Update version to 4.5.0-SNAPSHOT
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21460 )

Change subject: IMPALA-13108: Update version to 4.5.0-SNAPSHOT

IMPALA-13108: Update version to 4.5.0-SNAPSHOT

Updated IMPALA_VERSION in impala-config.sh.

Executed the following for Java:
  cd java
  mvn versions:set -DnewVersion=4.5.0-SNAPSHOT

Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
---
M bin/impala-config.sh
M fe/pom.xml
M java/TableFlattener/pom.xml
M java/calcite-planner/pom.xml
M java/datagenerator/pom.xml
M java/executor-deps/pom.xml
M java/ext-data-source/api/pom.xml
M java/ext-data-source/jdbc/pom.xml
M java/ext-data-source/pom.xml
M java/ext-data-source/sample/pom.xml
M java/ext-data-source/test/pom.xml
M java/external-frontend/pom.xml
M java/pom.xml
M java/query-event-hook-api/pom.xml
M java/shaded-deps/hive-exec/pom.xml
M java/shaded-deps/s3a-aws-sdk/pom.xml
M java/test-corrupt-hive-udfs/pom.xml
M java/test-hive-udfs/pom.xml
M java/yarn-extras/pom.xml
19 files changed, 22 insertions(+), 22 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/21460/1
--
To view, visit http://gerrit.cloudera.org:8080/21460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
Gerrit-Change-Number: 21460
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21311 ) Change subject: Add documentation, update links for 4.4.0 .. Add documentation, update links for 4.4.0 Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9 Reviewed-on: http://gerrit.cloudera.org:8080/21311 Reviewed-by: Zoltan Borok-Nagy Tested-by: Zoltan Borok-Nagy --- M docs/build/asf-site-html/index.html M docs/build/asf-site-html/shared/ImpalaVariables.html M docs/build/asf-site-html/shared/impala_common.html M docs/build/asf-site-html/topics/impala_abort_on_error.html M docs/build/asf-site-html/topics/impala_adls.html M docs/build/asf-site-html/topics/impala_admin.html M docs/build/asf-site-html/topics/impala_admission.html M docs/build/asf-site-html/topics/impala_admission_config.html M docs/build/asf-site-html/topics/impala_aggregate_functions.html M docs/build/asf-site-html/topics/impala_aliases.html M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html M docs/build/asf-site-html/topics/impala_alter_database.html M docs/build/asf-site-html/topics/impala_alter_table.html M docs/build/asf-site-html/topics/impala_alter_view.html M docs/build/asf-site-html/topics/impala_analytic_functions.html M docs/build/asf-site-html/topics/impala_appx_count_distinct.html M docs/build/asf-site-html/topics/impala_appx_median.html M docs/build/asf-site-html/topics/impala_array.html M docs/build/asf-site-html/topics/impala_auditing.html M docs/build/asf-site-html/topics/impala_authentication.html M docs/build/asf-site-html/topics/impala_authorization.html M docs/build/asf-site-html/topics/impala_avg.html M docs/build/asf-site-html/topics/impala_avro.html M docs/build/asf-site-html/topics/impala_batch_size.html M docs/build/asf-site-html/topics/impala_bigint.html M docs/build/asf-site-html/topics/impala_bit_functions.html M docs/build/asf-site-html/topics/impala_boolean.html M 
docs/build/asf-site-html/topics/impala_breakpad.html M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html M docs/build/asf-site-html/topics/impala_char.html M docs/build/asf-site-html/topics/impala_client.html M docs/build/asf-site-html/topics/impala_comment.html M docs/build/asf-site-html/topics/impala_comments.html M docs/build/asf-site-html/topics/impala_complex_types.html M docs/build/asf-site-html/topics/impala_components.html M docs/build/asf-site-html/topics/impala_compression_codec.html M docs/build/asf-site-html/topics/impala_compute_stats.html M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html M docs/build/asf-site-html/topics/impala_concepts.html M docs/build/asf-site-html/topics/impala_conditional_functions.html M docs/build/asf-site-html/topics/impala_config.html M docs/build/asf-site-html/topics/impala_config_options.html M docs/build/asf-site-html/topics/impala_config_performance.html M docs/build/asf-site-html/topics/impala_connecting.html M docs/build/asf-site-html/topics/impala_conversion_functions.html M docs/build/asf-site-html/topics/impala_count.html M docs/build/asf-site-html/topics/impala_create_database.html M docs/build/asf-site-html/topics/impala_create_function.html M docs/build/asf-site-html/topics/impala_create_role.html M docs/build/asf-site-html/topics/impala_create_table.html M docs/build/asf-site-html/topics/impala_create_view.html M docs/build/asf-site-html/topics/impala_custom_timezones.html M docs/build/asf-site-html/topics/impala_data_cache.html M docs/build/asf-site-html/topics/impala_databases.html M docs/build/asf-site-html/topics/impala_datatypes.html M docs/build/asf-site-html/topics/impala_date.html M docs/build/asf-site-html/topics/impala_datetime_functions.html M docs/build/asf-site-html/topics/impala_ddl.html M docs/build/asf-site-html/topics/impala_debug_action.html M 
docs/build/asf-site-html/topics/impala_decimal.html M docs/build/asf-site-html/topics/impala_decimal_v2.html M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html M docs/build/asf-site-html/topics/impala_default_file_format.html M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html M docs/build/asf-site-html/topics/impala_default_transactional_type.html M docs/build/asf-site-html/topics/impala_delegation.html M docs/build/asf-site-html/topics/impala_delete.html M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html M docs/build/asf-site-html/topics/impala_describe.html M docs/build/asf-site-html/topics/impala_development.html M docs/build/asf-site-html/topics/impala_disable_codegen.html M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html M
[Impala-ASF-CR](asf-site) Update download links for release 4.4.0
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21307 )

Change subject: Update download links for release 4.4.0

Update download links for release 4.4.0

Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Reviewed-on: http://gerrit.cloudera.org:8080/21307
Reviewed-by: Laszlo Gaal
Tested-by: Zoltan Borok-Nagy
---
M downloads.html
1 file changed, 13 insertions(+), 4 deletions(-)

Approvals:
  Laszlo Gaal: Looks good to me, approved
  Zoltan Borok-Nagy: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21307

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Gerrit-Change-Number: 21307
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal
Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21311 )

Change subject: Add documentation, update links for 4.4.0

Patch Set 4: Verified+1 Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/21311

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
Gerrit-Change-Number: 21311
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Sat, 25 May 2024 08:03:01 +
Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Update download links for release 4.4.0
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21307 )

Change subject: Update download links for release 4.4.0

Patch Set 1: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/21307

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Gerrit-Change-Number: 21307
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Sat, 25 May 2024 08:02:53 +
Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: Add documentation, update links for 4.4.0

Removed Verified-1 by Impala Public Jenkins

--
To view, visit http://gerrit.cloudera.org:8080/21311

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
Gerrit-Change-Number: 21311
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21311 to look at the new patch set (#4). Change subject: Add documentation, update links for 4.4.0 .. Add documentation, update links for 4.4.0 Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9 --- M docs/build/asf-site-html/index.html M docs/build/asf-site-html/shared/ImpalaVariables.html M docs/build/asf-site-html/shared/impala_common.html M docs/build/asf-site-html/topics/impala_abort_on_error.html M docs/build/asf-site-html/topics/impala_adls.html M docs/build/asf-site-html/topics/impala_admin.html M docs/build/asf-site-html/topics/impala_admission.html M docs/build/asf-site-html/topics/impala_admission_config.html M docs/build/asf-site-html/topics/impala_aggregate_functions.html M docs/build/asf-site-html/topics/impala_aliases.html M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html M docs/build/asf-site-html/topics/impala_alter_database.html M docs/build/asf-site-html/topics/impala_alter_table.html M docs/build/asf-site-html/topics/impala_alter_view.html M docs/build/asf-site-html/topics/impala_analytic_functions.html M docs/build/asf-site-html/topics/impala_appx_count_distinct.html M docs/build/asf-site-html/topics/impala_appx_median.html M docs/build/asf-site-html/topics/impala_array.html M docs/build/asf-site-html/topics/impala_auditing.html M docs/build/asf-site-html/topics/impala_authentication.html M docs/build/asf-site-html/topics/impala_authorization.html M docs/build/asf-site-html/topics/impala_avg.html M docs/build/asf-site-html/topics/impala_avro.html M docs/build/asf-site-html/topics/impala_batch_size.html M docs/build/asf-site-html/topics/impala_bigint.html M docs/build/asf-site-html/topics/impala_bit_functions.html M docs/build/asf-site-html/topics/impala_boolean.html M docs/build/asf-site-html/topics/impala_breakpad.html M 
docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html M docs/build/asf-site-html/topics/impala_char.html M docs/build/asf-site-html/topics/impala_client.html M docs/build/asf-site-html/topics/impala_comment.html M docs/build/asf-site-html/topics/impala_comments.html M docs/build/asf-site-html/topics/impala_complex_types.html M docs/build/asf-site-html/topics/impala_components.html M docs/build/asf-site-html/topics/impala_compression_codec.html M docs/build/asf-site-html/topics/impala_compute_stats.html M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html M docs/build/asf-site-html/topics/impala_concepts.html M docs/build/asf-site-html/topics/impala_conditional_functions.html M docs/build/asf-site-html/topics/impala_config.html M docs/build/asf-site-html/topics/impala_config_options.html M docs/build/asf-site-html/topics/impala_config_performance.html M docs/build/asf-site-html/topics/impala_connecting.html M docs/build/asf-site-html/topics/impala_conversion_functions.html M docs/build/asf-site-html/topics/impala_count.html M docs/build/asf-site-html/topics/impala_create_database.html M docs/build/asf-site-html/topics/impala_create_function.html M docs/build/asf-site-html/topics/impala_create_role.html M docs/build/asf-site-html/topics/impala_create_table.html M docs/build/asf-site-html/topics/impala_create_view.html M docs/build/asf-site-html/topics/impala_custom_timezones.html M docs/build/asf-site-html/topics/impala_data_cache.html M docs/build/asf-site-html/topics/impala_databases.html M docs/build/asf-site-html/topics/impala_datatypes.html M docs/build/asf-site-html/topics/impala_date.html M docs/build/asf-site-html/topics/impala_datetime_functions.html M docs/build/asf-site-html/topics/impala_ddl.html M docs/build/asf-site-html/topics/impala_debug_action.html M docs/build/asf-site-html/topics/impala_decimal.html M 
docs/build/asf-site-html/topics/impala_decimal_v2.html M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html M docs/build/asf-site-html/topics/impala_default_file_format.html M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html M docs/build/asf-site-html/topics/impala_default_transactional_type.html M docs/build/asf-site-html/topics/impala_delegation.html M docs/build/asf-site-html/topics/impala_delete.html M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html M docs/build/asf-site-html/topics/impala_describe.html M docs/build/asf-site-html/topics/impala_development.html M docs/build/asf-site-html/topics/impala_disable_codegen.html M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html M
[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21452 to look at the new patch set (#2).

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

With this patch, IcebergDeleteBuilder checks how many probe threads are actually blocked on the builder. Let's assume the following plan:

                UNION ALL
               /         \
              /           \
             /             \
     SCAN all            ANTI JOIN
     datafiles             /   \
     without              /     \
     deletes          SCAN       SCAN
                    datafiles   deletes
                  with deletes

In this case, the UNION ALL and the two "SCAN datafiles" operators are in the same fragment, while the builder of the ANTI JOIN is in a different fragment. This means that "SCAN datafiles without deletes" can run in parallel with the builder. But once that SCAN is exhausted, the UNION ALL drains rows from "SCAN datafiles with deletes" via the ANTI JOIN operator, and that operator depends on the output of the join builder.

So in some cases the SCAN fragments are busy, while in other cases they are blocked, depending on how much work they need to do and how much work the build side needs to do. To handle all cases, we dynamically check how many probe fragments are blocked on the builder, then spin up that many threads to parallelize the final sort.

This also works well with the following plan:

        ANTI JOIN
        /       \
       /         \
    SCAN         SCAN
  datafiles     deletes
with deletes

The above plan is created when all data files have corresponding deletes, or when we are running a simple count(*) query. In that case all "SCAN datafiles" fragments are blocked on the builder, so we can use that many threads to sort the build results.

A new counter, "ThreadCountInFinalBuild", was added, so the query profile shows how many threads were used for the final sort in the builders.

Measurements:
In a table with 1 trillion data records and 68.5 billion delete records, it lowered "IcebergDeletePositionSortTimer" from ~1 minute to 8-10 seconds, in an environment with 40 executors and MT_DOP=12.

TODO:
 * e2e tests that check counter "ThreadCountInFinalBuild"

Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
4 files changed, 105 insertions(+), 24 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21452/2
--
To view, visit http://gerrit.cloudera.org:8080/21452

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
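The thread-count logic described above can be sketched in C++ roughly as follows. This is a hypothetical illustration, not the code in iceberg-delete-builder.cc: it assumes the build output is a map from data file path to an unsorted vector of deleted row positions, and the names DeleteMap, ThreadCountForFinalSort, and ParallelFinalSort are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the collected build output:
// data file path -> positions of deleted rows in that file (unsorted).
using DeleteMap = std::unordered_map<std::string, std::vector<int64_t>>;

// Picks the number of sorting threads from the number of probe threads that
// are currently blocked on the builder; always at least one thread.
int ThreadCountForFinalSort(int blocked_probe_threads) {
  return std::max(1, blocked_probe_threads);
}

// Sorts every per-file position vector. Files are distributed round-robin
// over the worker threads; each vector is touched by exactly one thread and
// the map itself is not modified, so no locking is needed.
void ParallelFinalSort(DeleteMap& deletes, int num_threads) {
  std::vector<std::vector<int64_t>*> work;
  work.reserve(deletes.size());
  for (auto& entry : deletes) work.push_back(&entry.second);
  if (work.empty()) return;

  // No point in starting more threads than there are files to sort.
  const size_t n =
      std::min(static_cast<size_t>(std::max(1, num_threads)), work.size());
  std::vector<std::thread> threads;
  threads.reserve(n);
  for (size_t t = 0; t < n; ++t) {
    threads.emplace_back([&work, t, n]() {
      for (size_t i = t; i < work.size(); i += n) {
        std::sort(work[i]->begin(), work[i]->end());
      }
    });
  }
  for (std::thread& th : threads) th.join();
}
```

Splitting the work per data file keeps the threads independent, but a single very large file still lands on one thread; handling that case would need a finer split (for example, sorting sub-ranges of one file and merging them).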
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#2).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records, the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side, any improvement we make here significantly improves Iceberg V2 table scanning.

Improvements of this patch:
 * Use a vector of vectors to collect the position delete records. This way we avoid large reallocations and copying.
 * Insert large ranges from the build batches into the collected delete records instead of inserting them one by one.

Measurements:
Local measurement with 824 million position delete records:
  JOIN BUILD: ~32s -> ~14s (of which 6s is the final sort)
40-node cluster with 68.5 billion position delete records:
  JOIN BUILD: 4m15s -> 1m45s (of which 1m7s is the final sort)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
(cherry picked from commit d08315fe5c57ccb5b197cd196b62eeedf7d90ec3)
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/2
--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
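A minimal C++ sketch of the two improvements listed above, assuming fixed-capacity chunks of int64_t row positions: the collection grows by appending chunks, so positions collected earlier are never copied again, and each build batch is appended as whole ranges rather than value by value. ChunkedPositions and its methods are hypothetical names, not the data structures actually used in IcebergDeleteBuilder.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: stores position-delete row positions in a list of
// fixed-capacity chunks (a vector of vectors). Growing the collection only
// adds new chunks; positions already collected are never copied again
// (when the outer vector grows it only moves the chunk handles).
class ChunkedPositions {
 public:
  explicit ChunkedPositions(size_t chunk_capacity)
      : chunk_capacity_(chunk_capacity == 0 ? 1 : chunk_capacity) {}

  // Appends the whole range [first, last) with bulk inserts that fill the
  // current chunk and open new ones as needed, instead of one push per value.
  void AppendRange(const int64_t* first, const int64_t* last) {
    while (first != last) {
      if (chunks_.empty() || chunks_.back().size() == chunk_capacity_) {
        chunks_.emplace_back();
        chunks_.back().reserve(chunk_capacity_);
      }
      std::vector<int64_t>& chunk = chunks_.back();
      const size_t room = chunk_capacity_ - chunk.size();
      const size_t n = std::min(room, static_cast<size_t>(last - first));
      chunk.insert(chunk.end(), first, first + n);
      first += n;
    }
  }

  size_t size() const {
    size_t total = 0;
    for (const auto& chunk : chunks_) total += chunk.size();
    return total;
  }

  size_t num_chunks() const { return chunks_.size(); }

  // Copies all positions into one vector, e.g. as input for a final sort.
  std::vector<int64_t> Flatten() const {
    std::vector<int64_t> out;
    out.reserve(size());
    for (const auto& chunk : chunks_) {
      out.insert(out.end(), chunk.begin(), chunk.end());
    }
    return out;
  }

 private:
  size_t chunk_capacity_;
  std::vector<std::vector<int64_t>> chunks_;
};
```

With a chunk capacity of 4, appending a batch of 10 positions yields three chunks (4, 4, 2); a following batch of 3 first fills the partial chunk and then opens a fourth.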
[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21452 )

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

With this patch, IcebergDeleteBuilder checks how many probe threads are actually blocked on the builder. Let's assume the following plan:

                UNION ALL
               /         \
              /           \
             /             \
     SCAN all            ANTI JOIN
     datafiles             /   \
     without              /     \
     deletes          SCAN       SCAN
                    datafiles   deletes
                  with deletes

In this case, the UNION ALL and the two "SCAN datafiles" operators are in the same fragment, while the builder of the ANTI JOIN is in a different fragment. This means that "SCAN datafiles without deletes" can run in parallel with the builder. But once that SCAN is exhausted, the UNION ALL drains rows from "SCAN datafiles with deletes" via the ANTI JOIN operator, and that operator depends on the output of the join builder.

So in some cases the SCAN fragments are busy, while in other cases they are blocked, depending on how much work they need to do and how much work the build side needs to do. To handle all cases, we dynamically check how many probe fragments are blocked on the builder, then spin up that many threads to parallelize the final sort.

This also works well with the following plan:

        ANTI JOIN
        /       \
       /         \
    SCAN         SCAN
  datafiles     deletes
with deletes

The above plan is created when all data files have corresponding deletes, or when we are running a simple count(*) query. In that case all "SCAN datafiles" fragments are blocked on the builder, so we can use that many threads to sort the build results.

A new counter, "ThreadCountInFinalBuild", was added, so the query profile shows how many threads were used for the final sort in the builders.

Measurements:
In a table with 1 trillion data records and 68.5 billion delete records, it lowered "IcebergDeletePositionSortTimer" from ~1 minute to 8-10 seconds, in an environment with 40 executors and MT_DOP=12.

TODO:
 * e2e tests that check counter "ThreadCountInFinalBuild"

Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
4 files changed, 102 insertions(+), 24 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21452/1
--
To view, visit http://gerrit.cloudera.org:8080/21452

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy
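As a hedged illustration of what the ANTI JOIN does with the builder's output, here is a hypothetical C++ sketch (not taken from this change) that filters the row positions of one data file against that file's sorted position deletes. Because both sequences are ascending, a single forward pass over each suffices; that this is why the build results are sorted is an assumption made for the example, and SurvivingPositions is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: returns the row positions of a data file with
// 'num_rows' rows that survive the position deletes in 'sorted_deleted'
// (ascending, duplicates allowed). Both sequences are walked forward once,
// like a merge, so the cost is O(num_rows + sorted_deleted.size()).
std::vector<int64_t> SurvivingPositions(
    int64_t num_rows, const std::vector<int64_t>& sorted_deleted) {
  std::vector<int64_t> result;
  size_t d = 0;
  for (int64_t pos = 0; pos < num_rows; ++pos) {
    // Skip delete entries that refer to rows before the current one.
    while (d < sorted_deleted.size() && sorted_deleted[d] < pos) ++d;
    // The row is deleted if the next delete entry points exactly at it.
    if (d < sorted_deleted.size() && sorted_deleted[d] == pos) continue;
    result.push_back(pos);
  }
  return result;
}
```

For example, a file with 6 rows and sorted deletes {1, 3, 3, 5} keeps positions {0, 2, 4}; duplicate delete entries are harmless.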
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

Removed Verified-1 by Impala Public Jenkins

--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records, the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side, any improvement we make here significantly improves Iceberg V2 table scanning.

Improvements of this patch:
 * Use a vector of vectors to collect the position delete records. This way we avoid large reallocations and copying.
 * Insert large ranges from the build batches into the collected delete records instead of inserting them one by one.

Measurements:
Local measurement with 824 million position delete records:
  JOIN BUILD: ~32s -> ~14s (of which 6s is the final sort)
40-node cluster with 68.5 billion position delete records:
  JOIN BUILD: 4m15s -> 1m45s (of which 1m7s is the final sort)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/1
--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes

Patch Set 4: Code-Review+2

Thanks for modifying the tests, LGTM!

--
To view, visit http://gerrit.cloudera.org:8080/21348

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 09 May 2024 12:40:54 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21348/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21348/2/testdata/data/README@1212
PS2, Line 1212: 5) Manually change identifier-field-ids from [1] to [1,2]
              : 6) Delete rows with Nifi (i=1,j=11), (i=4,j=44)
This means that even if the original table's Avro schema is used, the eq-delete files are still processed correctly, as the eq-delete file schema is a subset of the original schema: same columns, same positions. Would it be possible to have only [2] in the identifier list? And maybe make 'j' a STRING column?

--
To view, visit http://gerrit.cloudera.org:8080/21348

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 25 Apr 2024 13:04:36 +
Gerrit-HasComments: Yes
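The point about identifier columns can be made concrete with a hypothetical C++ sketch of equality-delete semantics. The Row type, string-valued cells, and ApplyEqualityDeletes are illustrative assumptions, not Impala code: a row is dropped when its values in the identifier columns match a delete record, and the delete file's columns are resolved to table columns by field id rather than by position. With identifier-field-ids [2] only, the delete file's first column is 'j', which sits at table position 1, so reading the delete file by position would compare against 'i' instead.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical row model: values as strings, indexed by table column position.
using Row = std::vector<std::string>;

// Applies equality deletes. 'delete_field_ids' lists the field ids of the
// columns stored in the equality-delete file, in that file's column order;
// 'table_pos_by_field_id' maps a field id to its column position in the
// table (map::at throws if an id is unknown). A row is kept only if its
// identifier-column values do not match any key in 'delete_keys'.
std::vector<Row> ApplyEqualityDeletes(
    const std::vector<Row>& rows, const std::vector<int>& delete_field_ids,
    const std::map<int, size_t>& table_pos_by_field_id,
    const std::set<std::vector<std::string>>& delete_keys) {
  // Resolve delete-file columns to table positions by field id.
  std::vector<size_t> positions;
  for (int id : delete_field_ids) {
    positions.push_back(table_pos_by_field_id.at(id));
  }
  std::vector<Row> kept;
  for (const Row& row : rows) {
    std::vector<std::string> key;
    for (size_t p : positions) key.push_back(row[p]);
    if (delete_keys.count(key) == 0) kept.push_back(row);
  }
  return kept;
}
```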
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21311 to look at the new patch set (#3). Change subject: Add documentation, update links for 4.4.0 .. Add documentation, update links for 4.4.0 Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9 --- M docs/build/asf-site-html/index.html M docs/build/asf-site-html/shared/ImpalaVariables.html M docs/build/asf-site-html/shared/impala_common.html M docs/build/asf-site-html/topics/impala_abort_on_error.html M docs/build/asf-site-html/topics/impala_adls.html M docs/build/asf-site-html/topics/impala_admin.html M docs/build/asf-site-html/topics/impala_admission.html M docs/build/asf-site-html/topics/impala_admission_config.html M docs/build/asf-site-html/topics/impala_aggregate_functions.html M docs/build/asf-site-html/topics/impala_aliases.html M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html M docs/build/asf-site-html/topics/impala_alter_database.html M docs/build/asf-site-html/topics/impala_alter_table.html M docs/build/asf-site-html/topics/impala_alter_view.html M docs/build/asf-site-html/topics/impala_analytic_functions.html M docs/build/asf-site-html/topics/impala_appx_count_distinct.html M docs/build/asf-site-html/topics/impala_appx_median.html M docs/build/asf-site-html/topics/impala_array.html M docs/build/asf-site-html/topics/impala_auditing.html M docs/build/asf-site-html/topics/impala_authentication.html M docs/build/asf-site-html/topics/impala_authorization.html M docs/build/asf-site-html/topics/impala_avg.html M docs/build/asf-site-html/topics/impala_avro.html M docs/build/asf-site-html/topics/impala_batch_size.html M docs/build/asf-site-html/topics/impala_bigint.html M docs/build/asf-site-html/topics/impala_bit_functions.html M docs/build/asf-site-html/topics/impala_boolean.html M docs/build/asf-site-html/topics/impala_breakpad.html M 
docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html M docs/build/asf-site-html/topics/impala_char.html M docs/build/asf-site-html/topics/impala_client.html M docs/build/asf-site-html/topics/impala_comment.html M docs/build/asf-site-html/topics/impala_comments.html M docs/build/asf-site-html/topics/impala_complex_types.html M docs/build/asf-site-html/topics/impala_components.html M docs/build/asf-site-html/topics/impala_compression_codec.html M docs/build/asf-site-html/topics/impala_compute_stats.html M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html M docs/build/asf-site-html/topics/impala_concepts.html M docs/build/asf-site-html/topics/impala_conditional_functions.html M docs/build/asf-site-html/topics/impala_config.html M docs/build/asf-site-html/topics/impala_config_options.html M docs/build/asf-site-html/topics/impala_config_performance.html M docs/build/asf-site-html/topics/impala_connecting.html M docs/build/asf-site-html/topics/impala_conversion_functions.html M docs/build/asf-site-html/topics/impala_count.html M docs/build/asf-site-html/topics/impala_create_database.html M docs/build/asf-site-html/topics/impala_create_function.html M docs/build/asf-site-html/topics/impala_create_role.html M docs/build/asf-site-html/topics/impala_create_table.html M docs/build/asf-site-html/topics/impala_create_view.html M docs/build/asf-site-html/topics/impala_custom_timezones.html M docs/build/asf-site-html/topics/impala_data_cache.html M docs/build/asf-site-html/topics/impala_databases.html M docs/build/asf-site-html/topics/impala_datatypes.html M docs/build/asf-site-html/topics/impala_date.html M docs/build/asf-site-html/topics/impala_datetime_functions.html M docs/build/asf-site-html/topics/impala_ddl.html M docs/build/asf-site-html/topics/impala_debug_action.html M docs/build/asf-site-html/topics/impala_decimal.html M 
docs/build/asf-site-html/topics/impala_decimal_v2.html M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html M docs/build/asf-site-html/topics/impala_default_file_format.html M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html M docs/build/asf-site-html/topics/impala_default_transactional_type.html M docs/build/asf-site-html/topics/impala_delegation.html M docs/build/asf-site-html/topics/impala_delete.html M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html M docs/build/asf-site-html/topics/impala_describe.html M docs/build/asf-site-html/topics/impala_development.html M docs/build/asf-site-html/topics/impala_disable_codegen.html M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html M
[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes

Patch Set 1:

(1 comment)

Thanks for adding more tests!

http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README@1193
PS1, Line 1193: set tblproperties ('write.format.default'='avro');
Would it be possible to do schema evolution + Avro delete files? I.e. use different delete columns in the Avro eq-delete files, to make sure we use the correct Avro schema in the delete scans.

--
To view, visit http://gerrit.cloudera.org:8080/21348

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 23 Apr 2024 15:37:51 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly If the Iceberg table has Avro delete files (e.g. by setting 'write.delete.format.default'='avro') then Impala won't be able to read the contents of the delete files properly. This is because the Avro schema is not set properly for the virtual delete table. Testing: * added e2e tests with position delete files of all kinds Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Reviewed-on: http://gerrit.cloudera.org:8080/21301 Tested-by: Impala Public Jenkins Reviewed-by: Daniel Becker Reviewed-by: Gabor Kaszab --- M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test M tests/query_test/test_iceberg.py 3 files changed, 143 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Daniel Becker: Looks good to me, but someone else must approve Gabor Kaszab: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java: http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87 PS1, Line 87: if (desc.hdfsTable.isSetAvroSchema()) { > I guess the issue is also true for AVRO equality delete files. Should we al Yes, it would definitely be useful to have such tests. Probably in a separate CR, as adding such tables is cumbersome. http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test: http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92 PS1, Line 92: row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*' > there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and With VERIFY_IS_SUBSET we only check that each line is present in the result set. I.e. 
adding more lines with the same content wouldn't have an effect: https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/tests/common/test_result_verifier.py#L258-L259 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 22 Apr 2024 15:30:48 + Gerrit-HasComments: Yes
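The subset semantics described in this reply can be modeled in a few lines of Python. This is a simplified sketch, not the code in `tests/common/test_result_verifier.py`: it assumes each expected line is either a literal row or a `row_regex:` pattern that must match the whole row, and it only checks that every expected line matches at least one actual row. Because matched rows are not consumed, a duplicated expected line adds nothing.

```python
import re
from typing import List

ROW_REGEX_PREFIX = "row_regex:"


def line_matches(expected: str, actual: str) -> bool:
    """Compare one expected line with one actual result row."""
    if expected.startswith(ROW_REGEX_PREFIX):
        # Assumption of this sketch: the whole row must match the pattern.
        pattern = expected[len(ROW_REGEX_PREFIX):]
        return re.fullmatch(pattern, actual) is not None
    return expected == actual


def verify_is_subset(expected: List[str], actual: List[str]) -> bool:
    """True if every expected line matches at least one actual row.

    Matched rows are not consumed, so repeating an expected line does not
    require a second matching row in the result set.
    """
    return all(any(line_matches(e, a) for a in actual) for e in expected)


orc_row = r"row_regex:'.*\.orc','B'"
actual = ["'f1.orc','B'", "'f2.parq','C'"]
print(verify_is_subset([orc_row], actual))           # True
print(verify_is_subset([orc_row, orc_row], actual))  # True: duplicates add nothing
```

Checking that a directory holds exactly two matching files would therefore need a different verification mode or an explicit count in the query.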
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. IMPALA-13000: Document OPTIMIZE TABLE Document OPTIMIZE TABLE syntax and behaviour. Testing: - built docs locally Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Reviewed-on: http://gerrit.cloudera.org:8080/21320 Tested-by: Impala Public Jenkins Reviewed-by: Zoltan Borok-Nagy Reviewed-by: Daniel Becker --- M docs/topics/impala_iceberg.xml 1 file changed, 47 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Zoltan Borok-Nagy: Looks good to me, but someone else must approve Daniel Becker: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 4 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 3 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 22 Apr 2024 10:39:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12938: add-opens for platform.cgroupv1
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21334 ) Change subject: IMPALA-12938: add-opens for platform.cgroupv1 .. Patch Set 1: Code-Review+2 Thanks for fixing this -- To view, visit http://gerrit.cloudera.org:8080/21334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82 Gerrit-Change-Number: 21334 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Apr 2024 10:23:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row_regex that check for no-existence
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21333 ) Change subject: IMPALA-13016: Fix ambiguous row_regex that check for no-existence .. Patch Set 2: Code-Review+2 Thanks for fixing these tests! LGTM! -- To view, visit http://gerrit.cloudera.org:8080/21333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da Gerrit-Change-Number: 21333 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Apr 2024 10:16:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556 PS1, Line 556: able_na > No need to use fully qualified table names. I only included the database in [] is quite standard notation, and we are using it extensively in the Impala docs, e.g.: https://impala.apache.org/docs/build/html/topics/impala_create_table.html So users shouldn't be confused by it. This file mostly contains simple examples because the other statements have their own detailed doc page. But we don't have that for OPTIMIZE, so having a proper syntax definition here makes sense to me. Alternatively, we could create a separate top-level page for OPTIMIZE, and here only add a few examples. http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@561 PS2, Line 561: rewrites the entire table I think we should mention that this only applies to the current implementation, so users won't assume the same behavior in future releases. http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@587 PS2, Line 587: rewrites the entire table Maybe also mention here that this behavior is temporary.
-- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 12:58:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21317 ) Change subject: IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build .. Patch Set 1: Code-Review+2 Thanks for fixing this! I verified the patch on a RELEASE version. -- To view, visit http://gerrit.cloudera.org:8080/21317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f Gerrit-Change-Number: 21317 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Apr 2024 15:05:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@553 PS1, Line 553: in nit: maybe 'on'? http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@554 PS1, Line 554: update files There are no such files as 'update files', so I'd just use 'data files' http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@565 PS1, Line 565: rewrite files using the latest table schema : rewrite partitions according to the latest partition spec I would phrase it slightly differently, as the current phrasing might suggest that data files with an old schema/partition spec are the ones selected for rewrite. So maybe: "the newly written files will have the latest schema and will be partitioned based on the latest partition spec" http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@577 PS1, Line 577: Views cannot be optimized. Not sure we need this, as it should be clear. http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@583 PS1, Line 583: . "..., because the rewritten data and delete files are not removed physically." -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 1 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Apr 2024 13:33:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13003: Handle Iceberg AlreadyExistsException
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21312 ) Change subject: IMPALA-13003: Handle Iceberg AlreadyExistsException .. Patch Set 1: Code-Review+2 Thanks for fixing this, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/21312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I847eea9297c9ee0d8e821fe1c87ea03d22f1d96e Gerrit-Change-Number: 21312 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Apr 2024 08:59:29 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21311 to look at the new patch set (#2). Change subject: Add documentation, update links for 4.4.0 .. Add documentation, update links for 4.4.0 Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9 --- M docs/build/asf-site-html/index.html M docs/build/asf-site-html/shared/ImpalaVariables.html M docs/build/asf-site-html/shared/impala_common.html M docs/build/asf-site-html/topics/impala_abort_on_error.html M docs/build/asf-site-html/topics/impala_adls.html M docs/build/asf-site-html/topics/impala_admin.html M docs/build/asf-site-html/topics/impala_admission.html M docs/build/asf-site-html/topics/impala_admission_config.html M docs/build/asf-site-html/topics/impala_aggregate_functions.html M docs/build/asf-site-html/topics/impala_aliases.html M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html M docs/build/asf-site-html/topics/impala_alter_database.html M docs/build/asf-site-html/topics/impala_alter_table.html M docs/build/asf-site-html/topics/impala_alter_view.html M docs/build/asf-site-html/topics/impala_analytic_functions.html M docs/build/asf-site-html/topics/impala_appx_count_distinct.html M docs/build/asf-site-html/topics/impala_appx_median.html M docs/build/asf-site-html/topics/impala_array.html M docs/build/asf-site-html/topics/impala_auditing.html M docs/build/asf-site-html/topics/impala_authentication.html M docs/build/asf-site-html/topics/impala_authorization.html M docs/build/asf-site-html/topics/impala_avg.html M docs/build/asf-site-html/topics/impala_avro.html M docs/build/asf-site-html/topics/impala_batch_size.html M docs/build/asf-site-html/topics/impala_bigint.html M docs/build/asf-site-html/topics/impala_bit_functions.html M docs/build/asf-site-html/topics/impala_boolean.html M docs/build/asf-site-html/topics/impala_breakpad.html M 
docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html M docs/build/asf-site-html/topics/impala_char.html M docs/build/asf-site-html/topics/impala_client.html M docs/build/asf-site-html/topics/impala_comment.html M docs/build/asf-site-html/topics/impala_comments.html M docs/build/asf-site-html/topics/impala_complex_types.html M docs/build/asf-site-html/topics/impala_components.html M docs/build/asf-site-html/topics/impala_compression_codec.html M docs/build/asf-site-html/topics/impala_compute_stats.html M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html M docs/build/asf-site-html/topics/impala_concepts.html M docs/build/asf-site-html/topics/impala_conditional_functions.html M docs/build/asf-site-html/topics/impala_config.html M docs/build/asf-site-html/topics/impala_config_options.html M docs/build/asf-site-html/topics/impala_config_performance.html M docs/build/asf-site-html/topics/impala_connecting.html M docs/build/asf-site-html/topics/impala_conversion_functions.html M docs/build/asf-site-html/topics/impala_count.html M docs/build/asf-site-html/topics/impala_create_database.html M docs/build/asf-site-html/topics/impala_create_function.html M docs/build/asf-site-html/topics/impala_create_role.html M docs/build/asf-site-html/topics/impala_create_table.html M docs/build/asf-site-html/topics/impala_create_view.html M docs/build/asf-site-html/topics/impala_custom_timezones.html M docs/build/asf-site-html/topics/impala_data_cache.html M docs/build/asf-site-html/topics/impala_databases.html M docs/build/asf-site-html/topics/impala_datatypes.html M docs/build/asf-site-html/topics/impala_date.html M docs/build/asf-site-html/topics/impala_datetime_functions.html M docs/build/asf-site-html/topics/impala_ddl.html M docs/build/asf-site-html/topics/impala_debug_action.html M docs/build/asf-site-html/topics/impala_decimal.html M 
docs/build/asf-site-html/topics/impala_decimal_v2.html M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html M docs/build/asf-site-html/topics/impala_default_file_format.html M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html M docs/build/asf-site-html/topics/impala_default_transactional_type.html M docs/build/asf-site-html/topics/impala_delegation.html M docs/build/asf-site-html/topics/impala_delete.html M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html M docs/build/asf-site-html/topics/impala_describe.html M docs/build/asf-site-html/topics/impala_development.html M docs/build/asf-site-html/topics/impala_disable_codegen.html M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html M
[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21311 Change subject: Add documentation, update links for 4.4.0 .. Add documentation, update links for 4.4.0 Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9 --- M docs/build/asf-site-html/index.html M docs/build/asf-site-html/shared/ImpalaVariables.html M docs/build/asf-site-html/shared/impala_common.html M docs/build/asf-site-html/topics/impala_abort_on_error.html M docs/build/asf-site-html/topics/impala_adls.html M docs/build/asf-site-html/topics/impala_admin.html M docs/build/asf-site-html/topics/impala_admission.html M docs/build/asf-site-html/topics/impala_admission_config.html M docs/build/asf-site-html/topics/impala_aggregate_functions.html M docs/build/asf-site-html/topics/impala_aliases.html M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html M docs/build/asf-site-html/topics/impala_alter_database.html M docs/build/asf-site-html/topics/impala_alter_table.html M docs/build/asf-site-html/topics/impala_alter_view.html M docs/build/asf-site-html/topics/impala_analytic_functions.html M docs/build/asf-site-html/topics/impala_appx_count_distinct.html M docs/build/asf-site-html/topics/impala_appx_median.html M docs/build/asf-site-html/topics/impala_array.html M docs/build/asf-site-html/topics/impala_auditing.html M docs/build/asf-site-html/topics/impala_authentication.html M docs/build/asf-site-html/topics/impala_authorization.html M docs/build/asf-site-html/topics/impala_avg.html M docs/build/asf-site-html/topics/impala_avro.html M docs/build/asf-site-html/topics/impala_batch_size.html M docs/build/asf-site-html/topics/impala_bigint.html M docs/build/asf-site-html/topics/impala_bit_functions.html M docs/build/asf-site-html/topics/impala_boolean.html M docs/build/asf-site-html/topics/impala_breakpad.html M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html M 
docs/build/asf-site-html/topics/impala_buffer_pool_limit.html M docs/build/asf-site-html/topics/impala_char.html M docs/build/asf-site-html/topics/impala_client.html M docs/build/asf-site-html/topics/impala_comment.html M docs/build/asf-site-html/topics/impala_comments.html M docs/build/asf-site-html/topics/impala_complex_types.html M docs/build/asf-site-html/topics/impala_components.html M docs/build/asf-site-html/topics/impala_compression_codec.html M docs/build/asf-site-html/topics/impala_compute_stats.html M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html M docs/build/asf-site-html/topics/impala_concepts.html M docs/build/asf-site-html/topics/impala_conditional_functions.html M docs/build/asf-site-html/topics/impala_config.html M docs/build/asf-site-html/topics/impala_config_options.html M docs/build/asf-site-html/topics/impala_config_performance.html M docs/build/asf-site-html/topics/impala_connecting.html M docs/build/asf-site-html/topics/impala_conversion_functions.html M docs/build/asf-site-html/topics/impala_count.html M docs/build/asf-site-html/topics/impala_create_database.html M docs/build/asf-site-html/topics/impala_create_function.html M docs/build/asf-site-html/topics/impala_create_role.html M docs/build/asf-site-html/topics/impala_create_table.html M docs/build/asf-site-html/topics/impala_create_view.html M docs/build/asf-site-html/topics/impala_custom_timezones.html M docs/build/asf-site-html/topics/impala_data_cache.html M docs/build/asf-site-html/topics/impala_databases.html M docs/build/asf-site-html/topics/impala_datatypes.html M docs/build/asf-site-html/topics/impala_date.html M docs/build/asf-site-html/topics/impala_datetime_functions.html M docs/build/asf-site-html/topics/impala_ddl.html M docs/build/asf-site-html/topics/impala_debug_action.html M docs/build/asf-site-html/topics/impala_decimal.html M docs/build/asf-site-html/topics/impala_decimal_v2.html M 
docs/build/asf-site-html/topics/impala_dedicated_coordinator.html M docs/build/asf-site-html/topics/impala_default_file_format.html M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html M docs/build/asf-site-html/topics/impala_default_transactional_type.html M docs/build/asf-site-html/topics/impala_delegation.html M docs/build/asf-site-html/topics/impala_delete.html M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html M docs/build/asf-site-html/topics/impala_describe.html M docs/build/asf-site-html/topics/impala_development.html M docs/build/asf-site-html/topics/impala_disable_codegen.html M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html M docs/build/asf-site-html/topics/impala_disable_hbase_num_rows_estimate.html M
[Impala-ASF-CR](asf-site) Update download links for release 4.4.0
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21307 Change subject: Update download links for release 4.4.0 .. Update download links for release 4.4.0 Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d --- M downloads.html 1 file changed, 13 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/21307/1 -- To view, visit http://gerrit.cloudera.org:8080/21307 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newchange Gerrit-Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d Gerrit-Change-Number: 21307 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21301 Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly If the Iceberg table has Avro delete files (e.g. by setting 'write.delete.format.default'='avro') then Impala won't be able to read the contents of the delete files properly. This is because the Avro schema is not set properly for the virtual delete table. Testing: * added e2e tests with position delete files of all kinds Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 --- M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test M tests/query_test/test_iceberg.py 3 files changed, 143 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/21301/1 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 7: (1 comment) Thanks for the comment! http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455 PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0: > Here the code says that "if we don't disable the V2 read optimisations then Ah, those negations... you're right. Anyway, it revealed a bug in the test code that I fixed in the new PS. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 15:28:13 + Gerrit-HasComments: Yes
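The condition quoted in this comment is a double negation: `disable_optimized_iceberg_v2_read == 0` means the optimized V2 read path is *enabled*. One way to make such checks harder to misread is to name the positive condition. The helper below is hypothetical and not part of the Impala test suite; treating a missing key as 0 is an assumption of this sketch.

```python
from typing import Mapping


def optimized_v2_read_enabled(exec_option: Mapping[str, int]) -> bool:
    """Hypothetical helper: is the optimized Iceberg V2 read path in use?

    'disable_optimized_iceberg_v2_read' is a negative flag, so the optimized
    path is enabled exactly when the flag is 0 (or False, since bool is an
    int subclass). A missing key is treated as 0 here, which is an assumption.
    """
    return exec_option.get("disable_optimized_iceberg_v2_read", 0) == 0


print(optimized_v2_read_enabled({"disable_optimized_iceberg_v2_read": 0}))  # True
print(optimized_v2_read_enabled({"disable_optimized_iceberg_v2_read": 1}))  # False
```

A test could then read `if optimized_v2_read_enabled(vector.get_value('exec_option')):`, which states the intent without requiring the reader to resolve the negation.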
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#10). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths, which are not treated as an error in hash join mode either. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro A
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 21 files changed, 219 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/10 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
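Why a NULL file path can be tolerated rather than rejected can be illustrated with a simplified Python model of applying position deletes. As with a NULL key in a hash join, a delete record whose file path is NULL can never match a data row, so it can simply be skipped. This is an illustrative sketch with hypothetical names (`build_delete_index`, `surviving_positions`), not the C++ logic of `IcebergDeleteBuilder`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

DeleteRecord = Tuple[Optional[str], int]  # (file_path, pos); file_path may be None


def build_delete_index(records: Iterable[DeleteRecord]) -> Dict[str, Set[int]]:
    """Group deleted positions by data file path, skipping NULL file paths.

    A NULL file path cannot match any data file, so the record is ignored
    instead of raising an error.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in records:
        if file_path is None:
            continue
        index[file_path].add(pos)
    return dict(index)


def surviving_positions(file_path: str, num_rows: int,
                        index: Dict[str, Set[int]]) -> List[int]:
    """Return the row positions of a data file that were not deleted."""
    deleted = index.get(file_path, set())
    return [pos for pos in range(num_rows) if pos not in deleted]


records = [(None, 0), ("a.parq", 1), (None, 5), ("a.parq", 3)]
idx = build_delete_index(records)
print(surviving_positions("a.parq", 5, idx))  # [0, 2, 4]
```

The NULL records at the start and in the middle of the input, similar to the `delete_null_first` and `delete_null_single` cases in the test data, leave the result unchanged.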
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21285 ) Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. Patch Set 2: Code-Review+2 (1 comment) Thanks for the comment! Carry +2 http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG@9 PS1, Line 9: using > nit: using Done -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 12:33:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21285 to look at the new patch set (#2). Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs Since we are using IcebergBufferedDeleteSink, which sorts the data before flushing, there is no need to add a SORT node before the sink. Testing: * updated planner tests Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 --- M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test 2 files changed, 7 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/21285/2 -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
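The argument in this commit message is that a sink which buffers its input and sorts it before flushing makes an upstream SORT node redundant. A minimal Python sketch of that idea follows. `BufferedSortingSink` is a hypothetical name and does not mirror the real `IcebergBufferedDeleteSink`; the sketch assumes position-delete records are written in (file_path, pos) order, which is the order the Iceberg spec requires for position delete files.

```python
from typing import List, Tuple

PositionDelete = Tuple[str, int]  # (file_path, pos)


class BufferedSortingSink:
    """Collects position-delete records in any order and writes them sorted."""

    def __init__(self) -> None:
        self._buffer: List[PositionDelete] = []
        # Stands in for the delete files that a real sink would write.
        self.flushed: List[List[PositionDelete]] = []

    def send(self, rows: List[PositionDelete]) -> None:
        # No ordering is assumed for incoming batches.
        self._buffer.extend(rows)

    def flush(self) -> None:
        if not self._buffer:
            return
        # Tuple comparison sorts by file_path first, then by pos.
        self.flushed.append(sorted(self._buffer))
        self._buffer = []


sink = BufferedSortingSink()
sink.send([("b.parq", 2), ("a.parq", 7)])
sink.send([("a.parq", 3), ("b.parq", 0)])
sink.flush()
print(sink.flushed)  # [[('a.parq', 3), ('a.parq', 7), ('b.parq', 0), ('b.parq', 2)]]
```

Since the sink establishes the order itself, sorting the same rows in an earlier plan node would only repeat the work.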
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21267 ) Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107 PS1, Line 107: fileDescs_ = new ArrayList<>(fileDescs_); : Collections.sort(fileDescs_); > I'm experimenting with your suggestion and see that it would bring too much I see, thanks Gabor for investigating this. -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 07:43:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21285 Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs Since we are useing IcebergBufferedDeleteSink, which sorts the data before flushing, there is no need to add a SORT node before the sink. Testing: * updated planner tests Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 --- M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test 2 files changed, 7 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/21285/1 -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125 PS7, Line 1125: A > I think it's a bit unexpected/abrupt after the previous paragraphs which ar Done http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README@1134 PS8, Line 1134: 1) Created the table via Impala and added some records to it. > Could you include the CREATE TABLE and INSERT statements for reproducibilit Done -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 13:38:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#9). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 21 files changed, 217 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/9 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 8: (5 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc@283 PS7, Line 283: ErrorMsg(TErrorCode::GENERAL, "NULL found as file_path in delete file")); > Can't we return or continue here? Sure, we can continue. http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@ PS7, Line : (filename_value_ss.len == 0 && prev_channels.empty())); > This is triggered when there is 2 consecutive rows in the delete file where Yes, we have a delete file with only three NULLs. http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125 PS7, Line 1125: Or > Nit: something like "A third case is..." would be nicer. I can change it if you feel strongly about it, but I think concise and simple phrasing is preferable in comments. http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql@3957 PS7, Line 3957: iceberg_v2_null_delete_record > Could you add some details about this table to the README? Would be nice to Done http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455 PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0: > Would it make sense to test where we have DIRECTED mode to see that the Krp We only test DIRECTED mode + V2 operator (which go hand in hand). When 'disable_optimized_iceberg_v2_read' is true, we fall back to the old anti hash join, which doesn't validate the delete records. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 13:08:22 + Gerrit-HasComments: Yes
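The "we can continue" answer above amounts to skipping NULL file paths instead of raising an error. The Python sketch below shows that behavior under stated assumptions: `build_delete_index` is a hypothetical helper that groups deleted positions by data file path, `None` stands in for SQL NULL, and this is not the actual IcebergDeleteBuilder code. The test inputs loosely mirror the shapes suggested by the new test files (NULL first, NULL last, only NULLs).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_delete_index(
    rows: Iterable[Tuple[Optional[str], int]],
) -> Dict[str, List[int]]:
    """Hypothetical helper: group deleted positions by data file path."""
    index: Dict[str, List[int]] = defaultdict(list)
    for file_path, pos in rows:
        if file_path is None:
            # Tolerate a NULL file_path: skip the record ("continue")
            # instead of failing the whole query with an error.
            continue
        index[file_path].append(pos)
    # Keep positions ordered per file so lookups can binary-search them.
    for positions in index.values():
        positions.sort()
    return dict(index)
```

A delete file that contains only NULL paths simply produces an empty index rather than an error.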
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#8). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 21 files changed, 210 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/8 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 7: (4 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@13 PS2, Line 13: IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate > It is not valid, but we have seen such errors at certain customers. Unfortu Actually, there are still cases when the IcebergDeleteBuilder receives NULL file paths, e.g. with num_nodes=1, or when there's only a single data file. http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG@13 PS4, Line 13: IcebergDeleteBuilder and KrpcDataStreamSender now also > Is it possible to add a test for this? Yeah, I've just added tests. http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h@79 PS4, Line 79: /// Shared Build > Shouldn't we mention DIRECTED mode here? Actually we should only mention the DIRECTED mode, since this is the only supported mode. http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc@272 PS4, Line 272: state > Do we use 'state' anywhere? If not, this parameter could also be removed fr In PS5 we use it again. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Apr 2024 16:32:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#7). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 20 files changed, 172 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/7 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#6). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 8 files changed, 82 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/6 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#5). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate NULL file paths which are also not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/krpc-data-stream-sender.cc M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M tests/query_test/test_iceberg.py 8 files changed, 79 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/5 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@9 PS2, Line 9: DIRECTED > DIRECTED Ouch, fixed in the Jira ticket as well. http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@13 PS2, Line 13: IcebergDeleteBuilder now also tolerates NULL file paths which are > I'm wondering if there is a valid use case when a file path is null. Is it It is not valid, but we have seen such errors at certain customers. Unfortunately we don't know which engine wrote those position delete files :( But now that I think of it, with DIRECTED mode, IcebergDeleteBuilder will never get NULL file paths, as we will never have an entry for these in the routing map (filepath_to_hosts_). So I'm just keeping that logic as is and adding a DCHECK. Now it depends on KrpcDataStreamSender how we handle NULL file paths. I'll try to prepare a test table for this. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Apr 2024 11:22:22 + Gerrit-HasComments: Yes
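The routing argument above can be made concrete with a Python sketch of DIRECTED-style distribution. The dictionary plays the role of the `filepath_to_hosts_` routing map mentioned in the comment. The function name, the host names, and the choice to drop records whose path has no map entry are assumptions for illustration, not KrpcDataStreamSender's actual logic. Because a NULL path can never be a key in the map, such records never reach the builder; whatever happens to them is decided on the sender side, and this sketch simply drops them.

```python
from typing import Dict, List, Optional, Tuple

DeleteRow = Tuple[Optional[str], int]  # (file_path, or None for NULL; pos)


def route_directed(
    rows: List[DeleteRow],
    filepath_to_hosts: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Hypothetical model of DIRECTED distribution: each delete record is
    sent only to the hosts that scan the data file it refers to."""
    per_host: Dict[str, List[Tuple[str, int]]] = {}
    for file_path, pos in rows:
        if file_path is None:
            # NULL can never be a key of the routing map; this sketch drops it.
            continue
        # Assumption: records for paths missing from the map are dropped too.
        for host in filepath_to_hosts.get(file_path, []):
            per_host.setdefault(host, []).append((file_path, pos))
    return per_host


# Usage: a delete file whose rows are all NULL routes nothing anywhere.
routing = {"f1.parq": ["host-a"], "f2.parq": ["host-b", "host-c"]}
routed = route_directed([(None, 0), ("f1.parq", 5), ("f2.parq", 7)], routing)
```

Since every record that reaches a builder in this model has a non-NULL path, a DCHECK on the builder side is consistent with the design described in the comment.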
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#4). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder now also tolerates NULL file paths which are not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h 4 files changed, 30 insertions(+), 92 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/4 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#3). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the DIRECTED distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder now also tolerates NULL file paths which are not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h 4 files changed, 30 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/3 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21267 ) Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. Patch Set 2: Code-Review+2 (1 comment) Proposed an alternative approach. But I'm also OK to quickly push this fix and improve it later. http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107 PS1, Line 107: fileDescs_ = new ArrayList<>(fileDescs_); : Collections.sort(fileDescs_); Alternatively we could do the sorting during file metadata loading, so we wouldn't need to copy and sort fds for every Iceberg scan node. -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Apr 2024 08:45:55 + Gerrit-HasComments: Yes
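The trade-off discussed here can be sketched in Python, where `sorted()` returns a new list much like copying into a `new ArrayList<>` before calling `Collections.sort`. Presumably the bug being fixed is that sorting a shared, cached list in place while another thread iterates over it can trigger a ConcurrentModificationException in Java. Copying first avoids that, and sorting once during metadata loading avoids the per-scan-node copy altogether. `CachedTableMetadata` and both helper functions are hypothetical names used only for illustration.

```python
from typing import List


class CachedTableMetadata:
    """Hypothetical stand-in for table metadata shared by concurrent queries."""

    def __init__(self, file_descs: List[str], sort_at_load: bool = False) -> None:
        # With sort_at_load=True the list is ordered once, while loading metadata.
        self.file_descs: List[str] = (
            sorted(file_descs) if sort_at_load else list(file_descs)
        )


def files_copy_and_sort(meta: CachedTableMetadata) -> List[str]:
    # Per-scan-node approach (as in PS1): sort a private copy and leave the
    # shared list untouched, at the cost of one copy + sort per scan node.
    return sorted(meta.file_descs)


def files_presorted(meta: CachedTableMetadata) -> List[str]:
    # Alternative: the shared list is already sorted, so no copy is needed.
    # Callers must treat the returned list as read-only.
    return meta.file_descs
```

The second approach moves the sorting cost to metadata loading, which happens far less often than planning an Iceberg scan.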
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21258 to look at the new patch set (#2). Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the BROADCAST distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder now also tolerates NULL file paths which are not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h 4 files changed, 30 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/2 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21258 Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder Now that we have the BROADCAST distribution mode, some parts of IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is time to simplify the above classes. IcebergDeleteBuilder now also tolerates NULL file paths which are not an error in the hash join mode. Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/iceberg-delete-node.cc M be/src/exec/iceberg-delete-node.h M be/src/runtime/coordinator.cc M be/src/scheduling/scheduler.cc 6 files changed, 46 insertions(+), 102 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/1 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21249 ) Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization test_plain_count_star_optimization was disabled by IMPALA-12894 part 1, and part 2 didn't re-enable it. This patch re-enables it. Change-Id: I30629632742c0d402a6bb852a169359edac59eba Reviewed-on: http://gerrit.cloudera.org:8080/21249 Tested-by: Impala Public Jenkins Reviewed-by: Gabor Kaszab --- M tests/query_test/test_iceberg.py 1 file changed, 0 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Gabor Kaszab: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21249 Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization test_plain_count_star_optimization was disabled by IMPALA-12894 part 1, and part 2 didn't re-enable it. This patch re-enables it. Change-Id: I30629632742c0d402a6bb852a169359edac59eba --- M tests/query_test/test_iceberg.py 1 file changed, 0 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21249/1 -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 15: Code-Review+2 (1 comment) LGTM http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1258 PS15, Line 1258: functional_parquet > Removed the table name because "functional_parquet.iceberg_query_metadata" Do we know why? Is it related to local / legacy catalog modes? functional_parquet.*.* can be a bit misleading. But I'm OK with fixing it in a follow-up Jira. -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 02 Apr 2024 09:38:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java@279 PS11, Line 279: params.getSession() Doesn't it return null anyway if the session is not set? -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Mar 2024 15:23:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files .. Patch Set 7: Code-Review+2 (1 comment) Carry +2 http://gerrit.cloudera.org:8080/#/c/21190/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21190/6//COMMIT_MSG@50 PS6, Line 50: SCAN > This could be 'datafiles with deletes'. Done -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Mar 2024 10:17:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21190 to look at the new patch set (#7). Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files .. IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files Impala can return incorrect results if a table has dangling delete files. Dangling delete files are delete files that are part of the snapshot but they are not applicable to any of the data files. We can have such delete files after Spark's rewrite_data_files action. During analysis we check the existence of delete files based on the snapshot summary. If there are no delete files in the table, we just replace the count(*) expression with NumericLiteral($record_count). If there are delete files in the table (based on the summary), we set optimize_count_star_for_iceberg_v2 in the query context. Without optimize_count_star_for_iceberg_v2 in the query context, the IcebergScanPlanner would create the following plan:

                 AGGREGATE
                 COUNT(*)
                     |
                 UNION ALL
                /         \
               /           \
              /             \
       SCAN all          ANTI JOIN
       datafiles          /     \
       without           /       \
       deletes        SCAN       SCAN
                    datafiles   deletes
                    with deletes

With optimize_count_star_for_iceberg_v2 the final plan looks like the following:

              ArithmeticExpr(ADD)
              /               \
             /                 \
            /                   \
     record_count           AGGREGATE
     of all                 COUNT(*)
     datafiles                  |
     without                ANTI JOIN
     deletes                /       \
                           /         \
                         SCAN       SCAN
                      datafiles    deletes
                      with deletes

The ArithmeticExpr(ADD) and its left child (record_count) are created by the analyzer; IcebergScanPlanner is responsible for creating the plan under AGGREGATE COUNT(*). If it has delete files and optimize_count_star_for_iceberg_v2 is true, it knows it can omit the original UNION ALL and its left child. However, IcebergScanPlanner checks delete file existence based on the result of planFiles(), hence dangling delete files are eliminated.
And if there are no delete files, IcebergScanPlanner assumes that case is already handled by the Analyzer (i.e. it replaced count(*) with NumericLiteral($record_count)). So it will incorrectly create a normal SCAN plan of the table under COUNT(*), i.e. we end up with this:

              ArithmeticExpr(ADD)
              /               \
             /                 \
            /                   \
     record_count           AGGREGATE
     of all                 COUNT(*)
     datafiles                  |
     without                  SCAN
     deletes               datafiles
                           without deletes

This means Impala will yield $record_count * 2 as a result. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it also ignores dangling delete files. Therefore, the analyzer will just substitute count(*) with NumericLiteral($record_count) if all deletes are dangling, i.e. no need to involve the IcebergScanPlanner at all. The patch also introduces a new query option, "iceberg_disable_count_star_optimization", so users can completely disable the statistic-based count(*)-optimization if necessary. Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 11 files changed, 336 insertions(+), 433 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/7 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
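The double counting described in this patch set can be reproduced with a small toy model. This is a hypothetical sketch, not Impala's code: the types DataFile, PositionDeleteFile and Snapshot and the methods summaryHasDeletes(), applicableDeletes() and optimizedCount() are simplified stand-ins invented for illustration. The model assumes position deletes only, and assumes every deleted position lies within its data file's row range. The flag ignoreDanglingInAnalyzer switches between the old analyzer check (snapshot summary, dangling deletes included) and the fixed one (dangling deletes ignored).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the Iceberg count(*) optimization (hypothetical, simplified). */
public class CountStarSketch {
  record DataFile(String path, long recordCount) {}
  // A position delete file: the data file it targets and the deleted row positions.
  record PositionDeleteFile(String targetPath, Set<Long> positions) {}
  record Snapshot(List<DataFile> dataFiles, List<PositionDeleteFile> deleteFiles) {}

  // Summary-style check: every delete file counts, including dangling ones.
  static boolean summaryHasDeletes(Snapshot s) {
    return !s.deleteFiles().isEmpty();
  }

  // Deleted positions grouped by live data file. Dangling delete files (whose
  // target is not a live data file) are dropped, like a planFiles()-based check.
  static Map<String, Set<Long>> applicableDeletes(Snapshot s) {
    Set<String> live = new HashSet<>();
    for (DataFile f : s.dataFiles()) live.add(f.path());
    Map<String, Set<Long>> result = new HashMap<>();
    for (PositionDeleteFile d : s.deleteFiles()) {
      if (live.contains(d.targetPath())) {
        result.computeIfAbsent(d.targetPath(), k -> new HashSet<>()).addAll(d.positions());
      }
    }
    return result;
  }

  static long totalRecords(Snapshot s) {
    long total = 0;
    for (DataFile f : s.dataFiles()) total += f.recordCount();
    return total;
  }

  static long optimizedCount(Snapshot s, boolean ignoreDanglingInAnalyzer) {
    Map<String, Set<Long>> deletes = applicableDeletes(s);
    boolean analyzerSeesDeletes =
        ignoreDanglingInAnalyzer ? !deletes.isEmpty() : summaryHasDeletes(s);
    // No deletes: count(*) is replaced by NumericLiteral($record_count).
    if (!analyzerSeesDeletes) return totalRecords(s);

    long recordCountWithoutDeletes = 0;  // left child of ADD, from the analyzer
    long plannerCount = 0;               // subtree under AGGREGATE COUNT(*)
    for (DataFile f : s.dataFiles()) {
      Set<Long> deleted = deletes.get(f.path());
      if (deleted == null) {
        recordCountWithoutDeletes += f.recordCount();
      } else {
        plannerCount += f.recordCount() - deleted.size();  // ANTI JOIN result
      }
    }
    if (deletes.isEmpty()) {
      // The planner sees no deletes, assumes the analyzer already handled this
      // case, and scans every data file under COUNT(*).
      plannerCount = totalRecords(s);
    }
    return recordCountWithoutDeletes + plannerCount;
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(new DataFile("a", 10), new DataFile("b", 5));
    // Data file "c" was rewritten away, so this delete file is dangling.
    Snapshot dangling = new Snapshot(files, List.of(new PositionDeleteFile("c", Set.of(0L, 1L))));
    System.out.println(optimizedCount(dangling, false)); // 30 (record_count counted twice)
    System.out.println(optimizedCount(dangling, true));  // 15
  }
}
```

In the model, ignoring dangling delete files in the analyzer-side check is enough to make both checks agree, which mirrors the reasoning in the commit message for changing FeIcebergTable.hasDeleteFiles().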
[Impala-ASF-CR] IMPALA-12945: Fix Flaky Ticker Test
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21214 ) Change subject: IMPALA-12945: Fix Flaky Ticker Test .. Patch Set 1: Code-Review+2 Thanks for fixing this, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/21214 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8cf03ae68fb3103c5bbc438c32f6565b8c406c Gerrit-Change-Number: 21214 Gerrit-PatchSet: 1 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Mar 2024 09:08:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12942: deflake test_virtual_column_file_position_generic
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21209 ) Change subject: IMPALA-12942: deflake test_virtual_column_file_position_generic .. IMPALA-12942: deflake test_virtual_column_file_position_generic Sometimes the runtime filters don't arrive in time in test test_virtual_column_file_position_generic. This patch increases RUNTIME_FILTER_WAIT_TIME_MS to 30 seconds. Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c Reviewed-on: http://gerrit.cloudera.org:8080/21209 Tested-by: Impala Public Jenkins Reviewed-by: Daniel Becker --- M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test 1 file changed, 1 insertion(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Daniel Becker: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21209 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c Gerrit-Change-Number: 21209 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 10: Code-Review+2 (2 comments) Small nits, otherwise LGTM! http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/main/java/org/apache/impala/service/JniFrontend.java@314 PS10, Line 314: params.isSetSession() ? : new User(TSessionStateUtil.getEffectiveUser(params.getSession())) : : ImpalaInternalAdminUser.getInstance(); nit: Can we move this to a private method? There are 3 usages of this pattern (getTableNames, getMetadataTableNames, getDbs) http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@4006 PS10, Line 4006: AnalyzesOk("show metadata tables in functional_parquet.iceberg_query_metadata"); We could have an AnalysisError test for a non-Iceberg table. -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 16:36:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21190 to look at the new patch set (#6). Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files .. IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files Impala can return incorrect results if a table has dangling delete files. Dangling delete files are delete files that are part of the snapshot but they are not applicable to any of the data files. We can have such delete files after Spark's rewrite_data_files action. During analysis we check the existence of delete files based on the snapshot summary. If there are no delete files in the table, we just replace the count(*) expression with NumericLiteral($record_count). If there are delete files in the table (based on the summary), we set optimize_count_star_for_iceberg_v2 in the query context. Without optimize_count_star_for_iceberg_v2 in the query context, the IcebergScanPlanner would create the following plan:

                 AGGREGATE
                 COUNT(*)
                     |
                 UNION ALL
                /         \
               /           \
              /             \
       SCAN all          ANTI JOIN
       datafiles          /     \
       without           /       \
       deletes        SCAN       SCAN
                    datafiles   deletes

With optimize_count_star_for_iceberg_v2 the final plan looks like the following:

              ArithmeticExpr(ADD)
              /               \
             /                 \
            /                   \
     record_count           AGGREGATE
     of all                 COUNT(*)
     datafiles                  |
     without                ANTI JOIN
     deletes                /       \
                           /         \
                         SCAN       SCAN
                      datafiles    deletes

The ArithmeticExpr(ADD) and its left child (record_count) are created by the analyzer; IcebergScanPlanner is responsible for creating the plan under AGGREGATE COUNT(*). If it has delete files and optimize_count_star_for_iceberg_v2 is true, it knows it can omit the original UNION ALL and its left child. However, IcebergScanPlanner checks delete file existence based on the result of planFiles(), hence dangling delete files are eliminated.
And if there are no delete files, IcebergScanPlanner assumes that case is already handled by the Analyzer (i.e. it replaced count(*) with NumericLiteral($record_count)). So it will incorrectly create a normal SCAN plan of the table under COUNT(*), i.e. we end up with this:

              ArithmeticExpr(ADD)
              /               \
             /                 \
            /                   \
     record_count           AGGREGATE
     of all                 COUNT(*)
     datafiles                  |
     without                  SCAN
     deletes               datafiles
                           without deletes

This means Impala will yield $record_count * 2 as a result. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it also ignores dangling delete files. Therefore, the analyzer will just substitute count(*) with NumericLiteral($record_count) if all deletes are dangling, i.e. no need to involve the IcebergScanPlanner at all. The patch also introduces a new query option, "iceberg_disable_count_star_optimization", so users can completely disable the statistic-based count(*)-optimization if necessary. Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 11 files changed, 336 insertions(+), 433 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/6 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 2: Code-Review+2 Thanks for adding these extra tests, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 15:42:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 1: The change LGTM, but it would also be nice to see a planner test. -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 13:34:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12942: deflake test_virtual_column_file_position_generic
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21209 Change subject: IMPALA-12942: deflake test_virtual_column_file_position_generic .. IMPALA-12942: deflake test_virtual_column_file_position_generic Sometimes the runtime filters don't arrive in time in test test_virtual_column_file_position_generic. This patch increases RUNTIME_FILTER_WAIT_TIME_MS to 30 seconds. Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c --- M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/21209/1 -- To view, visit http://gerrit.cloudera.org:8080/21209 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c Gerrit-Change-Number: 21209 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files .. Patch Set 5: (2 comments) Thanks for the comments http://gerrit.cloudera.org:8080/#/c/21190/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21190/3//COMMIT_MSG@10 PS3, Line 10: files. Dangling delete files are delete files that are part of the > Could you describe the cause of the bug in more detail? Added a few extra sentences http://gerrit.cloudera.org:8080/#/c/21190/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/21190/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@983 PS3, Line 983: don > Nit: superfluous "use". Done -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 09:32:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21190 to look at the new patch set (#4). Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files .. IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files Impala can return incorrect results if a table has dangling delete files. Dangling delete files are delete files that are part of the snapshot but they are not applicable to any of the data files. We can have such delete files after Spark's rewrite_data_files action. During analysis we check the existence of delete files based on the snapshot summary. But during planning in IcebergScanPlanner we do it based on planFiles(), i.e. dangling delete files don't count in the latter case. Because of this Impala can create incorrect plans for count(*) optimization. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it ignores dangling delete files. It also introduces a new query option, "iceberg_disable_count_star_optimization", so users can completely disable the statistic-based count(*)-optimization if necessary. 
Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 11 files changed, 336 insertions(+), 433 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/4 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21190 to look at the new patch set (#5). Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files .. IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files Impala can return incorrect results if a table has dangling delete files. Dangling delete files are delete files that are part of the snapshot but they are not applicable to any of the data files. We can have such delete files after Spark's rewrite_data_files action. During analysis we check the existence of delete files based on the snapshot summary. But during planning in IcebergScanPlanner we do it based on planFiles(), i.e. dangling delete files don't count in the latter case. Because of this Impala can create incorrect plans for count(*) optimization. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it ignores dangling delete files. It also introduces a new query option, "iceberg_disable_count_star_optimization", so users can completely disable the statistic-based count(*)-optimization if necessary. 
Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 11 files changed, 336 insertions(+), 433 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/5 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21179 ) Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table The following query throws an error for Iceberg tables: select * from ice_tbl where rand() < 0.001; It's because the predicate 'rand() < 0.001' doesn't involve any table columns. Because of a bug in IcebergScanPlanner.hasPartitionTransformType() the method throws an IndexOutOfBoundsException. This patch fixes the method to handle such predicates. Testing: * added e2e tests Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Reviewed-on: http://gerrit.cloudera.org:8080/21179 Tested-by: Impala Public Jenkins Reviewed-by: Gabor Kaszab --- M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 2 files changed, 94 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Gabor Kaszab: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
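One plausible way a method like this ends up throwing IndexOutOfBoundsException is by indexing into the list of columns a predicate references when that list is empty, which is exactly the situation for 'rand() < 0.001'. The sketch below only illustrates that failure mode and its guard; it is not the actual patch. Conjunct, referencedColumns(), the column-to-transform map, and the column name event_day are hypothetical stand-ins rather than Impala's Expr/SlotRef API, and the real fix in IcebergScanPlanner.hasPartitionTransformType() may be structured differently.

```java
import java.util.List;
import java.util.Map;

public class PartitionTransformSketch {
  // Hypothetical simplified predicate: its text plus the column names it references.
  record Conjunct(String text, List<String> referencedColumns) {}

  // Returns true if the first column the predicate references is partitioned with
  // the given transform type (for example "DAY").
  static boolean hasPartitionTransformType(
      Conjunct conjunct, Map<String, String> transformByColumn, String transformType) {
    List<String> columns = conjunct.referencedColumns();
    // A predicate such as 'rand() < 0.001' references no column at all; without
    // this guard, columns.get(0) below would throw IndexOutOfBoundsException.
    if (columns.isEmpty()) return false;
    // Map.get() returns null for non-partition columns, so equals() yields false.
    return transformType.equals(transformByColumn.get(columns.get(0)));
  }

  public static void main(String[] args) {
    Map<String, String> transforms = Map.of("event_day", "DAY");
    Conjunct rand = new Conjunct("rand() < 0.001", List.of());
    Conjunct day = new Conjunct("event_day = '2024-03-27'", List.of("event_day"));
    System.out.println(hasPartitionTransformType(rand, transforms, "DAY")); // false
    System.out.println(hasPartitionTransformType(day, transforms, "DAY"));  // true
  }
}
```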
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21179 to look at the new patch set (#5). Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table The following query throws an error for Iceberg tables: select * from ice_tbl where rand() < 0.001; It's because the predicate 'rand() < 0.001' doesn't involve any table columns. Because of a bug in IcebergScanPlanner.hasPartitionTransformType() the method throws an IndexOutOfBoundsException. This patch fixes the method to handle such predicates. Testing: * added e2e tests Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 --- M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 2 files changed, 94 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/5 -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21190 to look at the new patch set (#3). Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files Impala can return incorrect results if a table has dangling delete files. During analysis we check the existence of delete files based on the snapshot summary. But during planning in IcebergScanPlanner we do it based on planFiles(), i.e. dangling delete files don't count in the latter case. Because of this Impala can create incorrect plans for count(*) optimization. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it ignores dangling delete files. It also introduces a new query option, "iceberg_disable_count_star_optimization", so users can completely disable the statistic-based count(*)-optimization if necessary. 
Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 11 files changed, 336 insertions(+), 433 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/3 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21148 ) Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. Patch Set 6: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Mar 2024 13:55:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21148 ) Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala generates a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault is in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners lets us handle more queries, e.g. queries that dynamically prune partitions and the surviving partitions all have file formats that support FILE__POSITION. Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Reviewed-on: http://gerrit.cloudera.org:8080/21148 Tested-by: Impala Public Jenkins Reviewed-by: Zoltan Borok-Nagy --- M be/src/exec/json/hdfs-json-scanner.cc M be/src/exec/text/hdfs-text-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test M tests/query_test/test_scanners.py 5 files changed, 94 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Verified Zoltan Borok-Nagy: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
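The backend fix itself is C++ (hdfs-text-scanner.cc, hdfs-json-scanner.cc). The following is only a Java analogy of the pattern. The class and method names and the error message are hypothetical. It shows a close() that tolerates a stream that was never created because open() reported an unsupported FILE__POSITION column and returned early. In Java, the missing check would surface as a NullPointerException rather than a segfault.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java analogy of the C++ scanner fix; names are illustrative only.
final class ScannerCloseSketch {
  // Stand-in for the scanner's input stream.
  static final class Stream {
    boolean closed = false;

    void close() {
      closed = true;
    }
  }

  private Stream stream; // stays null if open() bails out early
  private final List<String> errors = new ArrayList<>();

  // Mirrors the "report an error and close" path: the unsupported virtual
  // column is detected before the stream is ever created.
  void open(boolean requestsFilePosition) {
    if (requestsFilePosition) {
      // Hypothetical message text, for illustration only.
      errors.add("FILE__POSITION is not supported for this file format");
      close();
      return;
    }
    stream = new Stream();
  }

  // The null check is the whole fix: without it, closing a scanner whose
  // stream was never created would dereference null.
  void close() {
    if (stream != null) {
      stream.close();
      stream = null;
    }
  }

  List<String> errors() {
    return errors;
  }

  boolean hasStream() {
    return stream != null;
  }
}
```

Making close() safe to call in any state keeps the error-reporting path simple. Any early exit can call it without tracking which resources were already set up.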
[Impala-ASF-CR] IMPALA-12915: Use libgtest.so when built with shared libs
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21163 ) Change subject: IMPALA-12915: Use libgtest.so when built with shared libs .. Patch Set 3: Code-Review+2 Thanks for fixing this! -- To view, visit http://gerrit.cloudera.org:8080/21163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I27d21217db219f52b072a4e5cfa1caaace35d1a2 Gerrit-Change-Number: 21163 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Mar 2024 09:49:45 + Gerrit-HasComments: No
[Impala-ASF-CR] PRELIMINIARY COUNT(*)
Zoltan Borok-Nagy has abandoned this change. ( http://gerrit.cloudera.org:8080/21189 ) Change subject: PRELIMINIARY COUNT(*) .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/21189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I13a7cbb926d4ca56bc17690d61652fb837ebd672 Gerrit-Change-Number: 21189 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Zoltan Borok-Nagy has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files Impala can return incorrect results if a table has dangling delete files. During analysis we check the existence of delete files based on the snapshot summary. But during planning in IcebergScanPlanner we do it based on planFiles(), i.e. dangling delete files don't count in the latter case. Because of this Impala can create incorrect plans for count(*) optimization. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it ignores dangling delete files. TODO: * introduce query option so we can completely disable the count(*) optimization Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 7 files changed, 307 insertions(+), 431 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/2 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21190 Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files Impala can return incorrect results if a table has dangling delete files. During analysis we check the existence of delete files based on the snapshot summary. But during planning in IcebergScanPlanner we do it based on planFiles(), i.e. dangling delete files don't count in the latter case. Because of this Impala can create incorrect plans for count(*) optimization. This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it ignores dangling delete files. TODO: * introduce query option so we can completely disable the count(*) optimization Testing: * e2e tests * planner tests Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test 7 files changed, 307 insertions(+), 430 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/1 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] PRELIMINIARY COUNT(*)
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21189 Change subject: PRELIMINIARY COUNT(*) .. PRELIMINIARY COUNT(*) Change-Id: I13a7cbb926d4ca56bc17690d61652fb837ebd672 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java 1 file changed, 1 insertion(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/21189/1 -- To view, visit http://gerrit.cloudera.org:8080/21189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I13a7cbb926d4ca56bc17690d61652fb837ebd672 Gerrit-Change-Number: 21189 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Hello Quanlong Huang, Daniel Becker, Riza Suminto, Gabor Kaszab, Zihao Ye, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21148 to look at the new patch set (#6). Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala generates a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault is in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners lets us handle more queries, e.g. queries that dynamically prune partitions and the surviving partitions all have file formats that support FILE__POSITION.
Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 --- M be/src/exec/json/hdfs-json-scanner.cc M be/src/exec/text/hdfs-text-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test M tests/query_test/test_scanners.py 5 files changed, 94 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/6 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21179 ) Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. Patch Set 4: Instead of using rand() I switched to rand(SEED) as the seed-generation seems to be system-specific. -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Mar 2024 16:10:44 + Gerrit-HasComments: No
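Why a fixed seed keeps such a test stable can be illustrated with a Java analogy. java.util.Random is not the generator behind Impala's rand(), and selectedRows() is a hypothetical helper. The only point is that a generator seeded with the same value yields the same sequence. A filter like rand(SEED) < 0.5 therefore selects the same rows on every run, whereas a time-based seed such as rand(unix_timestamp()) would change the selection from run to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Analogy only: java.util.Random is not the generator behind Impala's rand().
final class SeededFilterSketch {
  // Returns the indexes of rows that pass "rand(seed) < threshold" for numRows
  // rows, drawing one value per row as a row-at-a-time filter would.
  static List<Integer> selectedRows(long seed, int numRows, double threshold) {
    Random rng = new Random(seed);
    List<Integer> selected = new ArrayList<>();
    for (int row = 0; row < numRows; row++) {
      // nextDouble() is uniform in [0.0, 1.0).
      if (rng.nextDouble() < threshold) {
        selected.add(row);
      }
    }
    return selected;
  }
}
```

Two calls with the same seed return identical lists, which is what lets an expected-results test file pin down the output of such a query.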
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21179 to look at the new patch set (#4). Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table The following query throws an error for Iceberg tables: select * from ice_tbl where rand() < 0.001; It's because the predicate 'rand() < 0.001' doesn't involve any table columns. Because of a bug in IcebergScanPlanner.hasPartitionTransformType() the method throws an IndexOutOfBoundsException. This patch fixes the method to handle such predicates. Testing: * added e2e tests Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 --- M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 2 files changed, 106 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/4 -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
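Here is a hypothetical Java sketch of the failure pattern and its fix. The Conjunct record, the column-to-transform map, and both method names are illustrative assumptions, not the real IcebergScanPlanner.hasPartitionTransformType(). A conjunct such as rand() < 0.001 references no table columns. Code that unconditionally reads the first referenced column therefore throws IndexOutOfBoundsException. Returning false early for such conjuncts avoids that.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the failure pattern; not IcebergScanPlanner's real code.
final class PartitionTransformCheckSketch {
  // A conjunct reduced to its SQL text and the table columns it references.
  record Conjunct(String sql, List<String> referencedColumns) {}

  // Buggy shape: assumes every conjunct references at least one column.
  static boolean hasTransformTypeBuggy(Conjunct c,
      Map<String, String> transformByColumn, String type) {
    String column = c.referencedColumns().get(0); // throws for "rand() < 0.001"
    return type.equals(transformByColumn.get(column));
  }

  // Fixed shape: a conjunct without column references cannot involve a
  // partition transform, so it is rejected before indexing into the list.
  static boolean hasTransformType(Conjunct c,
      Map<String, String> transformByColumn, String type) {
    if (c.referencedColumns().isEmpty()) {
      return false;
    }
    String column = c.referencedColumns().get(0);
    return type.equals(transformByColumn.get(column));
  }
}
```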
[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21138 ) Change subject: IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator .. Patch Set 3: (1 comment) Just quickly went over the code. Looks good overall, but could you please add planner tests? http://gerrit.cloudera.org:8080/#/c/21138/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21138/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@192 PS3, Line 192: Preconditions.checkState(!coordinatorOnly || : dataPartition_.equals(DataPartition.UNPARTITIONED)); Could you please add a comment for this? -- To view, visit http://gerrit.cloudera.org:8080/21138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e Gerrit-Change-Number: 21138 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Mar 2024 10:54:47 + Gerrit-HasComments: Yes
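The following hypothetical Java sketch shows one possible way to document the Preconditions check from the review comment. It uses simplified stand-ins for PlanFragment and DataPartition. The explanatory comment encodes one plausible reading of the invariant, namely that a coordinator-only fragment runs as a single instance on the coordinator. The plain IllegalStateException mirrors what Guava's Preconditions.checkState throws.

```java
// Hypothetical sketch; the fragment class and DataPartition are simplified stand-ins.
final class CoordinatorOnlyFragmentSketch {
  enum DataPartition { UNPARTITIONED, HASH_PARTITIONED, RANDOM }

  private final boolean coordinatorOnly;
  private final DataPartition dataPartition;

  CoordinatorOnlyFragmentSketch(boolean coordinatorOnly, DataPartition dataPartition) {
    // A coordinator-only fragment (e.g. one scanning an Iceberg metadata table)
    // runs as a single instance on the coordinator, so its data cannot be spread
    // across several instances: it must be UNPARTITIONED.
    if (coordinatorOnly && dataPartition != DataPartition.UNPARTITIONED) {
      throw new IllegalStateException(
          "coordinator-only fragment must be UNPARTITIONED, got " + dataPartition);
    }
    this.coordinatorOnly = coordinatorOnly;
    this.dataPartition = dataPartition;
  }

  boolean isCoordinatorOnly() {
    return coordinatorOnly;
  }

  DataPartition dataPartition() {
    return dataPartition;
  }
}
```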
[Impala-ASF-CR] IMPALA-12898: Tidy up test dimensions of test_scanner.py
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21162 ) Change subject: IMPALA-12898: Tidy up test dimensions of test_scanner.py .. Patch Set 3: Code-Review+2 Thanks for applying the changes! LGTM! -- To view, visit http://gerrit.cloudera.org:8080/21162 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5efd2b483338fb55b958d8e1a0acf6b365f8093e Gerrit-Change-Number: 21162 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Mar 2024 10:35:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21179 ) Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. Patch Set 2: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1192 PS1, Line 1192: select * from iceberg_avro_format where rand() < 0.5; > Isn't this flaky because of the rand()? Or is it not that random? :) It is not that random :) It can take a seed, so the following would be truly random: rand(unix_timestamp()) < 0.5 http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1271 PS1, Line 1271: 1460 > Would it make sense to involve some time travel too? Sure, done. -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Mar 2024 17:36:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21179 to look at the new patch set (#2). Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table The following query throws an error for Iceberg tables: select * from ice_tbl where rand() < 0.001; It's because the predicate 'rand() < 0.001' doesn't involve any table columns. Because of a bug in IcebergScanPlanner.hasPartitionTransformType() the method throws an IndexOutOfBoundsException. This patch fixes the method to handle such predicates. Testing: * added e2e tests Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 --- M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 2 files changed, 106 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/2 -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12898: Tidy up test dimensions of test_scanner.py
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21162 ) Change subject: IMPALA-12898: Tidy up test dimensions of test_scanner.py .. Patch Set 1: (17 comments) Thanks for working on this! Mostly found style issues http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@74 PS1, Line 74: return [0, 1] Earlier [0, 1, 4] was used in core. Do you think it's not a problem to decrease the values in that dimension? http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@81 PS1, Line 81: return [0, 1] Same as above, earlier it was [0, 1, 16] in core. http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@100 PS1, Line 100: nit: 4 spaces are needed http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@157 PS1, Line 157: nit: 4 spaces are needed http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@201 PS1, Line 201: nit: indentation is off http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@203 PS1, Line 203: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@259 PS1, Line 259: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@353 PS1, Line 353: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@418 PS1, Line 418: and nit: The original lines were more aligned, so I'm not sure if that formatting change is beneficial http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@633 PS1, Line 633: nit: needs +4 indent instead of +2 http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1023 PS1, Line 1023: nit: 4 spaces are needed. 
Same for L1025 http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1443 PS1, Line 1443: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1474 PS1, Line 1474: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1508 PS1, Line 1508: + nit: originally the lines were more aligned, so I'm not sure about this change in formatting. Is it enforced by PEP8? http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1595 PS1, Line 1595: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1636 PS1, Line 1636: nit: indentation http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1994 PS1, Line 1994: nit: indentation -- To view, visit http://gerrit.cloudera.org:8080/21162 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5efd2b483338fb55b958d8e1a0acf6b365f8093e Gerrit-Change-Number: 21162 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Mar 2024 17:21:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Hello Quanlong Huang, Daniel Becker, Riza Suminto, Gabor Kaszab, Zihao Ye, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21148 to look at the new patch set (#4). Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala generates a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault is in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners lets us handle more queries, e.g. queries that dynamically prune partitions and the surviving partitions all have file formats that support FILE__POSITION.
Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 --- M be/src/exec/json/hdfs-json-scanner.cc M be/src/exec/text/hdfs-text-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test M tests/query_test/test_scanners.py 5 files changed, 92 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/4 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21179 Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table The following query throws an error for Iceberg tables: select * from ice_tbl where rand() < 0.001; It's because the predicate 'rand() < 0.001' doesn't involve any table columns. Because of a bug in IcebergScanPlanner.hasPartitionTransformType() the method throws an IndexOutOfBoundsException. This patch fixes the method to handle such predicates. Testing: * added e2e tests Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 --- M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 2 files changed, 85 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/1 -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all DDL profiles
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20491 ) Change subject: IMPALA-12443: Add catalog timeline for all DDL profiles .. Patch Set 15: Code-Review+2 Nice feature! LGTM! -- To view, visit http://gerrit.cloudera.org:8080/20491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 Gerrit-Change-Number: 20491 Gerrit-PatchSet: 15 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Mar 2024 10:35:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21167 ) Change subject: IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests .. IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests This CR is the first step to upgrade to Iceberg 1.4.3. The biggest change in behavior in Iceberg 1.4.3 is that Iceberg V2 tables are the default. Because of this we update some test files to explicitly create V1/V2 tables. We also introduce test files that create Iceberg tables without explicitly specifying the format version, these tests have the name *-default.test. The latter tests will need to be updated when we actually upgrade to Iceberg 1.4.3. Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd Reviewed-on: http://gerrit.cloudera.org:8080/21167 Tested-by: Impala Public Jenkins Reviewed-by: Andrew Sherman --- R testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-default.test C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v1.test C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v2.test R testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-default.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v1.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v2.test M tests/query_test/test_iceberg.py 7 files changed, 1,672 insertions(+), 120 deletions(-) Approvals: Impala Public Jenkins: Verified Andrew Sherman: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21167 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd Gerrit-Change-Number: 21167 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21167 Change subject: IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests .. IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests This CR is the first step to upgrade to Iceberg 1.4.3. The biggest change in behavior in Iceberg 1.4.3 is that Iceberg V2 tables are the default. Because of this we update some test files to explicitly create V1/V2 tables. We also introduce test files that create Iceberg tables without explicitly specifying the format version, these tests have the name *-default.test. The latter tests will need to be updated when we actually upgrade to Iceberg 1.4.3. Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd --- R testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-default.test C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v1.test C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v2.test R testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-default.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v1.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v2.test M tests/query_test/test_iceberg.py 7 files changed, 1,672 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/21167/1 -- To view, visit http://gerrit.cloudera.org:8080/21167 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd Gerrit-Change-Number: 21167 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21151 ) Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions .. Patch Set 4: (3 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21151/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21151/1//COMMIT_MSG@21 PS1, Line 21: f3f3b1427b20a1d2d28 > Maybe one day we want to add this back for IMPALA-12349. It's ok to remove Thanks for pointing me to IMPALA-12349. If we have such plans then I think we shouldn't remove test_type_conversions_hive2 because we might forget to re-add it later. I didn't fix the column names in test_type_conversions_hive2 because I cannot test it, but at least I've left a hint for the future contributor of IMPALA-12349. http://gerrit.cloudera.org:8080/#/c/21151/2/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21151/2/tests/query_test/test_scanners.py@1717 PS2, Line 1717: # TODO(IMPALA-12349): Rename the columns to use the correct names (see > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/21151/2/tests/query_test/test_scanners.py@1717 PS2, Line 1717: > flake8: W291 trailing whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/21151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 Gerrit-Change-Number: 21151 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Mar 2024 16:18:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21151 to look at the new patch set (#4). Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions .. IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions test_type_conversions_hive3 silently passes because we are not creating the test dimension for the query option orc_schema_resolution correctly. If we set orc_schema_resolution correctly, i.e. so that it also exercises name-based schema resolution, the test fails. The cause of the failure is that the ill-typed tables have dummy column names like 'c1', 'c2', etc. These are fine for position-based schema resolution, but not for name-based schema resolution. The test only checks error messages related to type errors and the column names are irrelevant, so we can simply use the correct names. Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 --- M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 2 files changed, 44 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/4 -- To view, visit http://gerrit.cloudera.org:8080/21151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 Gerrit-Change-Number: 21151 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
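The difference between the two resolution modes can be illustrated with a small C++ sketch. It is a hypothetical, simplified model, not Impala's ORC scanner: Schema, ResolveByPosition() and ResolveByName() are illustrative names, and the file column names "id" and "bool_col" are made up for the example. It shows why table columns named 'c1', 'c2' still map to file columns by position but find no match by name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of schema resolution; NOT Impala's ORC
// scanner code. A schema is just the list of column names in file order.
using Schema = std::vector<std::string>;

// Position-based resolution: the i-th table column maps to the i-th file
// column, no matter what either column is called.
std::optional<std::size_t> ResolveByPosition(const Schema& file_schema,
                                             std::size_t table_col_idx) {
  if (table_col_idx < file_schema.size()) return table_col_idx;
  return std::nullopt;
}

// Name-based resolution: the table column maps to the file column that has
// exactly the same name, if there is one.
std::optional<std::size_t> ResolveByName(const Schema& file_schema,
                                         const std::string& table_col_name) {
  for (std::size_t i = 0; i < file_schema.size(); ++i) {
    if (file_schema[i] == table_col_name) return i;
  }
  return std::nullopt;
}
```

For example, with the hypothetical file schema {"id", "bool_col"}, ResolveByPosition(file_schema, 1) yields 1, while ResolveByName(file_schema, "c2") yields no match. That is why dummy names only break the test once the name-based dimension is actually exercised.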
[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21151 to look at the new patch set (#3). Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions .. IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions test_type_conversions_hive3 silently passes because we are not creating the test dimension for the query option orc_schema_resolution correctly. If we set orc_schema_resolution correctly, i.e. so that it also exercises name-based schema resolution, the test fails. The cause of the failure is that the ill-typed tables have dummy column names like 'c1', 'c2', etc. These are fine for position-based schema resolution, but not for name-based schema resolution. The test only checks error messages related to type errors and the column names are irrelevant, so we can simply use the correct names. The test was copied from the old test_type_conversions_hive2, which is no longer relevant, so this CR also removes it. Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 --- M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 2 files changed, 44 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/3 -- To view, visit http://gerrit.cloudera.org:8080/21151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 Gerrit-Change-Number: 21151 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21151 to look at the new patch set (#2). Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions .. IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions test_type_conversions_hive3 silently passes because we are not creating the test dimension for the query option orc_schema_resolution correctly. If we set orc_schema_resolution correctly, i.e. so that it also exercises name-based schema resolution, the test fails. The cause of the failure is that the ill-typed tables have dummy column names like 'c1', 'c2', etc. These are fine for position-based schema resolution, but not for name-based schema resolution. The test only checks error messages related to type errors and the column names are irrelevant, so we can simply use the correct names. The test was copied from the old test_type_conversions_hive2, which is no longer relevant, so this CR also removes it. Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 --- M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 2 files changed, 44 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/2 -- To view, visit http://gerrit.cloudera.org:8080/21151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 Gerrit-Change-Number: 21151 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21148 ) Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. Patch Set 3: (6 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21148/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21148/1//COMMIT_MSG@19 PS1, Line 19: let > Nit: lets. Done http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test: http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test@158 PS1, Line 158: QUERY > Are these the queries where some files in the table do not support FILE_POS Done http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test@159 PS1, Line 159: # Regression test for IMPALA-12903. The following query uses static pruning. The surviving > nit: could you add a comment that in this test we prune partitions that doe Done http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test: http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test@1 PS1, Line 1: > Is FILE_POSITION the only virtual column that could cause this bug before t Only FILE__POSITION causes this problem. INPUT__FILE__NAME is supported for all file formats. 
http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test@40 PS1, Line 40: Virtual column FILE__POSITION is not supported > Could you replace some of the FILE__POSITIONS to some other virtual columns INPUT__FILE__NAME is supported for all file formats. http://gerrit.cloudera.org:8080/#/c/21148/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21148/1/tests/query_test/test_scanners.py@183 PS1, Line 183: ))) > Or just fix the table_format dimension to text/none and remove this constra Thanks for the suggestions. I went with the uncompressed text dimension option. -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Mar 2024 10:18:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Hello Daniel Becker, Riza Suminto, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21148 to look at the new patch set (#3). Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala crashes with a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault happens in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively, we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners lets us handle more queries, e.g. queries that dynamically prune partitions so that all the surviving partitions have file formats that support FILE__POSITION. 
Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 --- M be/src/exec/json/hdfs-json-scanner.cc M be/src/exec/text/hdfs-text-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test M tests/query_test/test_scanners.py 5 files changed, 92 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/3 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
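The NULL-check described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the actual patch: SketchScanner, Stream, Open() and stream_ are hypothetical names, and the error text is taken from the negative test quoted in the review (the real scanners may word the full message differently). It only shows why a Close() that runs after an early error must tolerate a stream that was never created.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a scanner's input stream; not an Impala type.
struct Stream {
  bool closed = false;
  void Close() { closed = true; }
};

// Hypothetical, heavily simplified scanner; not Impala's HdfsTextScanner or
// HdfsJsonScanner. It only models the failure path described above.
class SketchScanner {
 public:
  // Reports an error and returns false when the unsupported FILE__POSITION
  // virtual column is requested. In that case stream_ is never created.
  bool Open(bool file_position_requested) {
    if (file_position_requested) {
      error_ = "Virtual column FILE__POSITION is not supported";
      return false;
    }
    stream_ = std::make_unique<Stream>();
    return true;
  }

  // Close() also runs after a failed Open(), so stream_ may be NULL here.
  // Without the NULL-check, stream_->Close() would dereference a null pointer.
  void Close() {
    if (stream_ != nullptr) stream_->Close();
    closed_ = true;
  }

  bool closed() const { return closed_; }
  bool stream_closed() const { return stream_ != nullptr && stream_->closed; }
  const std::string& error() const { return error_; }

 private:
  std::unique_ptr<Stream> stream_;
  std::string error_;
  bool closed_ = false;
};
```

For example, calling Open(true) and then Close() on a SketchScanner records the error and closes cleanly instead of dereferencing NULL. Checking at scan time rather than only during planning matches the trade-off in the commit message: the decision is made per scanned file, so queries whose surviving partitions all support FILE__POSITION can still run.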
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Hello Daniel Becker, Riza Suminto, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21148 to look at the new patch set (#2). Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala crashes with a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault happens in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively, we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners lets us handle more queries, e.g. queries that dynamically prune partitions so that all the surviving partitions have file formats that support FILE__POSITION. 
Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 --- M be/src/exec/json/hdfs-json-scanner.cc M be/src/exec/text/hdfs-text-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test M tests/query_test/test_scanners.py 5 files changed, 92 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/2 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21151 Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions .. IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions test_type_conversions_hive3 silently passes because we are not creating the test dimension for the query option orc_schema_resolution correctly. If we set orc_schema_resolution correctly, i.e. so that it also exercises name-based schema resolution, the test fails. The cause of the failure is that the ill-typed tables have dummy column names like 'c1', 'c2', etc. These are fine for position-based schema resolution, but not for name-based schema resolution. The test only checks error messages related to type errors and the column names are irrelevant, so we can simply use the correct names. The test was copied from the old test_type_conversions_hive2, which is no longer relevant, so this CR also removes it. Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 --- M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 2 files changed, 42 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/1 -- To view, visit http://gerrit.cloudera.org:8080/21151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28 Gerrit-Change-Number: 21151 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21148 Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala Impala crashes with a segmentation fault when it queries the virtual column FILE__POSITION for TEXT or JSON tables. When the scanners that do not support the FILE__POSITION virtual column detect its presence, they try to report an error and close themselves. The segfault happens in the scanners' Close() method when they try to dereference a NULL stream object. This patch simply adds NULL-checks in Close(). Alternatively, we could detect the presence of FILE__POSITION during planning in the HdfsScanNode, but doing it in the scanners let us handle more queries, e.g. queries that dynamically prune partitions so that all the surviving partitions have file formats that support FILE__POSITION. Testing: * added negative tests to properly report the errors * added tests for mixed file format tables Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 --- M be/src/exec/json/hdfs-json-scanner.cc M be/src/exec/text/hdfs-text-scanner.cc M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test M tests/query_test/test_scanners.py 5 files changed, 88 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/1 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21139 ) Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 Gerrit-Change-Number: 21139 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Mar 2024 14:13:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21139 ) Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21139/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21139/3//COMMIT_MSG@7 PS3, Line 7: nit: maybe you could include "part 1" in the title -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 Gerrit-Change-Number: 21139 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Mar 2024 14:11:10 + Gerrit-HasComments: Yes