[Impala-ASF-CR] IMPALA-13108: Update version to 4.5.0-SNAPSHOT

2024-05-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21460 )


Change subject: IMPALA-13108: Update version to 4.5.0-SNAPSHOT
..

IMPALA-13108: Update version to 4.5.0-SNAPSHOT

Updated IMPALA_VERSION in impala-config.sh

Executed the following for Java:

  cd java
  mvn versions:set -DnewVersion=4.5.0-SNAPSHOT

Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
---
M bin/impala-config.sh
M fe/pom.xml
M java/TableFlattener/pom.xml
M java/calcite-planner/pom.xml
M java/datagenerator/pom.xml
M java/executor-deps/pom.xml
M java/ext-data-source/api/pom.xml
M java/ext-data-source/jdbc/pom.xml
M java/ext-data-source/pom.xml
M java/ext-data-source/sample/pom.xml
M java/ext-data-source/test/pom.xml
M java/external-frontend/pom.xml
M java/pom.xml
M java/query-event-hook-api/pom.xml
M java/shaded-deps/hive-exec/pom.xml
M java/shaded-deps/s3a-aws-sdk/pom.xml
M java/test-corrupt-hive-udfs/pom.xml
M java/test-hive-udfs/pom.xml
M java/yarn-extras/pom.xml
19 files changed, 22 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/21460/1
--
To view, visit http://gerrit.cloudera.org:8080/21460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
Gerrit-Change-Number: 21460
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-05-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21311 )

Change subject: Add documentation, update links for 4.4.0
..

Add documentation, update links for 4.4.0

Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
Reviewed-on: http://gerrit.cloudera.org:8080/21311
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Zoltan Borok-Nagy 
---
M docs/build/asf-site-html/index.html
M docs/build/asf-site-html/shared/ImpalaVariables.html
M docs/build/asf-site-html/shared/impala_common.html
M docs/build/asf-site-html/topics/impala_abort_on_error.html
M docs/build/asf-site-html/topics/impala_adls.html
M docs/build/asf-site-html/topics/impala_admin.html
M docs/build/asf-site-html/topics/impala_admission.html
M docs/build/asf-site-html/topics/impala_admission_config.html
M docs/build/asf-site-html/topics/impala_aggregate_functions.html
M docs/build/asf-site-html/topics/impala_aliases.html
M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html
M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html
M docs/build/asf-site-html/topics/impala_alter_database.html
M docs/build/asf-site-html/topics/impala_alter_table.html
M docs/build/asf-site-html/topics/impala_alter_view.html
M docs/build/asf-site-html/topics/impala_analytic_functions.html
M docs/build/asf-site-html/topics/impala_appx_count_distinct.html
M docs/build/asf-site-html/topics/impala_appx_median.html
M docs/build/asf-site-html/topics/impala_array.html
M docs/build/asf-site-html/topics/impala_auditing.html
M docs/build/asf-site-html/topics/impala_authentication.html
M docs/build/asf-site-html/topics/impala_authorization.html
M docs/build/asf-site-html/topics/impala_avg.html
M docs/build/asf-site-html/topics/impala_avro.html
M docs/build/asf-site-html/topics/impala_batch_size.html
M docs/build/asf-site-html/topics/impala_bigint.html
M docs/build/asf-site-html/topics/impala_bit_functions.html
M docs/build/asf-site-html/topics/impala_boolean.html
M docs/build/asf-site-html/topics/impala_breakpad.html
M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html
M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html
M docs/build/asf-site-html/topics/impala_char.html
M docs/build/asf-site-html/topics/impala_client.html
M docs/build/asf-site-html/topics/impala_comment.html
M docs/build/asf-site-html/topics/impala_comments.html
M docs/build/asf-site-html/topics/impala_complex_types.html
M docs/build/asf-site-html/topics/impala_components.html
M docs/build/asf-site-html/topics/impala_compression_codec.html
M docs/build/asf-site-html/topics/impala_compute_stats.html
M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html
M docs/build/asf-site-html/topics/impala_concepts.html
M docs/build/asf-site-html/topics/impala_conditional_functions.html
M docs/build/asf-site-html/topics/impala_config.html
M docs/build/asf-site-html/topics/impala_config_options.html
M docs/build/asf-site-html/topics/impala_config_performance.html
M docs/build/asf-site-html/topics/impala_connecting.html
M docs/build/asf-site-html/topics/impala_conversion_functions.html
M docs/build/asf-site-html/topics/impala_count.html
M docs/build/asf-site-html/topics/impala_create_database.html
M docs/build/asf-site-html/topics/impala_create_function.html
M docs/build/asf-site-html/topics/impala_create_role.html
M docs/build/asf-site-html/topics/impala_create_table.html
M docs/build/asf-site-html/topics/impala_create_view.html
M docs/build/asf-site-html/topics/impala_custom_timezones.html
M docs/build/asf-site-html/topics/impala_data_cache.html
M docs/build/asf-site-html/topics/impala_databases.html
M docs/build/asf-site-html/topics/impala_datatypes.html
M docs/build/asf-site-html/topics/impala_date.html
M docs/build/asf-site-html/topics/impala_datetime_functions.html
M docs/build/asf-site-html/topics/impala_ddl.html
M docs/build/asf-site-html/topics/impala_debug_action.html
M docs/build/asf-site-html/topics/impala_decimal.html
M docs/build/asf-site-html/topics/impala_decimal_v2.html
M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html
M docs/build/asf-site-html/topics/impala_default_file_format.html
M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html
M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html
M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html
M docs/build/asf-site-html/topics/impala_default_transactional_type.html
M docs/build/asf-site-html/topics/impala_delegation.html
M docs/build/asf-site-html/topics/impala_delete.html
M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html
M docs/build/asf-site-html/topics/impala_describe.html
M docs/build/asf-site-html/topics/impala_development.html
M docs/build/asf-site-html/topics/impala_disable_codegen.html
M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html
M 

[Impala-ASF-CR](asf-site) Update download links for release 4.4.0

2024-05-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21307 )

Change subject: Update download links for release 4.4.0
..

Update download links for release 4.4.0

Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Reviewed-on: http://gerrit.cloudera.org:8080/21307
Reviewed-by: Laszlo Gaal 
Tested-by: Zoltan Borok-Nagy 
---
M downloads.html
1 file changed, 13 insertions(+), 4 deletions(-)

Approvals:
  Laszlo Gaal: Looks good to me, approved
  Zoltan Borok-Nagy: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21307

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Gerrit-Change-Number: 21307
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-05-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21311 )

Change subject: Add documentation, update links for 4.4.0
..


Patch Set 4: Verified+1 Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21311

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
Gerrit-Change-Number: 21311
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 25 May 2024 08:03:01 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Update download links for release 4.4.0

2024-05-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21307 )

Change subject: Update download links for release 4.4.0
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21307

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Gerrit-Change-Number: 21307
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 25 May 2024 08:02:53 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-05-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: Add documentation, update links for 4.4.0
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/21311

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
Gerrit-Change-Number: 21311
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-05-24 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21311

to look at the new patch set (#4).

Change subject: Add documentation, update links for 4.4.0
..

Add documentation, update links for 4.4.0

Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
---
M docs/build/asf-site-html/index.html
M docs/build/asf-site-html/shared/ImpalaVariables.html
M docs/build/asf-site-html/shared/impala_common.html
M docs/build/asf-site-html/topics/impala_abort_on_error.html
M docs/build/asf-site-html/topics/impala_adls.html
M docs/build/asf-site-html/topics/impala_admin.html
M docs/build/asf-site-html/topics/impala_admission.html
M docs/build/asf-site-html/topics/impala_admission_config.html
M docs/build/asf-site-html/topics/impala_aggregate_functions.html
M docs/build/asf-site-html/topics/impala_aliases.html
M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html
M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html
M docs/build/asf-site-html/topics/impala_alter_database.html
M docs/build/asf-site-html/topics/impala_alter_table.html
M docs/build/asf-site-html/topics/impala_alter_view.html
M docs/build/asf-site-html/topics/impala_analytic_functions.html
M docs/build/asf-site-html/topics/impala_appx_count_distinct.html
M docs/build/asf-site-html/topics/impala_appx_median.html
M docs/build/asf-site-html/topics/impala_array.html
M docs/build/asf-site-html/topics/impala_auditing.html
M docs/build/asf-site-html/topics/impala_authentication.html
M docs/build/asf-site-html/topics/impala_authorization.html
M docs/build/asf-site-html/topics/impala_avg.html
M docs/build/asf-site-html/topics/impala_avro.html
M docs/build/asf-site-html/topics/impala_batch_size.html
M docs/build/asf-site-html/topics/impala_bigint.html
M docs/build/asf-site-html/topics/impala_bit_functions.html
M docs/build/asf-site-html/topics/impala_boolean.html
M docs/build/asf-site-html/topics/impala_breakpad.html
M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html
M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html
M docs/build/asf-site-html/topics/impala_char.html
M docs/build/asf-site-html/topics/impala_client.html
M docs/build/asf-site-html/topics/impala_comment.html
M docs/build/asf-site-html/topics/impala_comments.html
M docs/build/asf-site-html/topics/impala_complex_types.html
M docs/build/asf-site-html/topics/impala_components.html
M docs/build/asf-site-html/topics/impala_compression_codec.html
M docs/build/asf-site-html/topics/impala_compute_stats.html
M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html
M docs/build/asf-site-html/topics/impala_concepts.html
M docs/build/asf-site-html/topics/impala_conditional_functions.html
M docs/build/asf-site-html/topics/impala_config.html
M docs/build/asf-site-html/topics/impala_config_options.html
M docs/build/asf-site-html/topics/impala_config_performance.html
M docs/build/asf-site-html/topics/impala_connecting.html
M docs/build/asf-site-html/topics/impala_conversion_functions.html
M docs/build/asf-site-html/topics/impala_count.html
M docs/build/asf-site-html/topics/impala_create_database.html
M docs/build/asf-site-html/topics/impala_create_function.html
M docs/build/asf-site-html/topics/impala_create_role.html
M docs/build/asf-site-html/topics/impala_create_table.html
M docs/build/asf-site-html/topics/impala_create_view.html
M docs/build/asf-site-html/topics/impala_custom_timezones.html
M docs/build/asf-site-html/topics/impala_data_cache.html
M docs/build/asf-site-html/topics/impala_databases.html
M docs/build/asf-site-html/topics/impala_datatypes.html
M docs/build/asf-site-html/topics/impala_date.html
M docs/build/asf-site-html/topics/impala_datetime_functions.html
M docs/build/asf-site-html/topics/impala_ddl.html
M docs/build/asf-site-html/topics/impala_debug_action.html
M docs/build/asf-site-html/topics/impala_decimal.html
M docs/build/asf-site-html/topics/impala_decimal_v2.html
M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html
M docs/build/asf-site-html/topics/impala_default_file_format.html
M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html
M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html
M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html
M docs/build/asf-site-html/topics/impala_default_transactional_type.html
M docs/build/asf-site-html/topics/impala_delegation.html
M docs/build/asf-site-html/topics/impala_delete.html
M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html
M docs/build/asf-site-html/topics/impala_describe.html
M docs/build/asf-site-html/topics/impala_development.html
M docs/build/asf-site-html/topics/impala_disable_codegen.html
M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html
M 

[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

2024-05-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21452

to look at the new patch set (#2).

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
..

IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

With this patch IcebergDeleteBuilder checks how many probe threads
are actually blocked on the builder. Let's assume the following plan:

            UNION ALL
           /         \
          /           \
         /             \
   SCAN all          ANTI JOIN
   datafiles         /       \
   without          /         \
   deletes       SCAN         SCAN
                 datafiles    deletes
                 with deletes

In that case, UNION ALL and the two "SCAN datafiles" operators are in
the same fragment, while the builder of the ANTI JOIN is in a different
fragment. This means that "SCAN datafiles without deletes" can run in
parallel with the builder. Once that SCAN is exhausted, the UNION
ALL drains rows from "SCAN datafiles with deletes" via the ANTI JOIN
operator, which depends on the join builder's output.

This means that in some cases the SCAN fragments are busy, while in
other cases the SCAN fragments are blocked. It depends on how much work
they need to do, and how much work the build side needs to do. To
handle all cases, we dynamically check how many probe fragments are
blocked on the builder, then spin up that many threads to parallelize
the final sort.
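
As a rough illustration of the idea above (not the code in this patch), the
final sort could be spread over a caller-supplied number of threads roughly
as follows. The names DeleteRows, ParallelFinalSort and num_threads are
hypothetical, and the per-file layout of the delete positions is an
assumption made only for this sketch.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical container: data file path -> deleted row positions.
using DeleteRows = std::unordered_map<std::string, std::vector<int64_t>>;

// Sorts every per-file position vector using `num_threads` workers.
// `num_threads` stands in for the number of probe threads that were
// found blocked on the builder; how that number is obtained is not shown.
void ParallelFinalSort(DeleteRows& rows, int num_threads) {
  std::vector<std::vector<int64_t>*> work;
  work.reserve(rows.size());
  for (auto& entry : rows) work.push_back(&entry.second);

  std::atomic<size_t> next{0};
  auto worker = [&work, &next]() {
    // Each worker claims the next unsorted vector until none remain.
    for (size_t i = next.fetch_add(1); i < work.size();
         i = next.fetch_add(1)) {
      std::sort(work[i]->begin(), work[i]->end());
    }
  };

  num_threads = std::max(1, num_threads);
  std::vector<std::thread> threads;
  for (int t = 1; t < num_threads; ++t) threads.emplace_back(worker);
  worker();  // The calling thread also takes part in the sort.
  for (auto& th : threads) th.join();
}
```

Handing out whole vectors through an atomic index keeps the sketch free of
locks; a real implementation may split the work differently.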

This also works well when we have the following plan:

       ANTI JOIN
       /       \
      /         \
   SCAN         SCAN
   datafiles    deletes
   with deletes

The above plan is created when all data files have corresponding
deletes, or when we are running a simple count(*) query. In that
case all "SCAN datafiles" fragments are blocked on the builder,
so we can use that many threads to sort the build results.

A new field "ThreadCountInFinalBuild" was added, so the query profile
shows how many threads were used for the final sort in the
builders.

Measurements:
On a table with 1 trillion data records and 68.5 billion delete records,
this lowered "IcebergDeletePositionSortTimer" from ~1 minute to
8-10 seconds, in an environment with 40 executors and MT_DOP=12.

TODO:
 * e2e tests that check counter "ThreadCountInFinalBuild"

Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
4 files changed, 105 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21452/2
--
To view, visit http://gerrit.cloudera.org:8080/21452

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#2).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records, the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side, any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copying.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.
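
A minimal sketch of the two points above, assuming fixed-capacity chunks;
the class name ChunkedPositions, its methods and the chunk capacity are
hypothetical and are not taken from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: positions live in fixed-capacity chunks, so growing
// the collection never copies the positions that were already stored.
class ChunkedPositions {
 public:
  explicit ChunkedPositions(size_t chunk_capacity)
      : chunk_capacity_(std::max<size_t>(1, chunk_capacity)) {}

  // Appends [begin, end) using bulk range inserts instead of one
  // push_back per record.
  void AppendRange(const int64_t* begin, const int64_t* end) {
    while (begin != end) {
      if (chunks_.empty() || chunks_.back().size() == chunk_capacity_) {
        // Growing the outer vector only moves the inner vectors.
        chunks_.emplace_back();
        chunks_.back().reserve(chunk_capacity_);
      }
      std::vector<int64_t>& chunk = chunks_.back();
      const size_t room = chunk_capacity_ - chunk.size();
      const size_t n = std::min(room, static_cast<size_t>(end - begin));
      chunk.insert(chunk.end(), begin, begin + n);
      begin += n;
    }
  }

  size_t size() const {
    size_t total = 0;
    for (const auto& chunk : chunks_) total += chunk.size();
    return total;
  }

  size_t num_chunks() const { return chunks_.size(); }

 private:
  size_t chunk_capacity_;
  std::vector<std::vector<int64_t>> chunks_;
};
```

With a single flat vector, each growth step would copy every position
collected so far; here only the small outer vector is ever reallocated.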

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
(cherry picked from commit d08315fe5c57ccb5b197cd196b62eeedf7d90ec3)
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/2
--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

2024-05-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21452 )


Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
..

IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

With this patch IcebergDeleteBuilder checks how many probe threads
are actually blocked on the builder. Let's assume the following plan:

            UNION ALL
           /         \
          /           \
         /             \
   SCAN all          ANTI JOIN
   datafiles         /       \
   without          /         \
   deletes       SCAN         SCAN
                 datafiles    deletes
                 with deletes

In that case, UNION ALL and the two "SCAN datafiles" operators are in
the same fragment, while the builder of the ANTI JOIN is in a different
fragment. This means that "SCAN datafiles without deletes" can run in
parallel with the builder. Once that SCAN is exhausted, the UNION
ALL drains rows from "SCAN datafiles with deletes" via the ANTI JOIN
operator, which depends on the join builder's output.

This means that in some cases the SCAN fragments are busy, while in
other cases the SCAN fragments are blocked. It depends on how much work
they need to do, and how much work the build side needs to do. To
handle all cases, we dynamically check how many probe fragments are
blocked on the builder, then spin up that many threads to parallelize
the final sort.

This also works well when we have the following plan:

       ANTI JOIN
       /       \
      /         \
   SCAN         SCAN
   datafiles    deletes
   with deletes

The above plan is created when all data files have corresponding
deletes, or when we are running a simple count(*) query. In that
case all "SCAN datafiles" fragments are blocked on the builder,
so we can use that many threads to sort the build results.

A new field "ThreadCountInFinalBuild" was added, so the query profile
shows how many threads were used for the final sort in the
builders.

Measurements:
On a table with 1 trillion data records and 68.5 billion delete records,
this lowered "IcebergDeletePositionSortTimer" from ~1 minute to
8-10 seconds, in an environment with 40 executors and MT_DOP=12.

TODO:
 * e2e tests that check counter "ThreadCountInFinalBuild"

Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
4 files changed, 102 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21452/1
--
To view, visit http://gerrit.cloudera.org:8080/21452

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-16 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21435 )


Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records, the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side, any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copying.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/1
--
To view, visit http://gerrit.cloudera.org:8080/21435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-05-09 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes
..


Patch Set 4: Code-Review+2

Thanks for modifying the tests, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/21348

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 09 May 2024 12:40:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-04-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21348/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21348/2/testdata/data/README@1212
PS2, Line 1212: 5) Manually change identifier-field-ids from [1] to [1,2]
  : 6) Delete rows with Nifi (i=1,j=11), (i=4,j=44)
This means that even if the original table's Avro schema is used, the
eq-delete files are still processed correctly, because the eq-delete file
schema is a subset of the original schema: same columns, same positions.

Would it be possible to only have [2] in the identifier list? And maybe make
'j' a STRING column?



--
To view, visit http://gerrit.cloudera.org:8080/21348

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Apr 2024 13:04:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-04-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21311

to look at the new patch set (#3).

Change subject: Add documentation, update links for 4.4.0
..

Add documentation, update links for 4.4.0

Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
---
M docs/build/asf-site-html/index.html
M docs/build/asf-site-html/shared/ImpalaVariables.html
M docs/build/asf-site-html/shared/impala_common.html
M docs/build/asf-site-html/topics/impala_abort_on_error.html
M docs/build/asf-site-html/topics/impala_adls.html
M docs/build/asf-site-html/topics/impala_admin.html
M docs/build/asf-site-html/topics/impala_admission.html
M docs/build/asf-site-html/topics/impala_admission_config.html
M docs/build/asf-site-html/topics/impala_aggregate_functions.html
M docs/build/asf-site-html/topics/impala_aliases.html
M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html
M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html
M docs/build/asf-site-html/topics/impala_alter_database.html
M docs/build/asf-site-html/topics/impala_alter_table.html
M docs/build/asf-site-html/topics/impala_alter_view.html
M docs/build/asf-site-html/topics/impala_analytic_functions.html
M docs/build/asf-site-html/topics/impala_appx_count_distinct.html
M docs/build/asf-site-html/topics/impala_appx_median.html
M docs/build/asf-site-html/topics/impala_array.html
M docs/build/asf-site-html/topics/impala_auditing.html
M docs/build/asf-site-html/topics/impala_authentication.html
M docs/build/asf-site-html/topics/impala_authorization.html
M docs/build/asf-site-html/topics/impala_avg.html
M docs/build/asf-site-html/topics/impala_avro.html
M docs/build/asf-site-html/topics/impala_batch_size.html
M docs/build/asf-site-html/topics/impala_bigint.html
M docs/build/asf-site-html/topics/impala_bit_functions.html
M docs/build/asf-site-html/topics/impala_boolean.html
M docs/build/asf-site-html/topics/impala_breakpad.html
M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html
M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html
M docs/build/asf-site-html/topics/impala_char.html
M docs/build/asf-site-html/topics/impala_client.html
M docs/build/asf-site-html/topics/impala_comment.html
M docs/build/asf-site-html/topics/impala_comments.html
M docs/build/asf-site-html/topics/impala_complex_types.html
M docs/build/asf-site-html/topics/impala_components.html
M docs/build/asf-site-html/topics/impala_compression_codec.html
M docs/build/asf-site-html/topics/impala_compute_stats.html
M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html
M docs/build/asf-site-html/topics/impala_concepts.html
M docs/build/asf-site-html/topics/impala_conditional_functions.html
M docs/build/asf-site-html/topics/impala_config.html
M docs/build/asf-site-html/topics/impala_config_options.html
M docs/build/asf-site-html/topics/impala_config_performance.html
M docs/build/asf-site-html/topics/impala_connecting.html
M docs/build/asf-site-html/topics/impala_conversion_functions.html
M docs/build/asf-site-html/topics/impala_count.html
M docs/build/asf-site-html/topics/impala_create_database.html
M docs/build/asf-site-html/topics/impala_create_function.html
M docs/build/asf-site-html/topics/impala_create_role.html
M docs/build/asf-site-html/topics/impala_create_table.html
M docs/build/asf-site-html/topics/impala_create_view.html
M docs/build/asf-site-html/topics/impala_custom_timezones.html
M docs/build/asf-site-html/topics/impala_data_cache.html
M docs/build/asf-site-html/topics/impala_databases.html
M docs/build/asf-site-html/topics/impala_datatypes.html
M docs/build/asf-site-html/topics/impala_date.html
M docs/build/asf-site-html/topics/impala_datetime_functions.html
M docs/build/asf-site-html/topics/impala_ddl.html
M docs/build/asf-site-html/topics/impala_debug_action.html
M docs/build/asf-site-html/topics/impala_decimal.html
M docs/build/asf-site-html/topics/impala_decimal_v2.html
M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html
M docs/build/asf-site-html/topics/impala_default_file_format.html
M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html
M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html
M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html
M docs/build/asf-site-html/topics/impala_default_transactional_type.html
M docs/build/asf-site-html/topics/impala_delegation.html
M docs/build/asf-site-html/topics/impala_delete.html
M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html
M docs/build/asf-site-html/topics/impala_describe.html
M docs/build/asf-site-html/topics/impala_development.html
M docs/build/asf-site-html/topics/impala_disable_codegen.html
M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html
M 

[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-04-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes
..


Patch Set 1:

(1 comment)

Thanks for adding more tests!

http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README@1193
PS1, Line 1193:set tblproperties ('write.format.default'='avro');
Would it be possible to do schema evolution + Avro delete files? I.e. using 
different delete columns in the Avro eq delete files, to make sure we use the 
correct Avro schema in the delete scans.



--
To view, visit http://gerrit.cloudera.org:8080/21348

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 23 Apr 2024 15:37:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..

IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

If the Iceberg table has Avro delete files (e.g. by setting
'write.delete.format.default'='avro') then Impala won't be able to read
the contents of the delete files properly. This is because the Avro
schema is not set properly for the virtual delete table.

Testing:
 * added e2e tests with position delete files of all kinds

Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Reviewed-on: http://gerrit.cloudera.org:8080/21301
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
Reviewed-by: Gabor Kaszab 
---
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
M tests/query_test/test_iceberg.py
3 files changed, 143 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Daniel Becker: Looks good to me, but someone else must approve
  Gabor Kaszab: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21301

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87
PS1, Line 87:   if (desc.hdfsTable.isSetAvroSchema()) {
> I guess the issue is also true for AVRO equality delete files. Should we al
Yes, it would definitely be useful to have such tests. Probably in a separate 
CR, as adding such tables is cumbersome.


http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test:

http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92
PS1, Line 92: 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*'
> there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and
With VERIFY_IS_SUBSET we only check that each line is present in the result 
set. I.e. adding more lines with the same content wouldn't have an effect: 
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/tests/common/test_result_verifier.py#L258-L259
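The subset semantics described above can be illustrated with a minimal sketch. It is not the code from test_result_verifier.py: the function name is_subset, the sample file paths, and the simplified row_regex handling are all assumptions made for illustration.

```python
import re

ROW_REGEX_PREFIX = "row_regex:"


def is_subset(expected_rows, actual_rows):
    """Return True if every expected row is present in the actual rows.

    Hypothetical helper: each expected row only needs at least one match,
    so listing the same expected row twice does not demand two matches.
    """
    for expected in expected_rows:
        if expected.startswith(ROW_REGEX_PREFIX):
            pattern = re.compile(expected[len(ROW_REGEX_PREFIX):].strip())
            matched = any(pattern.fullmatch(row) for row in actual_rows)
        else:
            matched = expected in actual_rows
        if not matched:
            return False
    return True


# Made-up result set: two ORC data files in the same partition.
actual = [
    "data/j_trunc=2/a-data-1.orc",
    "data/j_trunc=2/b-data-2.orc",
]
one_line = [r"row_regex: data/j_trunc=2/.*-data-.*\.orc"]
two_lines = one_line * 2

# Both expectations pass, which is why a repeated line adds no coverage.
print(is_subset(one_line, actual), is_subset(two_lines, actual))
```

Under these assumptions, checking that exactly two ORC files exist would need a different verifier mode or an explicit count, not a repeated regex line.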



--
To view, visit http://gerrit.cloudera.org:8080/21301

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Apr 2024 15:30:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..

IMPALA-13000: Document OPTIMIZE TABLE

Document OPTIMIZE TABLE syntax and behaviour.

Testing:
 - built docs locally

Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Reviewed-on: http://gerrit.cloudera.org:8080/21320
Tested-by: Impala Public Jenkins 
Reviewed-by: Zoltan Borok-Nagy 
Reviewed-by: Daniel Becker 
---
M docs/topics/impala_iceberg.xml
1 file changed, 47 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Zoltan Borok-Nagy: Looks good to me, but someone else must approve
  Daniel Becker: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/21320

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 4
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21320

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Apr 2024 10:39:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12938: add-opens for platform.cgroupv1

2024-04-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21334 )

Change subject: IMPALA-12938: add-opens for platform.cgroupv1
..


Patch Set 1: Code-Review+2

Thanks for fixing this


--
To view, visit http://gerrit.cloudera.org:8080/21334

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82
Gerrit-Change-Number: 21334
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 19 Apr 2024 10:23:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row_regex that check for no-existence

2024-04-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21333 )

Change subject: IMPALA-13016: Fix ambiguous row_regex that check for 
no-existence
..


Patch Set 2: Code-Review+2

Thanks for fixing these tests! LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/21333

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
Gerrit-Change-Number: 21333
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 19 Apr 2024 10:16:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556
PS1, Line 556: able_na
> No need to use fully qualified table names. I only included the database in
[] is quite standard notation, and we are using it extensively in the Impala 
docs, e.g.: 
https://impala.apache.org/docs/build/html/topics/impala_create_table.html

So users shouldn't be confused by it. This file mostly contains simple examples 
because the other statements have their own detailed doc page. But we don't 
have that for OPTIMIZE, so having a proper syntax definition here makes sense 
to me. Alternatively, we could create a separate top-level page for 
OPTIMIZE, and here only add a few examples.


http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@561
PS2, Line 561: rewrites the entire table
I think we should mention that it only applies to the current implementation, 
so users won't have this assumption in future releases.


http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@587
PS2, Line 587: rewrites the entire table
Maybe also mention here that this behavior is temporary.



--
To view, visit http://gerrit.cloudera.org:8080/21320

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 12:58:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build

2024-04-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21317 )

Change subject: IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build
..


Patch Set 1: Code-Review+2

Thanks for fixing this! I verified the patch on a RELEASE version.


--
To view, visit http://gerrit.cloudera.org:8080/21317

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f
Gerrit-Change-Number: 21317
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Apr 2024 15:05:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@553
PS1, Line 553: in
nit: maybe 'on'?


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@554
PS1, Line 554: update files
There are no such files as 'update files', so I'd just use 'data files'.


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@565
PS1, Line 565:   rewrite files using the latest table schema
 :   rewrite partitions according to the latest 
partition spec
I would slightly phrase it differently, as the current phrasing might suggest 
that data files with old schema/partition spec are getting selected for rewrite.

So maybe:
"the newly written files will have the latest schema and will be partitioned 
based on the latest partition spec"


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@577
PS1, Line 577: Views cannot be optimized.
Not sure if we need this, as this should be clear.


http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@583
PS1, Line 583: .
"..., because the rewritten data and delete files are not removed physically."



--
To view, visit http://gerrit.cloudera.org:8080/21320

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Apr 2024 13:33:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13003: Handle Iceberg AlreadyExistsException

2024-04-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21312 )

Change subject: IMPALA-13003: Handle Iceberg AlreadyExistsException
..


Patch Set 1: Code-Review+2

Thanks for fixing this, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/21312

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I847eea9297c9ee0d8e821fe1c87ea03d22f1d96e
Gerrit-Change-Number: 21312
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Apr 2024 08:59:29 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-04-16 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21311

to look at the new patch set (#2).

Change subject: Add documentation, update links for 4.4.0
..

Add documentation, update links for 4.4.0

Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
---
M docs/build/asf-site-html/index.html
M docs/build/asf-site-html/shared/ImpalaVariables.html
M docs/build/asf-site-html/shared/impala_common.html
M docs/build/asf-site-html/topics/impala_abort_on_error.html
M docs/build/asf-site-html/topics/impala_adls.html
M docs/build/asf-site-html/topics/impala_admin.html
M docs/build/asf-site-html/topics/impala_admission.html
M docs/build/asf-site-html/topics/impala_admission_config.html
M docs/build/asf-site-html/topics/impala_aggregate_functions.html
M docs/build/asf-site-html/topics/impala_aliases.html
M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html
M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html
M docs/build/asf-site-html/topics/impala_alter_database.html
M docs/build/asf-site-html/topics/impala_alter_table.html
M docs/build/asf-site-html/topics/impala_alter_view.html
M docs/build/asf-site-html/topics/impala_analytic_functions.html
M docs/build/asf-site-html/topics/impala_appx_count_distinct.html
M docs/build/asf-site-html/topics/impala_appx_median.html
M docs/build/asf-site-html/topics/impala_array.html
M docs/build/asf-site-html/topics/impala_auditing.html
M docs/build/asf-site-html/topics/impala_authentication.html
M docs/build/asf-site-html/topics/impala_authorization.html
M docs/build/asf-site-html/topics/impala_avg.html
M docs/build/asf-site-html/topics/impala_avro.html
M docs/build/asf-site-html/topics/impala_batch_size.html
M docs/build/asf-site-html/topics/impala_bigint.html
M docs/build/asf-site-html/topics/impala_bit_functions.html
M docs/build/asf-site-html/topics/impala_boolean.html
M docs/build/asf-site-html/topics/impala_breakpad.html
M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html
M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html
M docs/build/asf-site-html/topics/impala_char.html
M docs/build/asf-site-html/topics/impala_client.html
M docs/build/asf-site-html/topics/impala_comment.html
M docs/build/asf-site-html/topics/impala_comments.html
M docs/build/asf-site-html/topics/impala_complex_types.html
M docs/build/asf-site-html/topics/impala_components.html
M docs/build/asf-site-html/topics/impala_compression_codec.html
M docs/build/asf-site-html/topics/impala_compute_stats.html
M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html
M docs/build/asf-site-html/topics/impala_concepts.html
M docs/build/asf-site-html/topics/impala_conditional_functions.html
M docs/build/asf-site-html/topics/impala_config.html
M docs/build/asf-site-html/topics/impala_config_options.html
M docs/build/asf-site-html/topics/impala_config_performance.html
M docs/build/asf-site-html/topics/impala_connecting.html
M docs/build/asf-site-html/topics/impala_conversion_functions.html
M docs/build/asf-site-html/topics/impala_count.html
M docs/build/asf-site-html/topics/impala_create_database.html
M docs/build/asf-site-html/topics/impala_create_function.html
M docs/build/asf-site-html/topics/impala_create_role.html
M docs/build/asf-site-html/topics/impala_create_table.html
M docs/build/asf-site-html/topics/impala_create_view.html
M docs/build/asf-site-html/topics/impala_custom_timezones.html
M docs/build/asf-site-html/topics/impala_data_cache.html
M docs/build/asf-site-html/topics/impala_databases.html
M docs/build/asf-site-html/topics/impala_datatypes.html
M docs/build/asf-site-html/topics/impala_date.html
M docs/build/asf-site-html/topics/impala_datetime_functions.html
M docs/build/asf-site-html/topics/impala_ddl.html
M docs/build/asf-site-html/topics/impala_debug_action.html
M docs/build/asf-site-html/topics/impala_decimal.html
M docs/build/asf-site-html/topics/impala_decimal_v2.html
M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html
M docs/build/asf-site-html/topics/impala_default_file_format.html
M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html
M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html
M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html
M docs/build/asf-site-html/topics/impala_default_transactional_type.html
M docs/build/asf-site-html/topics/impala_delegation.html
M docs/build/asf-site-html/topics/impala_delete.html
M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html
M docs/build/asf-site-html/topics/impala_describe.html
M docs/build/asf-site-html/topics/impala_development.html
M docs/build/asf-site-html/topics/impala_disable_codegen.html
M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html
M 

[Impala-ASF-CR](asf-site) Add documentation, update links for 4.4.0

2024-04-16 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21311


Change subject: Add documentation, update links for 4.4.0
..

Add documentation, update links for 4.4.0

Change-Id: Ibb93f7ba80b7a065ea83660fc75be9b065138ad9
---
M docs/build/asf-site-html/index.html
M docs/build/asf-site-html/shared/ImpalaVariables.html
M docs/build/asf-site-html/shared/impala_common.html
M docs/build/asf-site-html/topics/impala_abort_on_error.html
M docs/build/asf-site-html/topics/impala_adls.html
M docs/build/asf-site-html/topics/impala_admin.html
M docs/build/asf-site-html/topics/impala_admission.html
M docs/build/asf-site-html/topics/impala_admission_config.html
M docs/build/asf-site-html/topics/impala_aggregate_functions.html
M docs/build/asf-site-html/topics/impala_aliases.html
M docs/build/asf-site-html/topics/impala_allow_erasure_coded_files.html
M docs/build/asf-site-html/topics/impala_allow_unsupported_formats.html
M docs/build/asf-site-html/topics/impala_alter_database.html
M docs/build/asf-site-html/topics/impala_alter_table.html
M docs/build/asf-site-html/topics/impala_alter_view.html
M docs/build/asf-site-html/topics/impala_analytic_functions.html
M docs/build/asf-site-html/topics/impala_appx_count_distinct.html
M docs/build/asf-site-html/topics/impala_appx_median.html
M docs/build/asf-site-html/topics/impala_array.html
M docs/build/asf-site-html/topics/impala_auditing.html
M docs/build/asf-site-html/topics/impala_authentication.html
M docs/build/asf-site-html/topics/impala_authorization.html
M docs/build/asf-site-html/topics/impala_avg.html
M docs/build/asf-site-html/topics/impala_avro.html
M docs/build/asf-site-html/topics/impala_batch_size.html
M docs/build/asf-site-html/topics/impala_bigint.html
M docs/build/asf-site-html/topics/impala_bit_functions.html
M docs/build/asf-site-html/topics/impala_boolean.html
M docs/build/asf-site-html/topics/impala_breakpad.html
M docs/build/asf-site-html/topics/impala_broadcast_bytes_limit.html
M docs/build/asf-site-html/topics/impala_buffer_pool_limit.html
M docs/build/asf-site-html/topics/impala_char.html
M docs/build/asf-site-html/topics/impala_client.html
M docs/build/asf-site-html/topics/impala_comment.html
M docs/build/asf-site-html/topics/impala_comments.html
M docs/build/asf-site-html/topics/impala_complex_types.html
M docs/build/asf-site-html/topics/impala_components.html
M docs/build/asf-site-html/topics/impala_compression_codec.html
M docs/build/asf-site-html/topics/impala_compute_stats.html
M docs/build/asf-site-html/topics/impala_compute_stats_min_sample_size.html
M docs/build/asf-site-html/topics/impala_concepts.html
M docs/build/asf-site-html/topics/impala_conditional_functions.html
M docs/build/asf-site-html/topics/impala_config.html
M docs/build/asf-site-html/topics/impala_config_options.html
M docs/build/asf-site-html/topics/impala_config_performance.html
M docs/build/asf-site-html/topics/impala_connecting.html
M docs/build/asf-site-html/topics/impala_conversion_functions.html
M docs/build/asf-site-html/topics/impala_count.html
M docs/build/asf-site-html/topics/impala_create_database.html
M docs/build/asf-site-html/topics/impala_create_function.html
M docs/build/asf-site-html/topics/impala_create_role.html
M docs/build/asf-site-html/topics/impala_create_table.html
M docs/build/asf-site-html/topics/impala_create_view.html
M docs/build/asf-site-html/topics/impala_custom_timezones.html
M docs/build/asf-site-html/topics/impala_data_cache.html
M docs/build/asf-site-html/topics/impala_databases.html
M docs/build/asf-site-html/topics/impala_datatypes.html
M docs/build/asf-site-html/topics/impala_date.html
M docs/build/asf-site-html/topics/impala_datetime_functions.html
M docs/build/asf-site-html/topics/impala_ddl.html
M docs/build/asf-site-html/topics/impala_debug_action.html
M docs/build/asf-site-html/topics/impala_decimal.html
M docs/build/asf-site-html/topics/impala_decimal_v2.html
M docs/build/asf-site-html/topics/impala_dedicated_coordinator.html
M docs/build/asf-site-html/topics/impala_default_file_format.html
M docs/build/asf-site-html/topics/impala_default_hints_insert_statement.html
M docs/build/asf-site-html/topics/impala_default_join_distribution_mode.html
M docs/build/asf-site-html/topics/impala_default_spillable_buffer_size.html
M docs/build/asf-site-html/topics/impala_default_transactional_type.html
M docs/build/asf-site-html/topics/impala_delegation.html
M docs/build/asf-site-html/topics/impala_delete.html
M docs/build/asf-site-html/topics/impala_delete_stats_in_truncate.html
M docs/build/asf-site-html/topics/impala_describe.html
M docs/build/asf-site-html/topics/impala_development.html
M docs/build/asf-site-html/topics/impala_disable_codegen.html
M docs/build/asf-site-html/topics/impala_disable_codegen_rows_threshold.html
M docs/build/asf-site-html/topics/impala_disable_hbase_num_rows_estimate.html
M 

[Impala-ASF-CR](asf-site) Update download links for release 4.4.0

2024-04-16 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21307


Change subject: Update download links for release 4.4.0
..

Update download links for release 4.4.0

Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
---
M downloads.html
1 file changed, 13 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/21307/1
--
To view, visit http://gerrit.cloudera.org:8080/21307

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie0e8736154e5289e02d5ec5cf5f664cd4de2739d
Gerrit-Change-Number: 21307
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21301


Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..

IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

If the Iceberg table has Avro delete files (e.g. by setting
'write.delete.format.default'='avro') then Impala won't be able to read
the contents of the delete files properly. This is because the Avro
schema is not set properly for the virtual delete table.

Testing:
 * added e2e tests with position delete files of all kinds

Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
---
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
M tests/query_test/test_iceberg.py
3 files changed, 143 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/21301/1
--
To view, visit http://gerrit.cloudera.org:8080/21301

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 7:

(1 comment)

Thanks for the comment!

http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455
PS7, Line 1455: if 
vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0:
> Here the code says that "if we don't disable the V2 read optimisations then
Ah, those negations... you're right.

Anyway, it revealed a bug in the test code that I fixed in the new PS.
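The double negation in the quoted condition is easy to misread, so here is a small sketch. The helper name optimized_v2_read_enabled and the plain dict standing in for the test vector's exec_option are assumptions for illustration; this is not the test code from test_iceberg.py.

```python
def optimized_v2_read_enabled(exec_option):
    """Return True when the optimized Iceberg V2 read path is active.

    Hypothetical helper: the query option is phrased as a 'disable' flag,
    so a value of 0 means "not disabled", i.e. the optimization is ON.
    """
    return exec_option['disable_optimized_iceberg_v2_read'] == 0


# 0 -> optimization enabled, 1 -> optimization disabled.
for flag in (0, 1):
    exec_option = {'disable_optimized_iceberg_v2_read': flag}
    state = "enabled" if optimized_v2_read_enabled(exec_option) else "disabled"
    print(flag, state)
```

Wrapping the comparison in a positively named helper keeps the negation in one place, so each branch of the test can state which read path it expects.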



--
To view, visit http://gerrit.cloudera.org:8080/21258

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Apr 2024 15:28:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#10).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
NULL file paths, which are not an error in the hash join mode either.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/krpc-data-stream-sender.cc
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/query_test/test_iceberg.py
21 files changed, 219 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/10
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs

2024-04-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21285 )

Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
..


Patch Set 2: Code-Review+2

(1 comment)

Thanks for the comment! Carry +2

http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG@9
PS1, Line 9: using
> nit: using
Done



--
To view, visit http://gerrit.cloudera.org:8080/21285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581
Gerrit-Change-Number: 21285
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Apr 2024 12:33:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs

2024-04-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21285

to look at the new patch set (#2).

Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
..

IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs

Since we are using IcebergBufferedDeleteSink, which sorts the data
before flushing, there is no need to add a SORT node before the sink.
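
The reasoning above can be illustrated with a minimal Python sketch. It is not Impala code: BufferedDeleteSink is a hypothetical stand-in for a sink that buffers rows and sorts them when flushing, and the (file path, position) rows are made up for illustration.

```python
class BufferedDeleteSink:
    """Hypothetical sink that buffers rows and sorts them on flush."""

    def __init__(self):
        self._buffer = []   # (file_path, position) pairs in arrival order
        self.flushed = []   # rows in the order they would be written out

    def add(self, file_path, position):
        self._buffer.append((file_path, position))

    def flush(self):
        # The sink sorts here, so the arrival order of rows does not matter.
        self._buffer.sort()
        self.flushed.extend(self._buffer)
        self._buffer.clear()


rows = [("b.parq", 7), ("a.parq", 3), ("b.parq", 1), ("a.parq", 0)]

# Feed the sink without any upstream sort.
unsorted_input = BufferedDeleteSink()
for path, pos in rows:
    unsorted_input.add(path, pos)
unsorted_input.flush()

# Feed the sink after an extra upstream sort (the redundant SORT step).
presorted_input = BufferedDeleteSink()
for path, pos in sorted(rows):
    presorted_input.add(path, pos)
presorted_input.flush()
```

Both sinks end up flushing the same sequence, which is why the upstream sort adds cost without changing the result in this sketch.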

Testing:
 * updated planner tests

Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581
---
M fe/src/main/java/org/apache/impala/planner/Planner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
2 files changed, 7 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/21285/2
--
To view, visit http://gerrit.cloudera.org:8080/21285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581
Gerrit-Change-Number: 21285
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

2024-04-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21267 )

Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg 
table scans
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107
PS1, Line 107:   fileDescs_ = new ArrayList<>(fileDescs_);
 :   Collections.sort(fileDescs_);
> I'm experimenting with your suggestion and see that it would bring too much
I see, thanks Gabor for investigating this.



--
To view, visit http://gerrit.cloudera.org:8080/21267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Gerrit-Change-Number: 21267
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Apr 2024 07:43:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs

2024-04-10 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21285


Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
..

IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs

Since we are useing IcebergBufferedDeleteSink, which sorts the data
before flushing, there is no need to add a SORT node before the sink.

Testing:
 * updated planner tests

Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581
---
M fe/src/main/java/org/apache/impala/planner/Planner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
2 files changed, 7 insertions(+), 41 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/21285/1
--
To view, visit http://gerrit.cloudera.org:8080/21285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581
Gerrit-Change-Number: 21285
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125
PS7, Line 1125: A
> I think it's a bit unexpected/abrupt after the previous paragraphs which ar
Done


http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21258/8/testdata/data/README@1134
PS8, Line 1134: 1) Created the table via Impala and added some records to it.
> Could you include the CREATE TABLE and INSERT statements for reproducibilit
Done



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 13:38:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#9).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
NULL file paths which are also not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/krpc-data-stream-sender.cc
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/query_test/test_iceberg.py
21 files changed, 217 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/9
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 8:

(5 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/exec/iceberg-delete-builder.cc@283
PS7, Line 283:   ErrorMsg(TErrorCode::GENERAL, "NULL found as file_path in delete file"));
> Can't we return or continue here?
Sure, we can continue
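
As a rough illustration of the "record the error and continue" behaviour agreed on here, the following Python sketch processes delete rows and skips those with a NULL file path. The function name and the row layout are hypothetical; only the error text is taken from the quoted line.

```python
def build_delete_positions(rows):
    """Collect deleted positions per file; skip rows whose file path is None."""
    positions = {}
    errors = []
    for file_path, pos in rows:
        if file_path is None:
            # Record the problem, then move on to the next row instead of failing.
            errors.append("NULL found as file_path in delete file")
            continue
        positions.setdefault(file_path, []).append(pos)
    return positions, errors


rows = [(None, 1), ("a.parq", 4), ("a.parq", 2), (None, 9)]
positions, errors = build_delete_positions(rows)
```

In this sketch the two NULL rows produce two recorded errors, and the valid rows are still collected.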


http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@
PS7, Line : (filename_value_ss.len == 0 && prev_channels.empty()));
> This is triggered when there is 2 consecutive rows in the delete file where
Yes, we have a delete file with only three NULLs


http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@1125
PS7, Line 1125: Or
> Nit: something like "A third case is..." would be nicer.
I can change it if you feel strongly about it, but I think concise and simple 
phrasing is preferable in comments.


http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql@3957
PS7, Line 3957: iceberg_v2_null_delete_record
> Could you add some details about this table to the README? Would be nice to
Done


http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455
PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0:
> Would it make sense to test where we have DIRECTED mode to see that the Krp
We only test DIRECTED mode + V2 operator (which go hand in hand).

When 'disable_optimized_iceberg_v2_read' is true, we fall back to the old anti 
hash join which doesn't validate the delete records.
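
The gating described in this reply can be sketched in Python as follows. The helper names and the returned labels are hypothetical and are not part of test_iceberg.py; only the 'disable_optimized_iceberg_v2_read' option name comes from the quoted line.

```python
def validates_delete_records(exec_option):
    """Hypothetical helper: True when the optimized V2 read path is enabled.

    Assumption taken from the reply: the fallback anti hash join does not
    validate delete records, so only the optimized path is worth checking.
    """
    return exec_option['disable_optimized_iceberg_v2_read'] == 0


def expected_outcome(exec_option):
    # Hypothetical labels, used only to make the two branches visible.
    if validates_delete_records(exec_option):
        return "check NULL file_path handling"
    return "skip: fallback join does not validate"


optimized = expected_outcome({'disable_optimized_iceberg_v2_read': 0})
fallback = expected_outcome({'disable_optimized_iceberg_v2_read': 1})
```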



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 13:08:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#8).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
NULL file paths which are also not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/krpc-data-stream-sender.cc
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/query_test/test_iceberg.py
21 files changed, 210 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/8
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 7:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@13
PS2, Line 13: IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
> It is not valid, but we have seen such errors at certain customers. Unfortu
Actually there are still cases when the IcebergDeleteBuilder receives NULL file 
paths, e.g. num_nodes=1, or there's only a single data file


http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21258/4//COMMIT_MSG@13
PS4, Line 13: IcebergDeleteBuilder and KrpcDataStreamSender now also
> Is it possible to add a test for this?
Yeah I've just added tests


http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.h@79
PS4, Line 79: /// Shared Build
> Shouldn't we mention DIRECTED mode here?
Actually we should only mention the DIRECTED mode, since this is the only 
supported mode.


http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21258/4/be/src/exec/iceberg-delete-builder.cc@272
PS4, Line 272: state
> Do we use 'state' anywhere? If not, this parameter could also be removed fr
In PS5 we use it again



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 09 Apr 2024 16:32:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#7).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
NULL file paths which are also not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/krpc-data-stream-sender.cc
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/6348b186d3705f6b-370ecfbb_152551971_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_first_and_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_last.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_null_single.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/delete_three_nulls.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/data/same_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/3a813d5e-fc0b-485f-bbba-010972a9f20a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/e90d28aa-cd17-4655-ad04-aa3711792576-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/snap-5852039568708655222-1-3a813d5e-fc0b-485f-bbba-010972a9f20a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_null_delete_record/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/query_test/test_iceberg.py
20 files changed, 172 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/7
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#6).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
NULL file paths which are also not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/krpc-data-stream-sender.cc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/query_test/test_iceberg.py
8 files changed, 82 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/6
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#5).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder and KrpcDataStreamSender now also tolerate
NULL file paths which are also not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/krpc-data-stream-sender.cc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/query_test/test_iceberg.py
8 files changed, 79 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/5
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@9
PS2, Line 9: DIRECTED
> DIRECTED
Ouch, fixed in the Jira ticket as well.


http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@13
PS2, Line 13: IcebergDeleteBuilder now also tolerates NULL file paths which are
> I'm wondering if there is a valid use case when a file path is null. Is it
It is not valid, but we have seen such errors at certain customers. 
Unfortunately we don't know which engine wrote those position delete files :(

But now that I think of it, with DIRECTED mode, IcebergDeleteBuilder will never 
get NULL file paths, as we will never have an entry for these in the routing 
map (filepath_to_hosts_). So I'm just keeping that logic as is and adding a 
DCHECK.

Now it depends on KrpcDataStreamSender how we handle NULL file paths. I'll try 
to prepare a test table for this.
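
The routing argument in this comment can be sketched in Python. This is an illustration only: route_delete_rows is a hypothetical function, and the dictionary merely plays the role of the routing map (filepath_to_hosts_) mentioned above. It shows why, under the reasoning in this comment, a row with a NULL file path never reaches a receiver when rows are sent only to hosts found in the map.

```python
def route_delete_rows(rows, filepath_to_hosts):
    """Group delete rows by destination host; rows without a routing entry are skipped."""
    per_host = {}
    skipped = []
    for file_path, pos in rows:
        # A NULL (None) path is never a key in the routing map.
        hosts = filepath_to_hosts.get(file_path)
        if not hosts:
            skipped.append((file_path, pos))
            continue
        for host in hosts:
            per_host.setdefault(host, []).append((file_path, pos))
    return per_host, skipped


filepath_to_hosts = {"a.parq": ["host1"], "b.parq": ["host1", "host2"]}
rows = [("a.parq", 0), (None, 5), ("b.parq", 2)]
per_host, skipped = route_delete_rows(rows, filepath_to_hosts)
```

Here the NULL row ends up in the skipped list on the sender side, so how it is handled is decided by the sender rather than by the receiver.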



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 09 Apr 2024 11:22:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#4).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder now also tolerates NULL file paths which are
not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
4 files changed, 30 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/4
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#3).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the DIRECTED distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder now also tolerates NULL file paths which are
not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
4 files changed, 30 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/3
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

2024-04-09 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21267 )

Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg 
table scans
..


Patch Set 2: Code-Review+2

(1 comment)

Proposed an alternative approach. But I'm also OK to quickly push this fix and 
improve it later.

http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107
PS1, Line 107:   fileDescs_ = new ArrayList<>(fileDescs_);
 :   Collections.sort(fileDescs_);
Alternatively we could do the sorting during file metadata loading, so we 
wouldn't need to copy and sort fds for every Iceberg scan node.
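
A minimal sketch of that alternative, assuming the file descriptor list is never mutated after loading. FileDesc and LoadedFileMetadata are hypothetical stand-ins, not Impala's actual classes: the list is copied and sorted once at load time, and scan nodes only ever see a read-only view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a file descriptor; ordered by path. */
final class FileDesc implements Comparable<FileDesc> {
  final String path;

  FileDesc(String path) { this.path = path; }

  @Override
  public int compareTo(FileDesc other) { return path.compareTo(other.path); }
}

/** Hypothetical holder for file metadata that is loaded once and shared. */
final class LoadedFileMetadata {
  private final List<FileDesc> sortedFileDescs;

  LoadedFileMetadata(List<FileDesc> loaded) {
    // Copy and sort once, at load time, instead of in every scan node.
    List<FileDesc> copy = new ArrayList<>(loaded);
    Collections.sort(copy);
    // Read-only view: concurrent readers cannot modify the shared list.
    sortedFileDescs = Collections.unmodifiableList(copy);
  }

  /** Scan nodes read the shared, already sorted list without copying it. */
  List<FileDesc> getFileDescs() { return sortedFileDescs; }
}
```

With this shape every scan node would call getFileDescs() and skip both the copy and the sort; whether that fits the real loading path is something the patch author would have to confirm.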



--
To view, visit http://gerrit.cloudera.org:8080/21267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Gerrit-Change-Number: 21267
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 09 Apr 2024 08:45:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-08 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21258

to look at the new patch set (#2).

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the BROADCAST distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder now also tolerates NULL file paths which are
not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
4 files changed, 30 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/2
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-08 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21258


Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..

IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

Now that we have the BROADCAST distribution mode, some parts of
IcebergDeleteNode and IcebergDeleteBuilder became dead code. It is
time to simplify the above classes.

IcebergDeleteBuilder now also tolerates NULL file paths which are
not an error in the hash join mode.

Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-node.cc
M be/src/exec/iceberg-delete-node.h
M be/src/runtime/coordinator.cc
M be/src/scheduling/scheduler.cc
6 files changed, 46 insertions(+), 102 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/21258/1
--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-08 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21249 )

Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..

IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization

test_plain_count_star_optimization was disabled by IMPALA-12894
part 1, and part 2 didn't re-enable it. This patch re-enables it.

Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Reviewed-on: http://gerrit.cloudera.org:8080/21249
Tested-by: Impala Public Jenkins 
Reviewed-by: Gabor Kaszab 
---
M tests/query_test/test_iceberg.py
1 file changed, 0 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Gabor Kaszab: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-05 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21249


Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..

IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization

test_plain_count_star_optimization was disabled by IMPALA-12894
part 1, and part 2 didn't re-enable it. This patch re-enables it.

Change-Id: I30629632742c0d402a6bb852a169359edac59eba
---
M tests/query_test/test_iceberg.py
1 file changed, 0 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21249/1
--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-04-02 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 15: Code-Review+2

(1 comment)

LGTM

http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
File fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/21026/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1258
PS15, Line 1258: functional_parquet
> Removed the table name because "functional_parquet.iceberg_query_metadata"
Do we know why? Is it related to local / legacy catalog modes?

functional_parquet.*.* can be a bit misleading. But I'm OK with fixing it in a 
follow-up Jira.



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 02 Apr 2024 09:38:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-03-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21026/11/fe/src/main/java/org/apache/impala/service/JniFrontend.java@279
PS11, Line 279: params.getSession()
Doesn't it return null anyway if the session is not set?



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Mar 2024 15:23:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files

2024-03-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21190 )

Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg 
tables with dangling delete files
..


Patch Set 7: Code-Review+2

(1 comment)

Carry +2

http://gerrit.cloudera.org:8080/#/c/21190/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21190/6//COMMIT_MSG@50
PS6, Line 50: SCAN
> This could be 'datafiles with deletes'.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Mar 2024 10:17:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files

2024-03-28 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21190

to look at the new patch set (#7).

Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg 
tables with dangling delete files
..

IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling 
delete files

Impala can return incorrect results if a table has dangling delete
files. Dangling delete files are delete files that are part of the
snapshot but they are not applicable to any of the data files. We can
have such delete files after Spark's rewrite_data_files action.

During analysis we check the existence of delete files based on the
snapshot summary. If there are no delete files in the table, we just
replace the count(*) expression with NumericLiteral($record_count).
If there are delete files in the table (based on the summary), we set
optimize_count_star_for_iceberg_v2 in the query context.

Without optimize_count_star_for_iceberg_v2 in the query context, the
IcebergScanPlanner would create the following plan.

          AGGREGATE
          COUNT(*)
              |
          UNION ALL
          /       \
         /         \
        /           \
  SCAN all       ANTI JOIN
  datafiles       /     \
  without        /       \
  deletes     SCAN       SCAN
              datafiles  deletes
              with deletes

With optimize_count_star_for_iceberg_v2 the final plan looks like
the following:

      ArithmeticExpr(ADD)
        /           \
       /             \
      /               \
record_count       AGGREGATE
of all             COUNT(*)
datafiles              |
without            ANTI JOIN
deletes             /     \
                   /       \
                SCAN       SCAN
                datafiles  deletes
                with deletes

The ArithmeticExpr(ADD) and its left child (record_count) are created
by the analyzer; IcebergScanPlanner is responsible for creating the
plan under AGGREGATE COUNT(*). And if it has delete files and
optimize_count_star_for_iceberg_v2 is true, it knows it can omit
the original UNION ALL and its left child.

However, IcebergScanPlanner checks delete file existence based on the
result of planFiles(), hence dangling delete files are eliminated.
And if there are no delete files, IcebergScanPlanner assumes that
case is already handled by the Analyzer (i.e. it replaced count(*)
with NumericLiteral($record_count)). So it will incorrectly create a
normal SCAN plan of the table under COUNT(*), i.e. we end up
with this:

      ArithmeticExpr(ADD)
        /           \
       /             \
      /               \
record_count       AGGREGATE
of all             COUNT(*)
datafiles              |
without              SCAN
deletes            datafiles
                   without
                   deletes

Which means Impala will yield $record_count * 2 as a result.
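
The arithmetic behind that doubling can be sketched as follows. This is only an illustration of the plan shapes above with made-up row counts; CountStarPlans and its methods are hypothetical and are not part of Impala.

```java
/** Hypothetical model of the plan shapes described above. */
final class CountStarPlans {
  /**
   * Intended optimized plan: record_count of the data files without deletes
   * (left child, from the analyzer) plus COUNT(*) over the ANTI JOIN of the
   * data files with deletes (right child, from the planner).
   */
  static long optimizedPlan(long rowsInFilesWithoutDeletes,
                            long rowsInFilesWithDeletes,
                            long deletedRows) {
    long leftChild = rowsInFilesWithoutDeletes;
    long rightChild = rowsInFilesWithDeletes - deletedRows;
    return leftChild + rightChild;
  }

  /**
   * Faulty plan when every delete file is dangling: all data files are
   * "without deletes", so the left child is already the full record count,
   * and the planner adds a COUNT(*) over a plain SCAN of the same files.
   */
  static long faultyPlan(long recordCount) {
    long leftChild = recordCount;   // constant from the snapshot summary
    long rightChild = recordCount;  // COUNT(*) over the full SCAN
    return leftChild + rightChild;
  }
}
```

For a table with 1000 records and only dangling delete files, faultyPlan(1000) evaluates to 2000, while the correct answer is 1000.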

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
also ignores dangling delete files. Therefore, the analyzer will just
substitute count(*) with NumericLiteral($record_count) if all deletes
are dangling, i.e. no need to involve the IcebergScanPlanner at all.

The patch also introduces a new query option,
"iceberg_disable_count_star_optimization", so users can completely
disable the statistic-based count(*)-optimization if necessary.

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
11 files changed, 336 insertions(+), 433 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/7
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12945: Fix Flaky Ticker Test

2024-03-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21214 )

Change subject: IMPALA-12945: Fix Flaky Ticker Test
..


Patch Set 1: Code-Review+2

Thanks for fixing this, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/21214
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8cf03ae68fb3103c5bbc438c32f6565b8c406c
Gerrit-Change-Number: 21214
Gerrit-PatchSet: 1
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Mar 2024 09:08:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12942: deflake test virtual column file position generic

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21209 )

Change subject: IMPALA-12942: deflake test_virtual_column_file_position_generic
..

IMPALA-12942: deflake test_virtual_column_file_position_generic

Sometimes the runtime filters don't arrive in time in test
test_virtual_column_file_position_generic. This patch increases
RUNTIME_FILTER_WAIT_TIME_MS to 30 seconds.

Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c
Reviewed-on: http://gerrit.cloudera.org:8080/21209
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
---
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Daniel Becker: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/21209
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c
Gerrit-Change-Number: 21209
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 10: Code-Review+2

(2 comments)

Small nits, otherwise LGTM!

http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/main/java/org/apache/impala/service/JniFrontend.java@314
PS10, Line 314: params.isSetSession() ?
  : new User(TSessionStateUtil.getEffectiveUser(params.getSession())) :
  : ImpalaInternalAdminUser.getInstance();
nit: Can we move this to a private method? There are 3 usages of this pattern 
(getTableNames, getMetadataTableNames, getDbs)
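
A rough sketch of what I mean. All types here are simplified, hypothetical stand-ins so the snippet compiles on its own; the real code would keep using User, TSessionStateUtil and ImpalaInternalAdminUser, and the three callers take different Thrift parameter types, so the helper would probably take the session rather than the params object.

```java
/** Hypothetical stand-in for the real User class. */
final class User {
  final String name;
  User(String name) { this.name = name; }
}

/** Hypothetical stand-in for the Thrift session state. */
final class SessionState {
  final String effectiveUser;
  SessionState(String effectiveUser) { this.effectiveUser = effectiveUser; }
}

/** Hypothetical stand-in for a Thrift params struct with an optional session. */
final class RequestParams {
  private final SessionState session;  // null models "session not set"
  RequestParams(SessionState session) { this.session = session; }
  boolean isSetSession() { return session != null; }
  SessionState getSession() { return session; }
}

final class JniFrontendSketch {
  // Placeholder for ImpalaInternalAdminUser.getInstance().
  private static final User INTERNAL_ADMIN = new User("internal-admin");

  /** The repeated ternary, extracted into one private helper. */
  private static User getUser(RequestParams params) {
    return params.isSetSession()
        ? new User(params.getSession().effectiveUser)
        : INTERNAL_ADMIN;
  }

  // Each call site then shrinks to a single helper call.
  static String callerOfGetTableNames(RequestParams params) {
    return getUser(params).name;
  }

  static String callerOfGetDbs(RequestParams params) {
    return getUser(params).name;
  }
}
```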


http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/21026/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@4006
PS10, Line 4006: AnalyzesOk("show metadata tables in functional_parquet.iceberg_query_metadata");
We could have an AnalysisError test for a non-Iceberg table.



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 16:36:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21190

to look at the new patch set (#6).

Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg 
tables with dangling delete files
..

IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling 
delete files

Impala can return incorrect results if a table has dangling delete
files. Dangling delete files are delete files that are part of the
snapshot but they are not applicable to any of the data files. We can
have such delete files after Spark's rewrite_data_files action.

During analysis we check the existence of delete files based on the
snapshot summary. If there are no delete files in the table, we just
replace the count(*) expression with NumericLiteral($record_count).
If there are delete files in the table (based on the summary), we set
optimize_count_star_for_iceberg_v2 in the query context.

Without optimize_count_star_for_iceberg_v2 in the query context, the
IcebergScanPlanner would create the following plan.

          AGGREGATE
          COUNT(*)
              |
          UNION ALL
          /       \
         /         \
        /           \
  SCAN all       ANTI JOIN
  datafiles       /     \
  without        /       \
  deletes     SCAN       SCAN
              datafiles  deletes

With optimize_count_star_for_iceberg_v2 the final plan looks like
the following:

      ArithmeticExpr(ADD)
        /           \
       /             \
      /               \
record_count       AGGREGATE
of all             COUNT(*)
datafiles              |
without            ANTI JOIN
deletes             /     \
                   /       \
                SCAN       SCAN
                datafiles  deletes

The ArithmeticExpr(ADD) and its left child (record_count) are created
by the analyzer; IcebergScanPlanner is responsible for creating the
plan under AGGREGATE COUNT(*). And if it has delete files and
optimize_count_star_for_iceberg_v2 is true, it knows it can omit
the original UNION ALL and its left child.

However, IcebergScanPlanner checks delete file existence based on the
result of planFiles(), hence dangling delete files are eliminated.
And if there are no delete files, IcebergScanPlanner assumes that
case is already handled by the Analyzer (i.e. it replaced count(*)
with NumericLiteral($record_count)). So it will incorrectly create a
normal SCAN plan of the table under COUNT(*), i.e. we end up
with this:

      ArithmeticExpr(ADD)
        /           \
       /             \
      /               \
record_count       AGGREGATE
of all             COUNT(*)
datafiles              |
without              SCAN
deletes            datafiles
                   without
                   deletes

Which means Impala will yield $record_count * 2 as a result.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
also ignores dangling delete files. Therefore, the analyzer will just
substitute count(*) with NumericLiteral($record_count) if all deletes
are dangling, i.e. no need to involve the IcebergScanPlanner at all.
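
The difference between the two checks can be sketched as below. This is a simplified model under stated assumptions, not the actual FeIcebergTable code: DataFileTask and DeleteFileChecks are hypothetical, and each task simply lists the delete files that planning attached to its data file.

```java
import java.util.List;

/** Hypothetical model of one planned data file and its applicable delete files. */
final class DataFileTask {
  final String dataFilePath;
  final List<String> applicableDeleteFiles;

  DataFileTask(String dataFilePath, List<String> applicableDeleteFiles) {
    this.dataFilePath = dataFilePath;
    this.applicableDeleteFiles = applicableDeleteFiles;
  }
}

final class DeleteFileChecks {
  /** Summary-based check: counts every delete file in the snapshot, dangling or not. */
  static boolean hasDeleteFilesBySummary(long totalDeleteFilesInSnapshot) {
    return totalDeleteFilesInSnapshot > 0;
  }

  /** Plan-based check: only delete files attached to some data file count. */
  static boolean hasApplicableDeleteFiles(List<DataFileTask> tasks) {
    for (DataFileTask task : tasks) {
      if (!task.applicableDeleteFiles.isEmpty()) return true;
    }
    return false;
  }
}
```

When all delete files are dangling the first check says true and the second says false; the fix is about making both sides of the planning agree on the second answer.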

The patch also introduces a new query option,
"iceberg_disable_count_star_optimization", so users can completely
disable the statistic-based count(*)-optimization if necessary.

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
11 files changed, 336 insertions(+), 433 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/6
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 2: Code-Review+2

Thanks for adding these extra tests, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:42:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 1:

The change LGTM, but it would be also nice to see a planner test.


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 13:34:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12942: deflake test virtual column file position generic

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21209


Change subject: IMPALA-12942: deflake test_virtual_column_file_position_generic
..

IMPALA-12942: deflake test_virtual_column_file_position_generic

Sometimes the runtime filters don't arrive in time in test
test_virtual_column_file_position_generic. This patch increases
RUNTIME_FILTER_WAIT_TIME_MS to 30 seconds.

Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c
---
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
1 file changed, 1 insertion(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/21209/1
--
To view, visit http://gerrit.cloudera.org:8080/21209
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4d7a23389a2dcdd92602c2de22a2fc8f09aa618c
Gerrit-Change-Number: 21209
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21190 )

Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg 
tables with dangling delete files
..


Patch Set 5:

(2 comments)

Thanks for the comments

http://gerrit.cloudera.org:8080/#/c/21190/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21190/3//COMMIT_MSG@10
PS3, Line 10: files. Dangling delete files are delete files that are part of the
> Could you describe the cause of the bug in more detail?
Added a few extra sentences


http://gerrit.cloudera.org:8080/#/c/21190/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/21190/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@983
PS3, Line 983: don
> Nit: superfluous "use".
Done



--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 09:32:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21190

to look at the new patch set (#4).

Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg 
tables with dangling delete files
..

IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling 
delete files

Impala can return incorrect results if a table has dangling delete
files. Dangling delete files are delete files that are part of the
snapshot but they are not applicable to any of the data files. We can
have such delete files after Spark's rewrite_data_files action.

During analysis we check the existence of delete files based on the
snapshot summary. But during planning in IcebergScanPlanner we do it
based on planFiles(), i.e. dangling delete files don't count in the
latter case. Because of this Impala can create incorrect plans for
count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files. It also introduces a new query option,
"iceberg_disable_count_star_optimization", so users can completely
disable the statistic-based count(*)-optimization if necessary.

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
11 files changed, 336 insertions(+), 433 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/4
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling delete files

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21190

to look at the new patch set (#5).

Change subject: IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg 
tables with dangling delete files
..

IMPALA-12894: (part 2) Fix optimized count(*) for Iceberg tables with dangling 
delete files

Impala can return incorrect results if a table has dangling delete
files. Dangling delete files are delete files that are part of the
snapshot but they are not applicable to any of the data files. We can
have such delete files after Spark's rewrite_data_files action.

During analysis we check the existence of delete files based on the
snapshot summary. But during planning in IcebergScanPlanner we do it
based on planFiles(), i.e. dangling delete files don't count in the
latter case. Because of this Impala can create incorrect plans for
count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files. It also introduces a new query option,
"iceberg_disable_count_star_optimization", so users can completely
disable the statistic-based count(*)-optimization if necessary.
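
The mismatch described above can be illustrated with a small toy model. This is only a sketch in Python, not Impala's or Iceberg's code: the class and function names (DeleteFile, Snapshot, has_delete_files) are hypothetical, and a delete file is modelled simply as the set of data file paths it refers to.

```python
from dataclasses import dataclass


@dataclass
class DeleteFile:
    path: str
    # Paths of the data files this delete file refers to (toy model).
    applies_to: frozenset = frozenset()


@dataclass
class Snapshot:
    data_files: set
    delete_files: list

    def summary_delete_file_count(self) -> int:
        # A summary-style count includes every delete file, dangling or not.
        return len(self.delete_files)

    def live_delete_files(self) -> list:
        # Planning-style view: keep only delete files that match a live data file.
        return [d for d in self.delete_files if d.applies_to & self.data_files]


def has_delete_files(snapshot: Snapshot) -> bool:
    """True only if at least one delete file applies to a current data file."""
    return bool(snapshot.live_delete_files())


# After a rewrite, old.parquet is gone but its delete file is still listed.
compacted = Snapshot(
    data_files={"new.parquet"},
    delete_files=[DeleteFile("d1.parquet", frozenset({"old.parquet"}))],
)

# A table where the delete file still applies to a live data file.
with_deletes = Snapshot(
    data_files={"a.parquet"},
    delete_files=[DeleteFile("d2.parquet", frozenset({"a.parquet"}))],
)
```

In this model the summary-style count for `compacted` is 1 while `has_delete_files(compacted)` is false, which is the kind of disagreement between the two checks that the commit message describes.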

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
11 files changed, 336 insertions(+), 433 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/5
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21179 )

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..

IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg 
table

The following query throws an error for Iceberg tables:

 select * from ice_tbl where rand() < 0.001;

It's because the predicate 'rand() < 0.001' doesn't involve any table
columns. Because of a bug in
IcebergScanPlanner.hasPartitionTransformType() the method throws an
IndexOutOfBoundsException. This patch fixes the method to handle
such predicates.
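
The failure mode can be sketched in Python as follows. This is a hypothetical illustration, not the actual body of IcebergScanPlanner.hasPartitionTransformType(): the function names, the column_refs list, and the transformed set are all assumptions made for the example. It only shows why a predicate without column references needs an explicit guard.

```python
def first_column_is_transformed_unsafe(column_refs, transformed):
    # Indexing an empty list raises IndexError, the Python counterpart of
    # Java's IndexOutOfBoundsException.
    return column_refs[0] in transformed


def first_column_is_transformed(column_refs, transformed):
    # A predicate such as `rand() < 0.001` references no columns at all,
    # so no partition transform can be involved.
    if not column_refs:
        return False
    return column_refs[0] in transformed
```

With an empty `column_refs` the guarded version returns False, while the unguarded one raises.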

Testing:
 * added e2e tests

Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Reviewed-on: http://gerrit.cloudera.org:8080/21179
Tested-by: Impala Public Jenkins 
Reviewed-by: Gabor Kaszab 
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
2 files changed, 94 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Gabor Kaszab: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-25 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21179

to look at the new patch set (#5).

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..

IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg 
table

The following query throws an error for Iceberg tables:

 select * from ice_tbl where rand() < 0.001;

It's because the predicate 'rand() < 0.001' doesn't involve any table
columns. Because of a bug in
IcebergScanPlanner.hasPartitionTransformType() the method throws an
IndexOutOfBoundsException. This patch fixes the method to handle
such predicates.

Testing:
 * added e2e tests

Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
2 files changed, 94 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/5
--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

2024-03-25 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21190

to look at the new patch set (#3).

Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong 
results after a Spark rewrite_data_files
..

IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark 
rewrite_data_files

Impala can return incorrect results if a table has dangling delete
files. During analysis we check the existence of delete files
based on the snapshot summary. But during planning in IcebergScanPlanner
we do it based on planFiles(), i.e. dangling delete files don't count
in the latter case. Because of this Impala can create incorrect
plans for count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files. It also introduces a new query option,
"iceberg_disable_count_star_optimization", so users can completely
disable the statistic-based count(*)-optimization if necessary.

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
11 files changed, 336 insertions(+), 433 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/3
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21148 )

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..


Patch Set 6: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Mar 2024 13:55:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21148 )

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..

IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables 
crashes Impala

Impala hits a segmentation fault when it queries the virtual column
FILE__POSITION for TEXT or JSON tables. When the scanners that do not
support the FILE__POSITION virtual column detect its presence they
try to report an error and close themselves. The segfault is in the
scanners' Close() method when they try to dereference a NULL stream
object.

This patch simply adds NULL-checks in Close().

Alternatively we could detect the presence of FILE__POSITION during
planning in the HdfsScanNode, but doing it in the scanners lets us
handle more queries, e.g. queries that dynamically prune partitions
and the surviving partitions all have file formats that support
FILE__POSITION.
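
The pattern behind the fix, a close routine that tolerates a stream that was never opened, can be sketched in Python. This is an analogy only: the real scanners are C++ classes, and TextScannerSketch, its fields, and its error message are hypothetical names invented for this example.

```python
import io


class TextScannerSketch:
    def __init__(self):
        # The stream is only created once open() succeeds.
        self.stream = None
        self.closed = False

    def open(self, columns):
        if "FILE__POSITION" in columns:
            # Error is reported before any stream exists.
            raise ValueError("FILE__POSITION is not supported by this scanner")
        self.stream = io.StringIO("row1\nrow2\n")

    def close(self):
        # Guard against a stream that was never set up; without this check
        # the error path above would fail again inside close().
        if self.stream is not None:
            self.stream.close()
            self.stream = None
        self.closed = True
```

Calling close() after a failed open() is then safe, and so is calling it twice.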

Testing:
 * added negative tests to properly report the errors
 * added tests for mixed file format tables

Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Reviewed-on: http://gerrit.cloudera.org:8080/21148
Tested-by: Impala Public Jenkins 
Reviewed-by: Zoltan Borok-Nagy 
---
M be/src/exec/json/hdfs-json-scanner.cc
M be/src/exec/text/hdfs-text-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
M tests/query_test/test_scanners.py
5 files changed, 94 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Zoltan Borok-Nagy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12915: Use libgtest.so when built with shared libs

2024-03-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21163 )

Change subject: IMPALA-12915: Use libgtest.so when built with shared libs
..


Patch Set 3: Code-Review+2

Thanks for fixing this!


-- 
To view, visit http://gerrit.cloudera.org:8080/21163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27d21217db219f52b072a4e5cfa1caaace35d1a2
Gerrit-Change-Number: 21163
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Mar 2024 09:49:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] PRELIMINIARY COUNT(*)

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has abandoned this change. ( 
http://gerrit.cloudera.org:8080/21189 )

Change subject: PRELIMINIARY COUNT(*)
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/21189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I13a7cbb926d4ca56bc17690d61652fb837ebd672
Gerrit-Change-Number: 21189
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/21190 )

Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong 
results after a Spark rewrite_data_files
..

IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark 
rewrite_data_files

Impala can return incorrect results if a table has dangling delete
files. During analysis we check the existence of delete files
based on the snapshot summary. But during planning in IcebergScanPlanner
we do it based on planFiles(), i.e. dangling delete files don't count
in the latter case. Because of this Impala can create incorrect
plans for count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files.

TODO:
 * introduce query option so we can completely disable the count(*) optimization

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
7 files changed, 307 insertions(+), 431 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/2
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21190 )


Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong 
results after a Spark rewrite_data_files
..

IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark 
rewrite_data_files

Impala can return incorrect results if a table has dangling delete
files. During analysis we check the existence of delete files
based on the snapshot summary. But during planning in IcebergScanPlanner
we do it based on planFiles(), i.e. dangling delete files don't count
in the latter case. Because of this Impala can create incorrect
plans for count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files.

TODO:
 * introduce query option so we can completely disable the count(*) optimization

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
7 files changed, 307 insertions(+), 430 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/1
--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] PRELIMINIARY COUNT(*)

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21189 )


Change subject: PRELIMINIARY COUNT(*)
..

PRELIMINIARY COUNT(*)

Change-Id: I13a7cbb926d4ca56bc17690d61652fb837ebd672
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
1 file changed, 1 insertion(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/21189/1
--
To view, visit http://gerrit.cloudera.org:8080/21189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I13a7cbb926d4ca56bc17690d61652fb837ebd672
Gerrit-Change-Number: 21189
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Daniel Becker, Riza Suminto, Gabor Kaszab, Zihao Ye, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21148

to look at the new patch set (#6).

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..

IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables 
crashes Impala

Impala hits a segmentation fault when it queries the virtual column
FILE__POSITION for TEXT or JSON tables. When the scanners that do not
support the FILE__POSITION virtual column detect its presence they
try to report an error and close themselves. The segfault is in the
scanners' Close() method when they try to dereference a NULL stream
object.

This patch simply adds NULL-checks in Close().

Alternatively we could detect the presence of FILE__POSITION during
planning in the HdfsScanNode, but doing it in the scanners lets us
handle more queries, e.g. queries that dynamically prune partitions
and the surviving partitions all have file formats that support
FILE__POSITION.

Testing:
 * added negative tests to properly report the errors
 * added tests for mixed file format tables

Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
---
M be/src/exec/json/hdfs-json-scanner.cc
M be/src/exec/text/hdfs-text-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
M tests/query_test/test_scanners.py
5 files changed, 94 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/6
--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21179 )

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..


Patch Set 4:

Instead of using rand() I switched to rand(SEED) as the seed-generation seems 
to be system-specific.


--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Mar 2024 16:10:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21179

to look at the new patch set (#4).

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..

IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg 
table

The following query throws an error for Iceberg tables:

 select * from ice_tbl where rand() < 0.001;

It's because the predicate 'rand() < 0.001' doesn't involve any table
columns. Because of a bug in
IcebergScanPlanner.hasPartitionTransformType() the method throws an
IndexOutOfBoundsException. This patch fixes the method to handle
such predicates.

Testing:
 * added e2e tests

Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
2 files changed, 106 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/4
--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21138 )

Change subject: IMPALA-12809: Iceberg metadata table scanner should always be 
scheduled to the coordinator
..


Patch Set 3:

(1 comment)

Just quickly went over the code. Looks good overall, but could you please add 
planner tests?

http://gerrit.cloudera.org:8080/#/c/21138/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21138/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@192
PS3, Line 192: Preconditions.checkState(!coordinatorOnly ||
 : dataPartition_.equals(DataPartition.UNPARTITIONED));
Could you please add a comment for this?



--
To view, visit http://gerrit.cloudera.org:8080/21138
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e
Gerrit-Change-Number: 21138
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Mar 2024 10:54:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12898: Tidy up test dimensions of test_scanner.py

2024-03-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21162 )

Change subject: IMPALA-12898: Tidy up test dimensions of test_scanner.py
..


Patch Set 3: Code-Review+2

Thanks for applying the changes! LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/21162
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5efd2b483338fb55b958d8e1a0acf6b365f8093e
Gerrit-Change-Number: 21162
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Mar 2024 10:35:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21179 )

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..


Patch Set 2:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1192
PS1, Line 1192: select * from iceberg_avro_format where rand() < 0.5;
> Isn't this flaky because of the rand()? Or is it not that random? :)
It is not that random :)

It can take a seed, so the following would be truly random: 
rand(unix_timestamp()) < 0.5
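
As a loose analogy in Python (using Python's random module, not Impala's rand(); the helper name sample is made up for this sketch), a fixed seed reproduces the same sequence every time, which is what keeps a test stable, while a time-based seed changes from run to run:

```python
import random
import time


def sample(seed, n=3):
    """Draw n values in [0, 1) from a generator created with the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# A fixed seed reproduces the same sequence on every call.
fixed_a = sample(42)
fixed_b = sample(42)

# A time-based seed, comparable to rand(unix_timestamp()), varies between runs.
timed = sample(int(time.time()))
```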


http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1271
PS1, Line 1271: 1460
> Would it make sense to involve some time travel too?
Sure, done.



--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Mar 2024 17:36:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-21 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21179

to look at the new patch set (#2).

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..

IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg 
table

The following query throws an error for Iceberg tables:

 select * from ice_tbl where rand() < 0.001;

It's because the predicate 'rand() < 0.001' doesn't involve any table
columns. Because of a bug in
IcebergScanPlanner.hasPartitionTransformType() the method throws an
IndexOutOfBoundsException. This patch fixes the method to handle
such predicates.

Testing:
 * added e2e tests

Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
2 files changed, 106 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/2
--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12898: Tidy up test dimensions of test_scanner.py

2024-03-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21162 )

Change subject: IMPALA-12898: Tidy up test dimensions of test_scanner.py
..


Patch Set 1:

(17 comments)

Thanks for working on this! Mostly found style issues

http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@74
PS1, Line 74: return [0, 1]
Earlier [0, 1, 4] was used in core. Do you think it's not a problem to decrease 
the values in that dimension?


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@81
PS1, Line 81: return [0, 1]
Same as above, earlier it was [0, 1, 16] in core.


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@100
PS1, Line 100:
nit: 4 spaces are needed


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@157
PS1, Line 157:
nit: 4 spaces are needed


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@201
PS1, Line 201:
nit: indentation is off


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@203
PS1, Line 203:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@259
PS1, Line 259:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@353
PS1, Line 353:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@418
PS1, Line 418: and
nit: The original lines were more aligned, so I'm not sure if that formatting 
change is beneficial


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@633
PS1, Line 633:
nit: needs +4 indent instead of +2


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1023
PS1, Line 1023:
nit: 4 spaces are needed. Same for L1025


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1443
PS1, Line 1443:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1474
PS1, Line 1474:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1508
PS1, Line 1508: +
nit: originally the lines were more aligned, so I'm not sure about this change 
in formatting. Is it enforced by PEP8?


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1595
PS1, Line 1595:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1636
PS1, Line 1636:  
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21162/1/tests/query_test/test_scanners.py@1994
PS1, Line 1994:
nit: indentation



--
To view, visit http://gerrit.cloudera.org:8080/21162
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5efd2b483338fb55b958d8e1a0acf6b365f8093e
Gerrit-Change-Number: 21162
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Mar 2024 17:21:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-21 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Daniel Becker, Riza Suminto, Gabor Kaszab, Zihao Ye, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21148

to look at the new patch set (#4).

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..

IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables 
crashes Impala

Impala crashes with a segmentation fault when a query references the
virtual column FILE__POSITION on TEXT or JSON tables. When scanners that
do not support the FILE__POSITION virtual column detect its presence, they
try to report an error and close themselves. The segfault occurs in the
scanners' Close() method when it dereferences a NULL stream
object.

This patch simply adds NULL-checks in Close().
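
The guard described above can be sketched in Python as follows. This is only
an illustration of the pattern, not the actual C++ scanner code: the names
SketchScanner, FakeStream, open() and release() are hypothetical, and only the
error text prefix is taken from the negative test referenced in this thread.

```python
from typing import List, Optional


class FakeStream:
    """Hypothetical stand-in for a scanner's input stream."""

    def __init__(self) -> None:
        self.released = False

    def release(self) -> None:
        self.released = True


class SketchScanner:
    """Hypothetical scanner whose open() can fail before a stream exists."""

    def __init__(self, supports_file_position: bool) -> None:
        self.supports_file_position = supports_file_position
        self.stream: Optional[FakeStream] = None
        self.errors: List[str] = []

    def open(self, wants_file_position: bool) -> bool:
        if wants_file_position and not self.supports_file_position:
            # Error path: report and return before the stream is created.
            self.errors.append("Virtual column FILE__POSITION is not supported")
            return False
        self.stream = FakeStream()
        return True

    def close(self) -> None:
        # The guard: touch the stream only if open() got far enough to create it.
        if self.stream is not None:
            self.stream.release()
            self.stream = None
```

Without the check in close(), the error path would call release() on a missing
stream, which is the Python analogue of the NULL dereference.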

Alternatively we could detect the presence of FILE__POSITION during
planning in the HdfsScanNode, but doing it in the scanners lets us
handle more queries, e.g. queries that dynamically prune partitions
and the surviving partitions all have file formats that support
FILE__POSITION.

Testing:
 * added negative tests to properly report the errors
 * added tests for mixed file format tables

Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
---
M be/src/exec/json/hdfs-json-scanner.cc
M be/src/exec/text/hdfs-text-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
M tests/query_test/test_scanners.py
5 files changed, 92 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/4
--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21179


Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..

IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg 
table

The following query throws an error for Iceberg tables:

 select * from ice_tbl where rand() < 0.001;

This happens because the predicate 'rand() < 0.001' doesn't involve any
table columns. Due to a bug,
IcebergScanPlanner.hasPartitionTransformType() throws an
IndexOutOfBoundsException for such predicates. This patch fixes the
method to handle them.
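
As a rough illustration of this failure mode, here is a minimal Python sketch.
It is not the Java code from IcebergScanPlanner, whose logic is not shown in
this message: the function name, its parameters, and the column and transform
names used with it are all hypothetical. It only shows why a predicate with no
column references needs an explicit guard before indexing.

```python
from typing import Dict, List


def has_partition_transform(referenced_columns: List[str],
                            partition_transforms: Dict[str, str],
                            transform: str) -> bool:
    """Hypothetical check: does the predicate's column use `transform`?"""
    # A predicate such as rand() < 0.001 references no columns, so indexing
    # referenced_columns[0] unconditionally would raise IndexError here.
    if not referenced_columns:
        return False
    return partition_transforms.get(referenced_columns[0]) == transform
```

With the guard in place, a column-free predicate is simply treated as not
involving any partition transform.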

Testing:
 * added e2e tests

Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
2 files changed, 85 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21179/1
--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all DDL profiles

2024-03-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all DDL profiles
..


Patch Set 15: Code-Review+2

Nice feature! LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 15
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Mar 2024 10:35:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests

2024-03-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21167 )

Change subject: IMPALA-12893: (part 1) Specify 'format-version' explicitly in 
Iceberg tests
..

IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests

This CR is the first step toward upgrading to Iceberg 1.4.3. The biggest
behavior change in Iceberg 1.4.3 is that Iceberg V2 tables are
the default. Because of this, we update some test files to
explicitly create V1/V2 tables. We also introduce test files that
create Iceberg tables without explicitly specifying the format
version; these tests are named *-default.test. The latter tests
will need to be updated when we actually upgrade to Iceberg 1.4.3.

Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd
Reviewed-on: http://gerrit.cloudera.org:8080/21167
Tested-by: Impala Public Jenkins 
Reviewed-by: Andrew Sherman 
---
R testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-default.test
C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v1.test
C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v2.test
R testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-default.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v1.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v2.test
M tests/query_test/test_iceberg.py
7 files changed, 1,672 insertions(+), 120 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Andrew Sherman: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21167
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd
Gerrit-Change-Number: 21167
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests

2024-03-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21167


Change subject: IMPALA-12893: (part 1) Specify 'format-version' explicitly in 
Iceberg tests
..

IMPALA-12893: (part 1) Specify 'format-version' explicitly in Iceberg tests

This CR is the first step toward upgrading to Iceberg 1.4.3. The biggest
behavior change in Iceberg 1.4.3 is that Iceberg V2 tables are
the default. Because of this, we update some test files to
explicitly create V1/V2 tables. We also introduce test files that
create Iceberg tables without explicitly specifying the format
version; these tests are named *-default.test. The latter tests
will need to be updated when we actually upgrade to Iceberg 1.4.3.

Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd
---
R testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-default.test
C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v1.test
C testdata/workloads/functional-query/queries/QueryTest/iceberg-alter-v2.test
R testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-default.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v1.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert-v2.test
M tests/query_test/test_iceberg.py
7 files changed, 1,672 insertions(+), 120 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/21167/1
--
To view, visit http://gerrit.cloudera.org:8080/21167
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieb4f6c1b206d1d4fd878f07ea5f1436dcae560cd
Gerrit-Change-Number: 21167
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21151 )

Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes 
because of wrongly defined test dimensions
..


Patch Set 4:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21151/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21151/1//COMMIT_MSG@21
PS1, Line 21: f3f3b1427b20a1d2d28
> Maybe one day we want to add this back for IMPALA-12349. It's ok to remove
Thanks for pointing me to IMPALA-12349. If we have such plans then I think we 
shouldn't remove test_type_conversions_hive2 because we might forget to re-add 
it later.

I didn't fix the column names in test_type_conversions_hive2 because I cannot 
test it, but at least I've left a hint for the future contributor of 
IMPALA-12349.


http://gerrit.cloudera.org:8080/#/c/21151/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/21151/2/tests/query_test/test_scanners.py@1717
PS2, Line 1717:   # TODO(IMPALA-12349): Rename the columns to use the correct 
names (see
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/21151/2/tests/query_test/test_scanners.py@1717
PS2, Line 1717:
> flake8: W291 trailing whitespace
Done



--
To view, visit http://gerrit.cloudera.org:8080/21151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
Gerrit-Change-Number: 21151
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Mar 2024 16:18:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21151

to look at the new patch set (#4).

Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes 
because of wrongly defined test dimensions
..

IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly 
defined test dimensions

test_type_conversions_hive3 silently passes because we are not creating
the test dimension for the query option orc_schema_resolution correctly.
If we set orc_schema_resolution correctly, i.e. to also exercise
name-based schema resolution, the test fails. The cause of the failure
is that the ill-typed tables have dummy column names like 'c1', 'c2',
etc. These are completely fine for position-based schema resolution,
but not for name-based schema resolution.

The test only checks error messages related to type errors;
the column names are irrelevant, so we can simply use the correct
names.
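
To make the difference between the two resolution modes concrete, here is a
small Python sketch. It is not Impala's resolver: both functions and the file
column names ("id", "bool_col") are hypothetical and only model the idea that
position-based resolution ignores names while name-based resolution depends
on them.

```python
from typing import Dict, List, Optional


def resolve_by_position(table_cols: List[str],
                        file_cols: List[str]) -> Dict[str, Optional[int]]:
    # The i-th table column maps to the i-th file column; names are ignored.
    return {col: (i if i < len(file_cols) else None)
            for i, col in enumerate(table_cols)}


def resolve_by_name(table_cols: List[str],
                    file_cols: List[str]) -> Dict[str, Optional[int]]:
    # Each table column maps to the file column with the same name, if any.
    index = {name: i for i, name in enumerate(file_cols)}
    return {col: index.get(col) for col in table_cols}


file_cols = ["id", "bool_col"]  # hypothetical names stored in the data file
dummy = resolve_by_name(["c1", "c2"], file_cols)        # nothing matches
fixed = resolve_by_name(["id", "bool_col"], file_cols)  # every column matches
```

In this model, dummy names resolve fine by position, but by name they match no
file column, so a type-mismatch error can never be reached for them.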

Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
---
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
2 files changed, 44 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/4
--
To view, visit http://gerrit.cloudera.org:8080/21151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
Gerrit-Change-Number: 21151
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21151

to look at the new patch set (#3).

Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes 
because of wrongly defined test dimensions
..

IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly 
defined test dimensions

test_type_conversions_hive3 silently passes because we are not creating
the test dimension for the query option orc_schema_resolution correctly.
If we set orc_schema_resolution correctly, i.e. to also exercise
name-based schema resolution, the test fails. The cause of the failure
is that the ill-typed tables have dummy column names like 'c1', 'c2',
etc. These are completely fine for position-based schema resolution,
but not for name-based schema resolution.

The test only checks error messages related to type errors;
the column names are irrelevant, so we can simply use the correct
names.

The test was copied from the old test_type_conversions_hive2 which is
not relevant anymore, so this CR also removes it.

Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
---
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
2 files changed, 44 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/3
--
To view, visit http://gerrit.cloudera.org:8080/21151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
Gerrit-Change-Number: 21151
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21151

to look at the new patch set (#2).

Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes 
because of wrongly defined test dimensions
..

IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly 
defined test dimensions

test_type_conversions_hive3 silently passes because we are not creating
the test dimension for the query option orc_schema_resolution correctly.
If we set orc_schema_resolution correctly, i.e. to also exercise
name-based schema resolution, the test fails. The cause of the failure
is that the ill-typed tables have dummy column names like 'c1', 'c2',
etc. These are completely fine for position-based schema resolution,
but not for name-based schema resolution.

The test only checks error messages related to type errors;
the column names are irrelevant, so we can simply use the correct
names.

The test was copied from the old test_type_conversions_hive2 which is
not relevant anymore, so this CR also removes it.

Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
---
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
2 files changed, 44 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/2
--
To view, visit http://gerrit.cloudera.org:8080/21151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
Gerrit-Change-Number: 21151
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21148 )

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..


Patch Set 3:

(6 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21148/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21148/1//COMMIT_MSG@19
PS1, Line 19: let
> Nit: lets.
Done


http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
File testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test:

http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test@158
PS1, Line 158:  QUERY
> Are these the queries where some files in the table do not support FILE_POS
Done


http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test@159
PS1, Line 159: # Regression test for IMPALA-12903. The following query uses 
static pruning. The surviving
> nit: could you add a comment that in this test we prune partitions that doe
Done


http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
File testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test:

http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test@1
PS1, Line 1: 
> Is FILE_POSITION the only virtual column that could cause this bug before t
Only FILE__POSITION causes this problem. INPUT__FILE__NAME is supported for all 
file formats.


http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test@40
PS1, Line 40: Virtual column FILE__POSITION is not supported
> Could you replace some of the FILE__POSITIONS to some other virtual columns
INPUT__FILE__NAME is supported for all file formats.


http://gerrit.cloudera.org:8080/#/c/21148/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/21148/1/tests/query_test/test_scanners.py@183
PS1, Line 183: )))
> Or just fix the table_format dimension to text/none and remove this constra
Thanks for the suggestions. I went with the uncompressed text dimension option.



--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Mar 2024 10:18:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Riza Suminto, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21148

to look at the new patch set (#3).

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..

IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables 
crashes Impala

Impala crashes with a segmentation fault when a query references the
virtual column FILE__POSITION on TEXT or JSON tables. When scanners that
do not support the FILE__POSITION virtual column detect its presence, they
try to report an error and close themselves. The segfault occurs in the
scanners' Close() method when it dereferences a NULL stream
object.

This patch simply adds NULL-checks in Close().

Alternatively we could detect the presence of FILE__POSITION during
planning in the HdfsScanNode, but doing it in the scanners lets us
handle more queries, e.g. queries that dynamically prune partitions
and the surviving partitions all have file formats that support
FILE__POSITION.

Testing:
 * added negative tests to properly report the errors
 * added tests for mixed file format tables

Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
---
M be/src/exec/json/hdfs-json-scanner.cc
M be/src/exec/text/hdfs-text-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
M tests/query_test/test_scanners.py
5 files changed, 92 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/3
--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Riza Suminto, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21148

to look at the new patch set (#2).

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..

IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables 
crashes Impala

Impala crashes with a segmentation fault when a query references the
virtual column FILE__POSITION on TEXT or JSON tables. When scanners that
do not support the FILE__POSITION virtual column detect its presence, they
try to report an error and close themselves. The segfault occurs in the
scanners' Close() method when it dereferences a NULL stream
object.

This patch simply adds NULL-checks in Close().

Alternatively we could detect the presence of FILE__POSITION during
planning in the HdfsScanNode, but doing it in the scanners lets us
handle more queries, e.g. queries that dynamically prune partitions
and the surviving partitions all have file formats that support
FILE__POSITION.

Testing:
 * added negative tests to properly report the errors
 * added tests for mixed file format tables

Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
---
M be/src/exec/json/hdfs-json-scanner.cc
M be/src/exec/text/hdfs-text-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
M tests/query_test/test_scanners.py
5 files changed, 92 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/2
--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly defined test dimensions

2024-03-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21151


Change subject: IMPALA-12904: test_type_conversions_hive3 silently passes 
because of wrongly defined test dimensions
..

IMPALA-12904: test_type_conversions_hive3 silently passes because of wrongly 
defined test dimensions

test_type_conversions_hive3 silently passes because we are not creating
the test dimension for the query option orc_schema_resolution correctly.
If we set orc_schema_resolution correctly, i.e. to also exercise
name-based schema resolution, the test fails. The cause of the failure
is that the ill-typed tables have dummy column names like 'c1', 'c2',
etc. These are completely fine for position-based schema resolution,
but not for name-based schema resolution.

The test only checks error messages related to type errors;
the column names are irrelevant, so we can simply use the correct
names.

The test was copied from the old test_type_conversions_hive2 which is
not relevant anymore, so this CR also removes it.

Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
---
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
2 files changed, 42 insertions(+), 81 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/21151/1
--
To view, visit http://gerrit.cloudera.org:8080/21151
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I786a5eaae9243b4728484f3f3b1427b20a1d2d28
Gerrit-Change-Number: 21151
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-03-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21148


Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..

IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables 
crashes Impala

Impala crashes with a segmentation fault when a query references the
virtual column FILE__POSITION on TEXT or JSON tables. When scanners that
do not support the FILE__POSITION virtual column detect its presence, they
try to report an error and close themselves. The segfault occurs in the
scanners' Close() method when it dereferences a NULL stream
object.

This patch simply adds NULL-checks in Close().

Alternatively we could detect the presence of FILE__POSITION during
planning in the HdfsScanNode, but doing it in the scanners let us
handle more queries, e.g. queries that dynamically prune partitions
and the surviving partitions all have file formats that support
FILE__POSITION.

Testing:
 * added negative tests to properly report the errors
 * added tests for mixed file format tables

Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
---
M be/src/exec/json/hdfs-json-scanner.cc
M be/src/exec/text/hdfs-text-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
M tests/query_test/test_scanners.py
5 files changed, 88 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21148/1
--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:13:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg 
tables
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21139/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21139/3//COMMIT_MSG@7
PS3, Line 7:
nit: maybe you could include "part 1" in the title



--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:11:10 +
Gerrit-HasComments: Yes

