Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21099 )
Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations ...................................................................... Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h File be/src/runtime/dml-exec-state.h: http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h@135 PS1, Line 135: // Returns vector of Iceberg data files referenced by position delete records by > since it returns const ref, can the function also be const? Done http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h@170 PS1, Line 170: _; > nit: data_files_referenced_by_position_deletes That's a better name indeed. Done. http://gerrit.cloudera.org:8080/#/c/21099/1/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/21099/1/common/protobuf/control_service.proto@115 PS1, Line 115: data_files_referenced_by_position_deletes > nit: data_files_referenced_by_position_deletes? Done http://gerrit.cloudera.org:8080/#/c/21099/1/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/21099/1/common/thrift/CatalogService.thrift@249 PS1, Line 249: data_files_referenced_by_position_deletes > nit: data_files_referenced_by_position_deletes maybe? Done http://gerrit.cloudera.org:8080/#/c/21099/2/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test: http://gerrit.cloudera.org:8080/#/c/21099/2/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test@421 PS2, Line 421: BUFFERED DELETE FROM ICEBERG [functional_parquet.iceberg_v2_partitioned_position_deletes-POSITION-DELETE] > I'm wondering if getting rid of DELETE FROM ICEBERG in this patch is how ti I think it's smaller risk to use IcebergBufferedDeleteSink: * IcebergBufferedDeleteSink is already used for UPDATEs * It was straightforward to add the new logic to IcebergBufferedDeleteSink * Developing this funcionality in IcebergDeleteSink has the risk of introducing bugs * Developing this funcionality in IcebergDeleteSink is also redundant as we plan to get rid of the whole sink (IMPALA-12640) * In future it's better to only maintain one delete sink implementation * we haven't done perf measurements of the operators, as DELETEs are not too performance critical. But AFAICT IcebergBufferedDeleteSink should be more performant (especially if we remove the SORT node completely in the future). http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py File tests/stress/test_update_stress.py: http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@269 PS1, Line 269: read also invokes OPTMIZE regularly > nit: this table name omits optimize. test_concurrent_write_and_optimize? Done http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@277 PS1, Line 277: num_rows = 10 > Could you add a comment that here you prepare the INSERT query so that thes It's not crucial for the purpose of the test to do it like this. I only do it because this way we spare some time. http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@283 PS1, Line 283: > I know we've been using this variable name for a while now, but I feel that Done http://gerrit.cloudera.org:8080/#/c/21099/2/tests/stress/test_update_stress.py File tests/stress/test_update_stress.py: http://gerrit.cloudera.org:8080/#/c/21099/2/tests/stress/test_update_stress.py@279 PS2, Line 279: # Prepare INSERT statement of 'num_rows' records. > nit: I meant to add it to a comment that this code prefers the INSERT in su Yeah, I was hesitant to add that part as it is not important for the purpose of the test. We could also have separate INSERT statements, and the test would do its work. Commenting that might make the impression in the reader that it is important to do it that way, and otherwise the test won't work. But we only do it that way because to save some time. However, if you still think it's important that comment, I can add it. -- To view, visit http://gerrit.cloudera.org:8080/21099 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c Gerrit-Change-Number: 21099 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 05 Mar 2024 08:06:44 +0000 Gerrit-HasComments: Yes