Daniel Vanko has posted comments on this change. ( http://gerrit.cloudera.org:8080/24137 )
Change subject: IMPALA-13810: Generate lineage records for UPDATE and MERGE statements ...................................................................... Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java: http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@482 PS5, Line 482: String.valueOf(tableName_)) > nit: String.valueOf(tableName_) Done http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java: http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@130 PS5, Line 130: collectExprs > If this method is only used for lineage, then it should be renamed to somet Done http://gerrit.cloudera.org:8080/#/c/24137/4/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/24137/4/fe/src/main/java/org/apache/impala/planner/Planner.java@297 PS4, Line 297: se { > Instead of adding all columns, can we only add the target columns, just lik Done http://gerrit.cloudera.org:8080/#/c/24137/4/fe/src/main/java/org/apache/impala/planner/Planner.java@333 PS4, Line 333: > Can these have different sizes? If not, can we precondition check? No, I don't think they can have, changed to a precondition check. http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/planner/Planner.java@232 PS5, Line 232: ctx_.getRootAnalyzer().logCacheStats(); > is it possible to move out this whole section as a separate method? Done http://gerrit.cloudera.org:8080/#/c/24137/5/fe/src/main/java/org/apache/impala/planner/Planner.java@322 PS5, Line 322: or non-Kudu tables (e.g., Iceberg), extract target columns f > this could be a constant for MERGE Done http://gerrit.cloudera.org:8080/#/c/24137/4/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test File testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test: http://gerrit.cloudera.org:8080/#/c/24137/4/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@1044 PS4, Line 1044: "tableName": "functional_parquet.iceberg_non_partitioned", > Can we have more MERGE tests with Done http://gerrit.cloudera.org:8080/#/c/24137/4/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@1448 PS4, Line 1448: "tableName": "lineage_dml_test_db.iceberg_non_partitioned", > Impala also has UPDATE FROM statements, e.g.: Thanks, added those as well, for Kudu and Iceberg also. http://gerrit.cloudera.org:8080/#/c/24137/5/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test File testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test: http://gerrit.cloudera.org:8080/#/c/24137/5/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@1 PS5, Line 1: ==== : ---- QUERY : drop database if exists lineage_dml_test_db cascade : ==== : ---- QUERY : create database lineage_dml_test_db; : ==== > Using unique_database on run_test_case could solve this automatically, alth Yeah, I think it's easier to handle and maintain like this, without the $DATABASE injections. http://gerrit.cloudera.org:8080/#/c/24137/5/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@157 PS5, Line 157: "operationType": "UPDATE" > For the UPDATE queries, conceptually the source and destination tables corr Not really, I fixed it, so we have only one table. http://gerrit.cloudera.org:8080/#/c/24137/5/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@198 PS5, Line 198: "tableName": "lineage_dml_test_db.iceberg_non_partitioned", : "tableType": "iceberg", : "tableCreateTime": 1775747129 : } : } : ] : } : ---- QUERY : # test update statements on Iceberg tables : update lineage_dml_test_db.iceberg_non_partitioned set user = "Dani" where action = "click" : ==== : ---- LINEAGE : { > This is just for my own understanding since I do not know enough about the This is an optimisation for update statements. We also consider the user column, because if it's already equals "Dani" in this case, we won't modify that row. I think technically we add a "where user != "Dani"" predicate that's why it will appear as a source as well. http://gerrit.cloudera.org:8080/#/c/24137/5/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@582 PS5, Line 582: "edgeType": "PROJECTION" > In this query, we have one source table, and one destination table. Thus a Deduplicated those by using fully qualified column names. http://gerrit.cloudera.org:8080/#/c/24137/5/testdata/workloads/functional-query/queries/QueryTest/lineage-update-merge.test@1448 PS5, Line 1448: "tableName": "lineage_dml_test_db.iceberg_non_partitioned", : "tableType": "icebe > It looks like for the UPDATE queries against Kudu tables, we do not list al Updated the Iceberg part to be consistent about not listing all the columns, only those which are referenced. http://gerrit.cloudera.org:8080/#/c/24137/5/tests/custom_cluster/test_lineage.py File tests/custom_cluster/test_lineage.py: http://gerrit.cloudera.org:8080/#/c/24137/5/tests/custom_cluster/test_lineage.py@196 PS5, Line 196: @SkipIfFS.hbase > Is this needed? Well, I copied this from the test_lineage_output() and it was originally added by https://gerrit.cloudera.org/c/14230/ I'll try to test if this is needed. -- To view, visit http://gerrit.cloudera.org:8080/24137 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icba79f756509438455bfbf3067733f6f29284220 Gerrit-Change-Number: 24137 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Vanko <[email protected]> Gerrit-Reviewer: Daniel Vanko <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 08 Jun 2026 09:44:33 +0000 Gerrit-HasComments: Yes
