Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18536 )
Change subject: IMPALA-10465: Use IGNORE variant of Kudu write operations ...................................................................... Patch Set 5: (24 comments) http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@17 PS5, Line 17: the nit: drop the duplicate 'the' http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@21 PS5, Line 21: runing nit: running http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@22 PS5, Line 22: 9 iterations Just curious: why such number? Did the 10nth iteration fail for some reason? http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@34 PS5, Line 34: Additionally, this patch adds 'workload_iterations' option in : single_node_perf_run.py to loop a workload and modify DDL_CRUD_PATTERN : in query_executor.py to not not trigger EXPLAIN in regular CREATE : query (non CTAS). Why not to separate that part in its own changelist for clarity? http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.h File be/src/exec/kudu-table-sink.h: http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.h@145 PS5, Line 145: conflicts nit: duplicate and absent key conflicts http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@61 PS5, Line 61: conficts conflicts on duplicate and absent primary keys http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@61 PS5, Line 61: by default. I'm not sure that 'by default' here provides enough context: if there is an extra control for this, maybe mention that in the description for this flag? http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@61 PS5, Line 61: kudu Kudu http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@62 PS5, Line 62: Master cluster http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@62 PS5, Line 62: node drop 'node' http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@62 PS5, Line 62: support supports http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@63 PS5, Line 63: Impala will use ignore operations of Kudu API (INSERT_IGNORE, UPDATE_IGNORE, " : "and DELETE_IGNORE) How about shortening this part: ... Impala will use {INSERT,UPDATE,DELETE}_IGNORE operations of the Kudu API ... http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@254 PS5, Line 254: if (sink_action_ == TSinkAction::INSERT) { : return table_->NewInsertIgnore(); : } else if (sink_action_ == TSinkAction::UPDATE) { : return table_->NewUpdateIgnore(); : } else if (sink_action_ == TSinkAction::UPSERT) { : return table_->NewUpsert(); : } else { : DCHECK(sink_action_ == TSinkAction::DELETE) : << "Sink type not supported: " << sink_action_; : return table_->NewDeleteIgnore(); : } Just curios: I used to see this is done usually via switch statements, not 'if/else'. Any reason why using if/else instead switch/case here? http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@354 PS5, Line 354: + metrics.GetMetric("delete_ignore_errors"); Does it make sense to add a DCHECK to verify that ignored_errors >= total_ignored_errors_ always? http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@356 PS5, Line 356: // Since ignore_conflicts_ is true, there should be no Kudu error message. : DCHECK(session_->CountPendingErrors() == 0); Why? What if some other than duplicate/absent key error happened? http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@360 PS5, Line 360: if (is_transactional_) { : // Return general status error to abort transaction. : return Status("Kudu reported write operation errors during transaction."); : } That's an interesting twist: with this, Impala unconditionally aborts multi-row Kudu transactions in case of duplicate keys even if using INSERT_IGNORE. Why? Would be great to have more detailed reasoning provided in the comment. http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@364 PS5, Line 364: return Status::OK(); What if there were some UPSERT operation errors in addition? Is it safe to return from here? Please add a comment to clarify. http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@433 PS5, Line 433: session_.reset(); Does it make sense to zero out total_ignored_errors_ at this point as well? http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java File fe/src/main/java/org/apache/impala/planner/KuduTableSink.java: http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@59 PS5, Line 59: support supports http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@59 PS5, Line 59: master cluster http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@70 PS5, Line 70: support supports http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@70 PS5, Line 70: master cluster http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@75 PS5, Line 75: supportsIgnoreOperations_ = client.supportsIgnoreOperations(); Is there a way to find out what mode KuduTableSink uses later on during its operation? It might help while doing troubleshooting down the road, I guess. http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@77 PS5, Line 77: Ignore nit: ignore ? -- To view, visit http://gerrit.cloudera.org:8080/18536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8da7c41d61b0888378b390b8b643238433eb3b52 Gerrit-Change-Number: 18536 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Fri, 20 May 2022 19:00:28 +0000 Gerrit-HasComments: Yes
