Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18536 )

Change subject: IMPALA-10465: Use IGNORE variant of Kudu write operations
......................................................................


Patch Set 5:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@17
PS5, Line 17: the
nit: drop the duplicate 'the'


http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@21
PS5, Line 21: runing
nit: running


http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@22
PS5, Line 22: 9 iterations
Just curious: why such number?  Did the 10nth iteration fail for some reason?


http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@34
PS5, Line 34: Additionally, this patch adds 'workload_iterations' option in
            : single_node_perf_run.py to loop a workload and modify 
DDL_CRUD_PATTERN
            : in query_executor.py to not not trigger EXPLAIN in regular CREATE
            : query (non CTAS).
Why not to separate that part in its own changelist for clarity?


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.h@145
PS5, Line 145: conflicts
nit: duplicate and absent key conflicts


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@61
PS5, Line 61: conficts
conflicts on duplicate and absent primary keys


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@61
PS5, Line 61: by default.
I'm not sure that 'by default' here provides enough context: if there is an 
extra control for this, maybe mention that in the description for this flag?


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@61
PS5, Line 61: kudu
Kudu


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@62
PS5, Line 62: Master
cluster


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@62
PS5, Line 62: node
drop 'node'


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@62
PS5, Line 62: support
supports


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@63
PS5, Line 63: Impala will use ignore operations of Kudu API (INSERT_IGNORE, 
UPDATE_IGNORE, "
            :     "and DELETE_IGNORE)
How about shortening this part:

... Impala will use {INSERT,UPDATE,DELETE}_IGNORE operations of the Kudu API ...


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@254
PS5, Line 254:   if (sink_action_ == TSinkAction::INSERT) {
             :     return table_->NewInsertIgnore();
             :   } else if (sink_action_ == TSinkAction::UPDATE) {
             :     return table_->NewUpdateIgnore();
             :   } else if (sink_action_ == TSinkAction::UPSERT) {
             :     return table_->NewUpsert();
             :   } else {
             :     DCHECK(sink_action_ == TSinkAction::DELETE)
             :         << "Sink type not supported: " << sink_action_;
             :     return table_->NewDeleteIgnore();
             :   }
Just curios: I used to see this is done usually via switch statements, not 
'if/else'.  Any reason why using if/else instead switch/case here?


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@354
PS5, Line 354:         + metrics.GetMetric("delete_ignore_errors");
Does it make sense to add a DCHECK to verify that ignored_errors >= 
total_ignored_errors_ always?


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@356
PS5, Line 356:       // Since ignore_conflicts_ is true, there should be no 
Kudu error message.
             :       DCHECK(session_->CountPendingErrors() == 0);
Why?  What if some other than duplicate/absent key error happened?


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@360
PS5, Line 360:       if (is_transactional_) {
             :         // Return general status error to abort transaction.
             :         return Status("Kudu reported write operation errors 
during transaction.");
             :       }
That's an interesting twist: with this, Impala unconditionally aborts multi-row 
Kudu transactions in case of duplicate keys even if using INSERT_IGNORE.  Why?  
Would be great to have more detailed reasoning provided in the comment.


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@364
PS5, Line 364:       return Status::OK();
What if there were some UPSERT operation errors in addition?  Is it safe to 
return from here?  Please add a comment to clarify.


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@433
PS5, Line 433:   session_.reset();
Does it make sense to zero out total_ignored_errors_ at this point as well?


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
File fe/src/main/java/org/apache/impala/planner/KuduTableSink.java:

http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@59
PS5, Line 59: support
supports


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@59
PS5, Line 59: master
cluster


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@70
PS5, Line 70: support
supports


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@70
PS5, Line 70: master
cluster


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@75
PS5, Line 75:       supportsIgnoreOperations_ = 
client.supportsIgnoreOperations();
Is there a way to find out what mode KuduTableSink uses later on during its 
operation?  It might help while doing troubleshooting down the road, I guess.


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@77
PS5, Line 77: Ignore
nit: ignore ?



--
To view, visit http://gerrit.cloudera.org:8080/18536
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8da7c41d61b0888378b390b8b643238433eb3b52
Gerrit-Change-Number: 18536
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Fri, 20 May 2022 19:00:28 +0000
Gerrit-HasComments: Yes

Reply via email to