Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18536 )

Change subject: IMPALA-10465: Use IGNORE variant of Kudu write operations
......................................................................


Patch Set 5:

(8 comments)

Thank you Alexey for your feedback!
I'm replying to some that we might want to discuss more.

http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@22
PS5, Line 22: 9 iterations
> Just curious: why such number?  Did the 10nth iteration fail for some reaso
No particular reason. Just pick an odd number out of habit to get exact median 
number.


http://gerrit.cloudera.org:8080/#/c/18536/5//COMMIT_MSG@34
PS5, Line 34: Additionally, this patch adds 'workload_iterations' option in
            : single_node_perf_run.py to loop a workload and modify 
DDL_CRUD_PATTERN
            : in query_executor.py to not not trigger EXPLAIN in regular CREATE
            : query (non CTAS).
> Why not to separate that part in its own changelist for clarity?
The kudu-ignore.test that I craft require these changes in Impala perf script. 
I'm OK to drop kudu-ignore.test and the related script changes to reduce the 
patch scope.


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@254
PS5, Line 254:   if (sink_action_ == TSinkAction::INSERT) {
             :     return table_->NewInsertIgnore();
             :   } else if (sink_action_ == TSinkAction::UPDATE) {
             :     return table_->NewUpdateIgnore();
             :   } else if (sink_action_ == TSinkAction::UPSERT) {
             :     return table_->NewUpsert();
             :   } else {
             :     DCHECK(sink_action_ == TSinkAction::DELETE)
             :         << "Sink type not supported: " << sink_action_;
             :     return table_->NewDeleteIgnore();
             :   }
> Just curios: I used to see this is done usually via switch statements, not
This is implemented with 'if/else' just to maintain consistency with the 
existing KuduTableSink::NewWriteOp().


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@354
PS5, Line 354:         + metrics.GetMetric("delete_ignore_errors");
> Does it make sense to add a DCHECK to verify that ignored_errors >= total_i
Make sense. Will do.


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@356
PS5, Line 356:       // Since ignore_conflicts_ is true, there should be no 
Kudu error message.
             :       DCHECK(session_->CountPendingErrors() == 0);
> Why?  What if some other than duplicate/absent key error happened?
My assumption is that XXX_IGNORE ops will not return any error message back to 
client, regardless of the root cause. Is that incorrect?


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@360
PS5, Line 360:       if (is_transactional_) {
             :         // Return general status error to abort transaction.
             :         return Status("Kudu reported write operation errors 
during transaction.");
             :       }
> That's an interesting twist: with this, Impala unconditionally aborts multi
This is to maintain previous behavior. In line 382, Impala immediately return 
error to abort transaction.
With XXX_IGNORE ops usage, I think we should still abort the transaction even 
although the error message is discarded.
Will clarify this as comment.


http://gerrit.cloudera.org:8080/#/c/18536/5/be/src/exec/kudu-table-sink.cc@364
PS5, Line 364:       return Status::OK();
> What if there were some UPSERT operation errors in addition?  Is it safe to
Impala does not mix write operation in single KuduTableSink instance lifetime. 
The sink_action_ will either be INSERT[_IGNORE], UPDATE[_IGNORE], or UPSERT and 
remain unchanged. Will add comment about this.


http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
File fe/src/main/java/org/apache/impala/planner/KuduTableSink.java:

http://gerrit.cloudera.org:8080/#/c/18536/5/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java@75
PS5, Line 75:       supportsIgnoreOperations_ = 
client.supportsIgnoreOperations();
> Is there a way to find out what mode KuduTableSink uses later on during its
I think we can put an ExecOption info in Impala query profile about this.



--
To view, visit http://gerrit.cloudera.org:8080/18536
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8da7c41d61b0888378b390b8b643238433eb3b52
Gerrit-Change-Number: 18536
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Fri, 20 May 2022 21:36:46 +0000
Gerrit-HasComments: Yes

Reply via email to