Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )
Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables ...................................................................... Patch Set 11: (16 comments) http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@25 PS11, Line 25: The Backend commits/aborts the transaction by calling the Frontend via I think this needs to be updated to explain what is done on the catalog. http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@34 PS11, Line 34: tests only run with exhaustive exploration strategy) I'm thinking about how we can test more of the error paths, that seems like the biggest gap at the moment. It might be worth adding cancellation tests like TestCancellationSerial.test_cancel_insert() that cancels a query against an ACID table, and I suppose also one that cancels a select query against an acid table. Ideally we would also test some of the other error paths (e.g. HMS returns error) - I think we'd have to add failpoints for that. But I don't want to expand the scope of this too much - maybe take a look to see what it would take to do that. Don't think this is a blocker. http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@37 PS11, Line 37: * add locks and heartbeats What are the implications of not having heartbeats? Will long-running queries against ACID tables have the transaction time out on the HMS? http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/exec/hdfs-table-sink.h@134 PS11, Line 134: / "base or delta". It was confusing whether it was a path separate or or not http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.h@493 PS11, Line 493: /// Gather and publish all required updates to the metastore Can you document the postcondition around transactions being committed/aborted? It seems like we're guaranteeing that DML transactions on partitioned tables should be committed or aborted when this function returns. http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc@1106 PS11, Line 1106: Status rpc_status =client.DoRpc(&CatalogServiceClientWrapper::UpdateCatalog, missing space http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc@1107 PS11, Line 1107: catalog_update, &resp); It would be more explicit to if (!rpc_status.ok()) { if (InTransaction()) AbortTransaction(); return rpc_status; } Actually could maybe combine the checks to reduce the number of error paths like: Status status = client.DoRpc(); if (status.ok()) status = Status(resp.result.status); if (!status.ok()...) { if (InTransaction()) AbortTransaction(); LOG(ERROR) << "ERROR Finalizing DML: " << status.GetDetail(); return status; } This has a slight behaviour change of logging RPC errors, but that seems like a reasonable thing to do. This is a bit of a nit-pick, and your code is correct, but this kind of error handling is easy to introduce bugs into. I think the current code is ok, will leave it to you to determine if it's an improvement or not. http://gerrit.cloudera.org:8080/#/c/13559/11/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/13559/11/common/thrift/Frontend.thrift@361 PS11, Line 361: traget target http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@941 PS11, Line 941: //TODO writeIDs may also be loaded in other code paths. I guess you didn't actually add this comment, is there a JIRA for this? http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java File fe/src/main/java/org/apache/impala/common/TransactionException.java: http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java@21 PS11, Line 21: * Thrown for errors related to ACID transactions. I feel like this could be a little more specific, e.g. "Thrown for errors that occur when interacting with ACID transactions, e.g. failures to open, commit, or abort a transaction". I.e. it wouldn't include something like failing a query because it uses an unsupported table type or something. http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java@22 PS11, Line 22: * unnecesssary blank line? http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3801 PS11, Line 3801: failers failures http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3803 PS11, Line 3803: // moved to ClientRequestState::UpdateCatalog(). I think handling the abort on the client is probably the right thing to do anyway, because we want to abort the transaction if there's an RPC error talking to the catalog anyway. So we could actually just say "The client, i.e. query coordinator, is always responsible for aborting transactions when queries hit errors." http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1266 PS11, Line 1266: //TODO: should load table write ids in transaction context. is there a jira? http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1379 PS11, Line 1379: catch (TransactionException te) { nit: extra newline before catch http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1629 PS11, Line 1629: private static boolean stmtNeedsTransaction(AnalysisResult analysisResult) { The name/comment is a bit misleading, since it depends on whether the query is referencing an ACID table or not. -- To view, visit http://gerrit.cloudera.org:8080/13559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad Gerrit-Change-Number: 13559 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 18 Jul 2019 04:17:05 +0000 Gerrit-HasComments: Yes