Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )
Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables ...................................................................... Patch Set 10: (13 comments) http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h@264 PS10, Line 264: long > int64_t? Done http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@500 PS10, Line 500: // However, for transactional tables we should create a new empty base directory. > Why? I assume there is some good reason but it's not immediately obvious to Sorry, I left out some information. We only need to do that in case of INSERT OVERWRITEs with empty result sets. For insert-only ACID tables we don't delete old files, we just create a new base directory. Extended the comment and modified the condition. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@700 PS10, Line 700: if (IsTransactional()) return true; > This one seems more obvious to me, but makes me think that the class commen Extended the class comment. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571 PS10, Line 571: Status Coordinator::FinalizeHdfsInsert() { > I think we should probably do the transaction abort in this function, since Thanks, done. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571 PS10, Line 571: FinalizeHdfsInsert > We should maybe rename this to FinalizeHdfsDml(), now or later. Done http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@307 PS10, Line 307: /// True if there is an open transaction. > Hive ACID transaction, just to be clear. Since then removed this bool because it caused more issues than solved because the handover of the transaction ownership between the FE and BE should happen immediately. Now I'm using a member of FinalizeParams to detect whether we are in a transaction or not although it might not be the cleanest solution. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@308 PS10, Line 308: bool in_transaction_ = false; > I think this would best fit in DmlExecState. We'll have transactions for queries as well. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@720 PS10, Line 720: DCHECK(exec_request().__isset.transaction_id); > I think we would prefer to abort the transaction earlier in the query lifec Thanks for the suggestion, moved it to FinalizeHdfsDml() http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@820 PS10, Line 820: //TODO: HMS is not updated at this point, only in UpdateCatalog(). But > Zoli explained this to me out-of-band. I think I agree that having the cata Added commit to catalog. Was also thinking about adding abort as well, but the error handling in CatalogOpExecutor.UpdateCatalog() was inconsistent (sometimes an exception is thrown, sometimes we set an error in the result object), and in some cases we still need to abort the transaction from the BE. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h File be/src/service/frontend.h: http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@170 PS10, Line 170: /// Commits transaction with the given transaction id. > Are there any invariants we should document for these methods? I guess we'r Removed it since now we commit through the Catalog. http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@171 PS10, Line 171: long > I think we want to use int64_t, to match the thrift definition and our styl Done. I think I got used to using longs in Java code. http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@179 PS10, Line 179: private Long writeId_; > This is just a nit, but it would be slightly more intuitive to me to use -1 Done http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test: http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@3 PS10, Line 3: RESET insertonly_nopart_insert > We've generally preferred using unique_database for tables that are mutated I followed the pattern of TestInsertQueries.test_insert(). Also, this way we can easily test different file formats, whereas being able to create tables with "STORED AS $FILEFORMAT" clause requires a bit additional work for the .test files AFAICT. -- To view, visit http://gerrit.cloudera.org:8080/13559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad Gerrit-Change-Number: 13559 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 03 Jul 2019 14:28:40 +0000 Gerrit-HasComments: Yes
