Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )
Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables ...................................................................... Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG@36 PS5, Line 36: TODO in following commits: > It is not clear to me whether dynamic partitioning is supported or not. Yeah, it is supported. Added tests about it. http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@214 PS5, Line 214: // Create final_hdfs_file_name_prefix and tmp_hdfs_file_name_prefix. > Can you mention the transactional naming logic in the comment? Added comment. http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@696 PS5, Line 696: closed_ = true; : } : > nit: can you improve readability? e.g if (IsTransactional()) return true; c Done http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817 PS5, Line 817: RETURN_IF_ERROR(GetCoordinator()->Wait()); : if (exec_request().__isset.transaction_id) { : RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id)); : in_transaction_ = false; : } > I am not sure here, but the order of UpdateCatalog() and CommitTransaction( UpdateCatalog() won't load the new write ids and won't see the new files until they are not committed. The problematic scenario is when we commit the transaction successfully but UpdateCatalog() fails to update HMS. http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1259 PS5, Line 1259: try { > I think it would be more readable if the logic inside the try block would b This 'doCreateExecRequest()' method also exist for the same reason. I'm not sure about the usefulness of adding another method. http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1370 PS5, Line 1370: Exception e) { > Should this only catch ImpalaException? e.g. if there is a null pointer exc Done http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1372 PS5, Line 1372: try { > This can also throw an exception, for example if HMS was shut down, and thi For the general case maybe a composite exception type is the solution. Here I think it's best to handle (log) the transaction exception and throw the original one coming from analysis/planning. http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql@2611 PS5, Line 2611: ==== > nit: extra ; Done http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test: http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@7 PS5, Line 7: ---- QUERY > It would be great to also read the tables back with HMS. I have some interop tests at test_hms_integration.py http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py File tests/metadata/test_hms_integration.py: http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py@662 PS5, Line 662: acid > As this test runs in a unique_database, I don't think that it needs to run Thanks for the tip! http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py File tests/query_test/test_insert.py: http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@142 PS5, Line 142: def test_acid_insert(self, vector): > What is the reason for not using a unique database? The tables being used are tied to these tests only. They don't even contain any data initially. I followed the pattern of test_insert(). http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@146 PS5, Line 146: # We need to turn off capab > @SkipIfHive2.acid already ensures that HIVE_MAJOR_VERSION >= 3. Done http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@147 PS5, Line 147: # this python client doesn't have the capability for handling ACID tables. But we only : # need to drop and create such tables, and table properties are preserved during : # those operations and this is enough for the tests (A table is ACID if it has the : # relevant table properties). : capability_check = self.hive_client.getMetaConf("metastore.client.capability > I do not understand why this is needed. Updated the comment. -- To view, visit http://gerrit.cloudera.org:8080/13559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad Gerrit-Change-Number: 13559 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 20 Jun 2019 14:45:58 +0000 Gerrit-HasComments: Yes
