Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG@36
PS5, Line 36: TODO in following commits:
> It is not clear to me whether dynamic partitioning is supported or not.
Yeah, it is supported. Added tests about it.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@214
PS5, Line 214:   // Create final_hdfs_file_name_prefix and 
tmp_hdfs_file_name_prefix.
> Can you mention the transactional naming logic in the comment?
Added comment.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@696
PS5, Line 696:   closed_ = true;
             : }
             :
> nit: can you improve readability? e.g if (IsTransactional()) return true; c
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817
PS5, Line 817:     RETURN_IF_ERROR(GetCoordinator()->Wait());
             :     if (exec_request().__isset.transaction_id) {
             :       
RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id));
             :       in_transaction_ = false;
             :     }
> I am not sure here, but the order of UpdateCatalog() and CommitTransaction(
UpdateCatalog() won't load the new write ids and won't see the new files until 
they are not committed.

The problematic scenario is when we commit the transaction successfully but 
UpdateCatalog() fails to update HMS.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1259
PS5, Line 1259:     try {
> I think it would be more readable if the logic inside the try block would b
This 'doCreateExecRequest()' method also exist for the same reason. I'm not 
sure about the usefulness of adding another method.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1370
PS5, Line 1370: Exception e) {
> Should this only catch ImpalaException? e.g. if there is a null pointer exc
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1372
PS5, Line 1372: try {
> This can also throw an exception, for example if HMS was shut down, and thi
For the general case maybe a composite exception type is the solution.

Here I think it's best to handle (log) the transaction exception and throw the 
original one coming from analysis/planning.


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql@2611
PS5, Line 2611: ====
> nit: extra ;
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@7
PS5, Line 7: ---- QUERY
> It would be great to also read the tables back with HMS.
I have some interop tests at test_hms_integration.py


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py@662
PS5, Line 662: acid
> As this test runs in a unique_database, I don't think that it needs to run
Thanks for the tip!


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@142
PS5, Line 142:   def test_acid_insert(self, vector):
> What is the reason for not using a unique database?
The tables being used are tied to these tests only. They don't even contain any 
data initially. I followed the pattern of test_insert().


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@146
PS5, Line 146:     # We need to turn off capab
> @SkipIfHive2.acid already ensures that HIVE_MAJOR_VERSION >= 3.
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@147
PS5, Line 147:     # this python client doesn't have the capability for 
handling ACID tables. But we only
             :     # need to drop and create such tables, and table properties 
are preserved during
             :     # those operations and this is enough for the tests (A table 
is ACID if it has the
             :     # relevant table properties).
             :     capability_check = 
self.hive_client.getMetaConf("metastore.client.capability
> I do not understand why this is needed.
Updated the comment.



--
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 20 Jun 2019 14:45:58 +0000
Gerrit-HasComments: Yes

Reply via email to