Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 10:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h@264
PS10, Line 264: long
> int64_t?
Done


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@500
PS10, Line 500:   // However, for transactional tables we should create a new 
empty base directory.
> Why? I assume there is some good reason but it's not immediately obvious to
Sorry, I left out some information. We only need to do that in case of INSERT 
OVERWRITEs with empty result sets. For insert-only ACID tables we don't delete 
old files, we just create a new base directory.

Extended the comment and modified the condition.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@700
PS10, Line 700:   if (IsTransactional()) return true;
> This one seems more obvious to me, but makes me think that the class commen
Extended the class comment.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571
PS10, Line 571: Status Coordinator::FinalizeHdfsInsert() {
> I think we should probably do the transaction abort in this function, since
Thanks, done.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571
PS10, Line 571: FinalizeHdfsInsert
> We should maybe rename this to FinalizeHdfsDml(), now or later.
Done


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@307
PS10, Line 307:   /// True if there is an open transaction.
> Hive ACID transaction, just to be clear.
Since then removed this bool because it caused more issues than solved because 
the handover of the transaction ownership between the FE and BE should happen 
immediately.

Now I'm using a member of FinalizeParams to detect whether we are in a 
transaction or not although it might not be the cleanest solution.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@308
PS10, Line 308:   bool in_transaction_ = false;
> I think this would best fit in DmlExecState.
We'll have transactions for queries as well.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@720
PS10, Line 720:     DCHECK(exec_request().__isset.transaction_id);
> I think we would prefer to abort the transaction earlier in the query lifec
Thanks for the suggestion, moved it to FinalizeHdfsDml()


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@820
PS10, Line 820:       //TODO: HMS is not updated at this point, only in 
UpdateCatalog(). But
> Zoli explained this to me out-of-band. I think I agree that having the cata
Added commit to catalog. Was also thinking about adding abort as well, but the 
error handling in CatalogOpExecutor.UpdateCatalog() was inconsistent (sometimes 
an exception is thrown, sometimes we set an error in the result object), and in 
some cases we still need to abort the transaction from the BE.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h
File be/src/service/frontend.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@170
PS10, Line 170:   /// Commits transaction with the given transaction id.
> Are there any invariants we should document for these methods? I guess we'r
Removed it since now we commit through the Catalog.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@171
PS10, Line 171: long
> I think we want to use int64_t, to match the thrift definition and our styl
Done. I think I got used to using longs in Java code.


http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@179
PS10, Line 179:   private Long writeId_;
> This is just a nit, but it would be slightly more intuitive to me to use -1
Done


http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@3
PS10, Line 3: RESET insertonly_nopart_insert
> We've generally preferred using unique_database for tables that are mutated
I followed the pattern of TestInsertQueries.test_insert().

Also, this way we can easily test different file formats, whereas being able to 
create tables with "STORED AS $FILEFORMAT" clause requires a bit additional 
work for the .test files AFAICT.



--
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 03 Jul 2019 14:28:40 +0000
Gerrit-HasComments: Yes

Reply via email to