Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9793 )

Change subject: IMPALA-5384, part 1: introduce InsertExecState
......................................................................


Patch Set 4:

(13 comments)

Looks like a big improvement. Main thing is that I think we can make the 
interface and usage of the new class clearer.

http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.h@336
PS4, Line 336:   /// Returns request's finalize params, or nullptr if not 
present.
It seems worth documenting that nullptr means that finalization is not required.


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.cc@a492
PS4, Line 492:
I reviewed the moved code by copying and pasting the moved parts into gvimdiff.


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.cc@497
PS4, Line 497:     HdfsTableDescriptor* hdfs_table;
This isn't a huge deal, but any reason we can't move the table descriptor 
creation into InsertExecState?


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h
File be/src/runtime/insert-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@28
PS4, Line 28: #include "common/hdfs.h"
nit: hdfs goes before status


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@41
PS4, Line 41: /// InsertExecState manages the state related to the execution of 
an Insert
The lifecycle of this is a bit unclear, especially since we construct the 
object for non-insert queries too. Do we need to do any initialization before 
using it? What calls are valid before/after FinalizeInsert()?

It probably just needs a sentence or two of explanation.


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@47
PS4, Line 47: class InsertExecState {
The class name and comment are misleading in that this is also used for update 
and upsert commands to Kudu. DmlExecState?


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@53
PS4, Line 53:   typedef std::map<std::string, TInsertPartitionStatus> 
PartitionStatusMap;
It might be worth extending the comment here and for FileMoveMap to explain why 
we use an ordered map. I spent a couple of minutes trying to understand if it 
was deliberate or not. I concluded that it makes sense, since we iterate over 
the maps to do certain actions and this makes the iteration order totally 
deterministic.


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@75
PS4, Line 75:   void AddKuduPartition(const std::string& name, int64_t id);
The Kudu Partition API is confusing - it only creates a dummy partition with 
name=g_ImpalaInternalService_constants.ROOT_PARTITION_KEY for bookkeeping. I 
don't think this actually maps to a real Kudu partition. Can we change the API 
to reflect this?

E.g. InitForKuduDml() and SetKuduDmlStats().

Hbase has a similar usage pattern, but I don't think we necessarily need to add 
a special API for that.


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@80
PS4, Line 80: num_modfied_rows
typo: modified


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@118
PS4, Line 118:   PartitionStatusMap per_partition_status_;
It looks like the contents of these maps is documented in the typedef above. 
Can we add a comment referring back to the typedefs?


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.cc
File be/src/runtime/insert-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.cc@121
PS4, Line 121:     max_ts = std::max(max_ts,
Not the end of the world, but I prefer max<uint64_t>(max_tx, 
entry.second.kudu_latest_observed_ts)


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.cc@431
PS4, Line 431:     const std::string& name, int64_t id, const std::string* 
base_dir) {
unnecessary std:: prefixes in .cc here and below


http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/runtime-state.h@327
PS4, Line 327:   /// Execution state for INSERT statements.
Also for UPDATE and UPSERT, right?



--
To view, visit http://gerrit.cloudera.org:8080/9793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4c025917620a7bff2acbeb46464f107ab4b7565
Gerrit-Change-Number: 9793
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Hecht <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 27 Mar 2018 17:28:42 +0000
Gerrit-HasComments: Yes

Reply via email to