Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9793 )
Change subject: IMPALA-5384, part 1: introduce InsertExecState ...................................................................... Patch Set 4: (13 comments) Looks like a big improvement. Main thing is that I think we can make the interface and usage of the new class clearer. http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.h@336 PS4, Line 336: /// Returns request's finalize params, or nullptr if not present. It seems worth documenting that nullptr means that finalization is not required. http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.cc@a492 PS4, Line 492: I reviewed the moved code by copying and pasting the moved parts into gvimdiff. http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/coordinator.cc@497 PS4, Line 497: HdfsTableDescriptor* hdfs_table; This isn't a huge deal, but any reason we can't move the table descriptor creation into InsertExecState? http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h File be/src/runtime/insert-exec-state.h: http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@28 PS4, Line 28: #include "common/hdfs.h" nit: hdfs goes before status http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@41 PS4, Line 41: /// InsertExecState manages the state related to the execution of an Insert The lifecycle of this is a bit unclear, especially since we construct the object for non-insert queries too. Do we need to do any initialization before using it? What calls are valid before/after FinalizeInsert()? It probably just needs a sentence or two of explanation. http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@47 PS4, Line 47: class InsertExecState { The class name and comment are misleading in that this is also used for update and upsert commands to Kudu. DmlExecState? http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@53 PS4, Line 53: typedef std::map<std::string, TInsertPartitionStatus> PartitionStatusMap; It might be worth extending the comment here and for FileMoveMap to explain why we use an ordered map. I spent a couple of minutes trying to understand if it was deliberate or not. I concluded that it makes sense, since we iterate over the maps to do certain actions and this makes the iteration order totally deterministic. http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@75 PS4, Line 75: void AddKuduPartition(const std::string& name, int64_t id); The Kudu Partition API is confusing - it only creates a dummy partition with name=g_ImpalaInternalService_constants.ROOT_PARTITION_KEY for bookkeeping. I don't think this actually maps to a real Kudu partition. Can we change the API to reflect this? E.g. InitForKuduDml() and SetKuduDmlStats(). Hbase has a similar usage pattern, but I don't think we necessarily need to add a special API for that. http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@80 PS4, Line 80: num_modfied_rows typo: modified http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.h@118 PS4, Line 118: PartitionStatusMap per_partition_status_; It looks like the contents of these maps is documented in the typedef above. Can we add a comment referring back to the typedefs? http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.cc File be/src/runtime/insert-exec-state.cc: http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.cc@121 PS4, Line 121: max_ts = std::max(max_ts, Not the end of the world, but I prefer max<uint64_t>(max_tx, entry.second.kudu_latest_observed_ts) http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/insert-exec-state.cc@431 PS4, Line 431: const std::string& name, int64_t id, const std::string* base_dir) { unnecessary std:: prefixes in .cc here and below http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/9793/4/be/src/runtime/runtime-state.h@327 PS4, Line 327: /// Execution state for INSERT statements. Also for UPDATE and UPSERT, right? -- To view, visit http://gerrit.cloudera.org:8080/9793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4c025917620a7bff2acbeb46464f107ab4b7565 Gerrit-Change-Number: 9793 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Hecht <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 27 Mar 2018 17:28:42 +0000 Gerrit-HasComments: Yes
