Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16057 )
Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator ...................................................................... Patch Set 12: (9 comments) Address the code comments. Pushing out changes so I can rebase before tackling the tests. http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/coordinator-backend-state.cc@423 PS12, Line 423: // 'thrift_profiles' and 'instance_exec_status' vectors have one-to-one correspondance. > This comment is now out of date Done http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/query-state.cc@591 PS12, Line 591: LOG(INFO) << it->second->min_per_fragment_instance_idx(); > Nit: Leftover logging? Done http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile-counters.h@405 PS12, Line 405: No locks are obtained within this method because UpdateCounter() is called from : /// Update() > Nit: Now that there is an Update() method on AveragedCounter, this comment Done http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.h@781 PS12, Line 781: void Update(const TRuntimeProfileTree& src, int start_idx); > Nit: On first glance at code, it is easy to get confused between these two Done http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@177 PS12, Line 177: // Search forward until the insert position is either at the end of the vector : // or after this child. This preserves the order if the relative order of : // children in all updates is consistent. : bool found_child = false; > Are there cases that actually hit the children_.end() without finding the c Added a comment to explain why it was necessary - IMPALA-6694. http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@394 PS12, Line 394: { : lock_guard<SpinLock> l(agg_info_strings_lock_); : lock_guard<SpinLock> m(other->info_strings_lock_); : for (const auto& entry : other->info_strings_) { : vector<string>& values = agg_info_strings_[entry.first]; : if (values.empty()) values.resize(num_input_profiles_); : if (values[idx] != entry.second) values[idx] = entry.second; : } : } > A pattern in this file is having functions that have a lot of these subbloc THanks, this is a good suggestion and made it clearer I think. http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@633 PS12, Line 633: // Decode label and assign a new index based on the unique labels in this : // profile. : int32_t src_label_idx = tseq.label_idxs[i][j]; : DCHECK_GE(src_label_idx, 0); : DCHECK_LT(src_label_idx, tseq.unique_labels.size()); : const string& label = tseq.unique_labels[src_label_idx]; : auto it = seq.labels.find(label); : int32_t label_idx; : if (it == seq.labels.end()) { : label_idx = seq.labels.size(); : seq.labels.emplace(label, label_idx); : } else { : label_idx = it->second; : } > As noted in the thrift, the individual steps here make sense to be describe Done http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@609 PS12, Line 609: { : // Merge event sequence. : lock_guard<SpinLock> l(event_sequence_lock_); : for (const TAggEventSequence& tseq : agg_node.event_sequences) { : DCHECK_GT(num_input_profiles_, 0); : AggEventSequence& seq = event_sequence_map_[tseq.name]; : if (seq.label_idxs.empty()) { : seq.label_idxs.resize(num_input_profiles_); : seq.timestamps.resize(num_input_profiles_); : } else { : DCHECK_EQ(num_input_profiles_, seq.label_idxs.size()); : DCHECK_EQ(num_input_profiles_, seq.timestamps.size()); : } : : DCHECK_LE(start_idx + tseq.timestamps.size(), num_input_profiles_); : DCHECK_EQ(tseq.timestamps.size(), tseq.label_idxs.size()); : for (int i = 0; i < tseq.timestamps.size(); ++i) { : int idx = start_idx + i; : std::vector<int32_t>& label_idxs = seq.label_idxs[idx]; : std::vector<int64_t>& timestamps = seq.timestamps[idx]; : label_idxs.clear(); : timestamps.clear(); : DCHECK_EQ(tseq.timestamps[i].size(), tseq.label_idxs[i].size()); : for (int j = 0; j < tseq.timestamps[i].size(); ++j) { : // Decode label and assign a new index based on the unique labels in this : // profile. : int32_t src_label_idx = tseq.label_idxs[i][j]; : DCHECK_GE(src_label_idx, 0); : DCHECK_LT(src_label_idx, tseq.unique_labels.size()); : const string& label = tseq.unique_labels[src_label_idx]; : auto it = seq.labels.find(label); : int32_t label_idx; : if (it == seq.labels.end()) { : label_idx = seq.labels.size(); : seq.labels.emplace(label, label_idx); : } else { : label_idx = it->second; : } : label_idxs.push_back(label_idx); : timestamps.push_back(tseq.timestamps[i][j]); : } : } : } : } > As noted in AggregatedRuntimeProfile::UpdateRecursive(), I think each of th Done http://gerrit.cloudera.org:8080/#/c/16057/12/common/thrift/RuntimeProfile.thrift File common/thrift/RuntimeProfile.thrift: http://gerrit.cloudera.org:8080/#/c/16057/12/common/thrift/RuntimeProfile.thrift@80 PS12, Line 80: // One entry per unique label. : 2: required list<string> unique_labels > This is using dictionary encoding, so we could use name our variables using Done -- To view, visit http://gerrit.cloudera.org:8080/16057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca Gerrit-Change-Number: 16057 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 13 Nov 2020 04:59:29 +0000 Gerrit-HasComments: Yes
