Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
......................................................................


Patch Set 12:

(9 comments)

Address the code comments. Pushing out changes so I can rebase before tackling 
the tests.

http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/coordinator-backend-state.cc@423
PS12, Line 423:   // 'thrift_profiles' and 'instance_exec_status' vectors have 
one-to-one correspondance.
> This comment is now out of date
Done


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/runtime/query-state.cc@591
PS12, Line 591:       LOG(INFO) << it->second->min_per_fragment_instance_idx();
> Nit: Leftover logging?
Done


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile-counters.h@405
PS12, Line 405: No locks are obtained within this method because 
UpdateCounter() is called from
              :   /// Update()
> Nit: Now that there is an Update() method on AveragedCounter, this comment
Done


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.h@781
PS12, Line 781:   void Update(const TRuntimeProfileTree& src, int start_idx);
> Nit: On first glance at code, it is easy to get confused between these two
Done


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@177
PS12, Line 177:     // Search forward until the insert position is either at 
the end of the vector
              :     // or after this child. This preserves the order if the 
relative order of
              :     // children in all updates is consistent.
              :     bool found_child = false;
> Are there cases that actually hit the children_.end() without finding the c
Added a comment to explain why it was necessary - IMPALA-6694.


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@394
PS12, Line 394:   {
              :     lock_guard<SpinLock> l(agg_info_strings_lock_);
              :     lock_guard<SpinLock> m(other->info_strings_lock_);
              :     for (const auto& entry : other->info_strings_) {
              :       vector<string>& values = agg_info_strings_[entry.first];
              :       if (values.empty()) values.resize(num_input_profiles_);
              :       if (values[idx] != entry.second) values[idx] = 
entry.second;
              :     }
              :   }
> A pattern in this file is having functions that have a lot of these subbloc
THanks, this is a good suggestion and made it clearer I think.


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@633
PS12, Line 633:           // Decode label and assign a new index based on the 
unique labels in this
              :           // profile.
              :           int32_t src_label_idx = tseq.label_idxs[i][j];
              :           DCHECK_GE(src_label_idx, 0);
              :           DCHECK_LT(src_label_idx, tseq.unique_labels.size());
              :           const string& label = 
tseq.unique_labels[src_label_idx];
              :           auto it = seq.labels.find(label);
              :           int32_t label_idx;
              :           if (it == seq.labels.end()) {
              :             label_idx = seq.labels.size();
              :             seq.labels.emplace(label, label_idx);
              :           } else {
              :             label_idx = it->second;
              :           }
> As noted in the thrift, the individual steps here make sense to be describe
Done


http://gerrit.cloudera.org:8080/#/c/16057/12/be/src/util/runtime-profile.cc@609
PS12, Line 609:   {
              :     // Merge event sequence.
              :     lock_guard<SpinLock> l(event_sequence_lock_);
              :     for (const TAggEventSequence& tseq : 
agg_node.event_sequences) {
              :       DCHECK_GT(num_input_profiles_, 0);
              :       AggEventSequence& seq = event_sequence_map_[tseq.name];
              :       if (seq.label_idxs.empty()) {
              :         seq.label_idxs.resize(num_input_profiles_);
              :         seq.timestamps.resize(num_input_profiles_);
              :       } else {
              :         DCHECK_EQ(num_input_profiles_, seq.label_idxs.size());
              :         DCHECK_EQ(num_input_profiles_, seq.timestamps.size());
              :       }
              :
              :       DCHECK_LE(start_idx + tseq.timestamps.size(), 
num_input_profiles_);
              :       DCHECK_EQ(tseq.timestamps.size(), tseq.label_idxs.size());
              :       for (int i = 0; i < tseq.timestamps.size(); ++i) {
              :         int idx = start_idx + i;
              :         std::vector<int32_t>& label_idxs = seq.label_idxs[idx];
              :         std::vector<int64_t>& timestamps = seq.timestamps[idx];
              :         label_idxs.clear();
              :         timestamps.clear();
              :         DCHECK_EQ(tseq.timestamps[i].size(), 
tseq.label_idxs[i].size());
              :         for (int j = 0; j < tseq.timestamps[i].size(); ++j) {
              :           // Decode label and assign a new index based on the 
unique labels in this
              :           // profile.
              :           int32_t src_label_idx = tseq.label_idxs[i][j];
              :           DCHECK_GE(src_label_idx, 0);
              :           DCHECK_LT(src_label_idx, tseq.unique_labels.size());
              :           const string& label = 
tseq.unique_labels[src_label_idx];
              :           auto it = seq.labels.find(label);
              :           int32_t label_idx;
              :           if (it == seq.labels.end()) {
              :             label_idx = seq.labels.size();
              :             seq.labels.emplace(label, label_idx);
              :           } else {
              :             label_idx = it->second;
              :           }
              :           label_idxs.push_back(label_idx);
              :           timestamps.push_back(tseq.timestamps[i][j]);
              :         }
              :       }
              :     }
              :   }
> As noted in AggregatedRuntimeProfile::UpdateRecursive(), I think each of th
Done


http://gerrit.cloudera.org:8080/#/c/16057/12/common/thrift/RuntimeProfile.thrift
File common/thrift/RuntimeProfile.thrift:

http://gerrit.cloudera.org:8080/#/c/16057/12/common/thrift/RuntimeProfile.thrift@80
PS12, Line 80:   // One entry per unique label.
             :   2: required list<string> unique_labels
> This is using dictionary encoding, so we could use name our variables using
Done



--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 13 Nov 2020 04:59:29 +0000
Gerrit-HasComments: Yes

Reply via email to