Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 )
Change subject: IMPALA-7694: Add host resource usage metrics to profile ...................................................................... Patch Set 11: (12 comments) Still wrapping my head around this big change. Some initial comments for now. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@67 PS11, Line 67: RuntimeProfile* host_profiles nit: Please add a comment on the role of this parameter. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@249 PS11, Line 249: host_profile_ = nullptr; nit: A brief statement of what this contains. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc@296 PS11, Line 296: // Add profile to report : host_profile_->ToThrift(&profiles_forest->host_profile); : profiles_forest->__isset.host_profile = true; : // Free resources in chunked counters in the profile : host_profile_->ClearChunkedTimeSeriesCounters(); I wonder how this may reconcile with https://gerrit.cloudera.org/#/c/12049/ ? In particular, with the mentioned patch, on a failure to send the profile report, the query state thread will back off, and retry sending a new profile. So, the host profile is potentially another non-idempotent state added to the report. That said, how tolerant is the coordinator's side logic on handling missing chunks in this time series ? In other words, if we fail to send a report, how bad is it to let the host profile of this epoch to be dropped ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc@76 PS11, Line 76: profile_(RuntimeProfile::Create(obj_pool(), "<fragment instance>")), Why this change ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h@443 PS11, Line 443: previous_sample_count_ = 0; This deserves a comment. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419 PS11, Line 419: Samples : /// are not re-sampled into larger intervals, instead owners of this profile can call : /// ClearChunkedTimeSeriesCounters() to reset the sample buffers of all chunked time : /// series counters So, misuse of this kind of counter (e.g. never calling ClearChunkedTimeSeriesCounters) may lead to large untracked memory usage, right ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1216 PS11, Line 1216: nit: blank space http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1231 PS11, Line 1231: nit: blank space http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1279 PS11, Line 1279: DCHECK( DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h File be/src/util/system-state-info.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@66 PS11, Line 66: /// CaptureSystemStateSnapshot(). 'cpu_ratios_' is updated after returning from this function. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@69 PS11, Line 69: NUM_CPU_VALUES = 5; Seems more future proof to have this as the last value in the enum below like: enum PROC_STAT_CPU_VALUES { CPU_USER = 0, CPU_NICE, CPU_SYSTEM, CPU_IDLE, CPU_IOWAIT, NUM_CPU_VALUES }; http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc@44 PS11, Line 44: int64_t SystemStateInfo::ParseInt64(const string& val) const { : StringParser::ParseResult result; : int64_t parsed = StringParser::StringToInt<int64_t>(val.c_str(), val.size(), &result); : if (result == StringParser::PARSE_SUCCESS) return parsed; : return -1; : } Wonder if this fits better as a utility function in StringParser ? -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 11 Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 24 Jan 2019 06:01:09 +0000 Gerrit-HasComments: Yes
