Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 )
Change subject: IMPALA-7694: Add host resource usage metrics to profile ...................................................................... Patch Set 12: (20 comments) http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@67 PS11, Line 67: > nit: Please add a comment on the role of this parameter. Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@67 PS11, Line 67: > host_profile_parent? It sounds like its an array otherwise Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@249 PS11, Line 249: nters for the backend ho > nit: A brief statement of what this contains. Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc@296 PS11, Line 296: // Add profile to report : host_profile_->ToThrift(&profiles_forest->host_profile); : profiles_forest->__isset.host_profile = true; : // Free resources in chunked counters in the profile : host_profile_->ClearChunkedTimeSeriesCounters(); > I wonder how this may reconcile with https://gerrit.cloudera.org/#/c/12049/ We drop duplicate values and fill up gaps with 0 (see runtime-profile.cc:1281). To make this more robust, we can clear the counters after sending the profile successfully, so that the values would be included in a retry of an unsuccessful send. This could be done in FragmentInstanceState::ReportSuccessful() in https://gerrit.cloudera.org/#/c/12049/3/be/src/runtime/fragment-instance-state.cc@271 . Do you think that scenario is likely enough to warrant such a separation of getting the values and then "committing" the send once it was successful? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc@76 PS11, Line 76: profile_(RuntimeProfile::Create(obj_pool(), "<fragment instance>")), > Why this change ? To make it more consistent. The name is ignored since the profile itself is never added as a child to another profile. Instead the profile serves as a container to transport counters to the coordinator where they are merged into the query profile. In other places of the code we use this notation so I adopted it here. Would you like me to add a comment to explain this? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/service/impala-server.cc@1021 PS11, Line 1021: query_ctx->query_id = UuidToQueryId(random_generator()()); > This usage pattern of the RNG isn't quite right - we're constructing a new I had chosen to create a new one for the same reason as the comment above points out, that it might be cheaper than locking. However, I had not thought of it using the same seed value, which the documentation says it does. I switched it to using Kudu's ThreadSafeRandom instead. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h@443 PS11, Line 443: number of samples that have > This deserves a comment. Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419 PS11, Line 419: Samples : /// are not re-sampled into larger intervals, instead owners of this profile can call : /// ClearChunkedTimeSeriesCounters() to reset the sample buffers of all chunked time : /// series counters > So, misuse of this kind of counter (e.g. never calling ClearChunkedTimeSeri Yes. If you think that's a problem, we can cap the number of samples at some sufficiently large value, e.g. 10 * status_report_interval_ms / periodic_counter_update_period_ms ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1216 PS11, Line 1216: > nit: blank space Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1231 PS11, Line 1231: > nit: blank space Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1279 PS11, Line 1279: DCHECK_ > DCHECK_EQ Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h File be/src/util/system-state-info.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@66 PS11, Line 66: /// CaptureSystemStateSnapshot() and stores the result in 'cpu_ratios_'. > 'cpu_ratios_' is updated after returning from this function. Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@69 PS11, Line 69: U_VALUES { > Seems more future proof to have this as the last value in the enum below li Done http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc@44 PS11, Line 44: : int64_t SystemStateInfo::ParseInt64(const string& val) const { : StringParser::ParseResult result; : int64_t parsed = StringParser::StringToInt<int64_t>(val.c_str(), val.size(), &result); : if (result == StringParser::PARSE_SUCCESS) return parsed; : > Wonder if this fits better as a utility function in StringParser ? I kept it here because the difference is mostly in the signature. Moving it to the StringParser would make it an outlier there, or I would have to add a bunch methods that are currently not needed. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc@54 PS11, Line 54: if (!proc_file.is_open()) { > I believe you can use GetStrErrMsg() to get the error Done http://gerrit.cloudera.org:8080/#/c/12069/11/lib/python/impala_py_lib/profiles.py File lib/python/impala_py_lib/profiles.py: http://gerrit.cloudera.org:8080/#/c/12069/11/lib/python/impala_py_lib/profiles.py@28 PS11, Line 28: > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/12069/11/tests/common/impala_service.py File tests/common/impala_service.py: http://gerrit.cloudera.org:8080/#/c/12069/11/tests/common/impala_service.py@91 PS11, Line 91: e > flake8: E703 statement ends with a semicolon Done http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py File tests/observability/test_plot_profile_resource_usage.py: http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@25 PS11, Line 25: > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@35 PS11, Line 35: result = self.execute_query("select sleep(2000)", query_opts) > I think we have to be a little careful with this pattern since it assumes t I didn't know about the HS2 api, that's something that we should make use of. The way the code is structured currently means all query execution and profile retrieval is handled inside ImpalaBeeswaxClient.execute() and the query handle is closed afterwards. To switch that to HS2 will require some plumbing so that we can execute the query through HS2 altogether. I'd prefer to do that in a follow up Jira, if you think the current approach is too risky we can mark the test as execute_serially. http://gerrit.cloudera.org:8080/#/c/12069/11/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12069/11/tests/query_test/test_observability.py@399 PS11, Line 399: f > flake8: F841 local variable 'expected_strs' is assigned to but never used Done -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 12 Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 24 Jan 2019 23:09:34 +0000 Gerrit-HasComments: Yes
