Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
......................................................................


Patch Set 12:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@67
PS11, Line 67:
> nit: Please add a comment on the role of this parameter.
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@67
PS11, Line 67:
> host_profile_parent? It sounds like its an array otherwise
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@249
PS11, Line 249: nters for the backend ho
> nit: A brief statement of what this contains.
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc@296
PS11, Line 296:   // Add profile to report
              :   host_profile_->ToThrift(&profiles_forest->host_profile);
              :   profiles_forest->__isset.host_profile = true;
              :   // Free resources in chunked counters in the profile
              :   host_profile_->ClearChunkedTimeSeriesCounters();
> I wonder how this may reconcile with https://gerrit.cloudera.org/#/c/12049/
We drop duplicate values and fill up gaps with 0 (see runtime-profile.cc:1281).

To make this more robust, we can clear the counters after sending the profile 
successfully, so that the values would be included in a retry of an 
unsuccessful send. This could be done in 
FragmentInstanceState::ReportSuccessful() in 
https://gerrit.cloudera.org/#/c/12049/3/be/src/runtime/fragment-instance-state.cc@271
 .

Do you think that scenario is likely enough to warrant such a separation of 
getting the values and then "committing" the send once it was successful?


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc@76
PS11, Line 76: profile_(RuntimeProfile::Create(obj_pool(), "<fragment 
instance>")),
> Why this change ?
To make it more consistent. The name is ignored since the profile itself is 
never added as a child to another profile. Instead the profile serves as a 
container to transport counters to the coordinator where they are merged into 
the query profile. In other places of the code we use this notation so I 
adopted it here. Would you like me to add a comment to explain this?


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/service/impala-server.cc@1021
PS11, Line 1021:   query_ctx->query_id = UuidToQueryId(random_generator()());
> This usage pattern of the RNG isn't quite right - we're constructing a new
I had chosen to create a new one for the same reason as the comment above 
points out, that it might be cheaper than locking. However, I had not thought 
of it using the same seed value, which the documentation says it does. I 
switched it to using Kudu's ThreadSafeRandom instead.


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h@443
PS11, Line 443: number of samples that have
> This deserves a comment.
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419
PS11, Line 419:  Samples
              :   /// are not re-sampled into larger intervals, instead owners 
of this profile can call
              :   /// ClearChunkedTimeSeriesCounters() to reset the sample 
buffers of all chunked time
              :   /// series counters
> So, misuse of this kind of counter (e.g. never calling ClearChunkedTimeSeri
Yes. If you think that's a problem, we can cap the number of samples at some 
sufficiently large value, e.g. 10 * status_report_interval_ms / 
periodic_counter_update_period_ms ?


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1216
PS11, Line 1216:
> nit: blank space
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1231
PS11, Line 1231:
> nit: blank space
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1279
PS11, Line 1279: DCHECK_
> DCHECK_EQ
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h
File be/src/util/system-state-info.h:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@66
PS11, Line 66:   /// CaptureSystemStateSnapshot() and stores the result in 
'cpu_ratios_'.
> 'cpu_ratios_' is updated after returning from this function.
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@69
PS11, Line 69: U_VALUES {
> Seems more future proof to have this as the last value in the enum below li
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc
File be/src/util/system-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc@44
PS11, Line 44:
             : int64_t SystemStateInfo::ParseInt64(const string& val) const {
             :   StringParser::ParseResult result;
             :   int64_t parsed = 
StringParser::StringToInt<int64_t>(val.c_str(), val.size(), &result);
             :   if (result == StringParser::PARSE_SUCCESS) return parsed;
             :
> Wonder if this fits better as a utility function in StringParser ?
I kept it here because the difference is mostly in the signature. Moving it to 
the StringParser would make it an outlier there, or I would have to add a bunch 
methods that are currently not needed.


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc@54
PS11, Line 54:   if (!proc_file.is_open()) {
> I believe you can use GetStrErrMsg() to get the error
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/lib/python/impala_py_lib/profiles.py
File lib/python/impala_py_lib/profiles.py:

http://gerrit.cloudera.org:8080/#/c/12069/11/lib/python/impala_py_lib/profiles.py@28
PS11, Line 28:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/12069/11/tests/common/impala_service.py@91
PS11, Line 91: e
> flake8: E703 statement ends with a semicolon
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py
File tests/observability/test_plot_profile_resource_usage.py:

http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@25
PS11, Line 25:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@35
PS11, Line 35:     result = self.execute_query("select sleep(2000)", query_opts)
> I think we have to be a little careful with this pattern since it assumes t
I didn't know about the HS2 api, that's something that we should make use of. 
The way the code is structured currently means all query execution and profile 
retrieval is handled inside ImpalaBeeswaxClient.execute() and the query handle 
is closed afterwards.

To switch that to HS2 will require some plumbing so that we can execute the 
query through HS2 altogether. I'd prefer to do that in a follow up Jira, if you 
think the current approach is too risky we can mark the test as 
execute_serially.


http://gerrit.cloudera.org:8080/#/c/12069/11/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/12069/11/tests/query_test/test_observability.py@399
PS11, Line 399: f
> flake8: F841 local variable 'expected_strs' is assigned to but never used
Done



--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 12
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 24 Jan 2019 23:09:34 +0000
Gerrit-HasComments: Yes

Reply via email to