Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13979 )
Change subject: IMPALA-8806: Add metrics to improve observability of executor groups ...................................................................... Patch Set 1: (5 comments) Some minor comments, I think this will be very useful. http://gerrit.cloudera.org:8080/#/c/13979/1/be/src/scheduling/cluster-membership-mgr-test.cc File be/src/scheduling/cluster-membership-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/13979/1/be/src/scheduling/cluster-membership-mgr-test.cc@271 PS1, Line 271: ClusterMembershipMgr cmm1(b1->address.hostname, nullptr, nullptr); We should create a dummy metrics group here instead of adding extra logic in the product code. http://gerrit.cloudera.org:8080/#/c/13979/1/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13979/1/be/src/scheduling/cluster-membership-mgr.cc@48 PS1, Line 48: static const string LIVE_EXEC_GROUP_KEY("cluster-membership.executor-groups.total-live"); Related to another comment, we could maybe omit the "live" adjective, since I think it's intuitive that metrics shouldn't include "dead" groups, whatever that is. I'm not that set on this, but I can imagine people asking "what does live mean?". Maybe others disagree, not too concerned about this. http://gerrit.cloudera.org:8080/#/c/13979/1/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/13979/1/common/thrift/metrics.json@2446 PS1, Line 2446: atleast "at least" - here and below http://gerrit.cloudera.org:8080/#/c/13979/1/common/thrift/metrics.json@2446 PS1, Line 2446: live It might be better to avoid introducing the new "live" terminology and simply omit the adjective. http://gerrit.cloudera.org:8080/#/c/13979/1/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/13979/1/tests/custom_cluster/test_executor_groups.py@43 PS1, Line 43: existing_ss_args = method.func_dict.get("state_store_args", "") I have a change that should be setting it to 50ms across all custom cluster tests, so we could remove this (200ms is actually slower) - https://gerrit.cloudera.org/#/c/13967/5/tests/common/custom_cluster_test_suite.py -- To view, visit http://gerrit.cloudera.org:8080/13979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7745ea1c7c6778d3fb5e59adbc873697beb0f3b9 Gerrit-Change-Number: 13979 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 01 Aug 2019 17:10:04 +0000 Gerrit-HasComments: Yes
