Alexey Serbin has submitted this change and it was merged. (
http://gerrit.cloudera.org:8080/16938 )
Change subject: [util] add a few new metrics in MaintenanceManager
......................................................................
[util] add a few new metrics in MaintenanceManager
This patch adds two metrics to MaintenanceManager: one tracks the time
spent choosing the best candidate among the available maintenance
operations, and the other counts the number of times the Prepare()
method of a maintenance operation failed:
* maintenance_op_find_best_candidate_duration
* maintenance_op_prepare_failed
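As a rough illustration of what these two metrics capture, the standalone
C++ sketch below times a candidate-selection step and counts failed
prepare calls. It does not use Kudu's metrics framework or the actual
MaintenanceManager code; ToyMetrics, ToyOp, and FindBestAndPrepare are
hypothetical names invented for this sketch.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins; these are not Kudu's metric or op types.
struct ToyMetrics {
  // One sample (in microseconds) per candidate search.
  std::vector<int64_t> find_best_candidate_duration_us;
  // Number of times the chosen op's prepare step failed.
  int64_t prepare_failed = 0;
};

struct ToyOp {
  double score;                   // higher is better
  std::function<bool()> prepare;  // returns false on failure
};

// Picks the highest-scoring op, records how long the search took, and
// increments the failure counter if the chosen op's prepare step fails.
// Returns the index of the prepared op, or -1 if none was prepared.
inline int FindBestAndPrepare(const std::vector<ToyOp>& ops, ToyMetrics* m) {
  const auto start = std::chrono::steady_clock::now();
  int best = -1;
  for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
    if (best < 0 || ops[i].score > ops[best].score) best = i;
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  m->find_best_candidate_duration_us.push_back(
      std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());

  if (best < 0) return -1;
  if (!ops[best].prepare()) {
    ++m->prepare_failed;
    return -1;
  }
  return best;
}
```

Keeping the duration as a series of samples rather than a single total
makes it possible to see occasional very slow searches, which is the
situation described in the motivation for this change.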
In addition, it adds SCOPED_LOG_SLOW_EXECUTION with a threshold of
10 seconds to the MaintenanceManager::FindBestOp() method.
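The general idea of a scoped slow-execution log can be sketched in
standalone C++ as an RAII timer that reports only when the enclosing
scope runs for at least a given threshold. This is not the implementation
of the Kudu macro; ScopedSlowLogger and its sink callback are
hypothetical and exist only so the behavior is easy to observe.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical RAII helper: calls `sink` with a message on destruction
// if the scope lasted at least `threshold`.
class ScopedSlowLogger {
 public:
  using Clock = std::chrono::steady_clock;

  ScopedSlowLogger(std::string what,
                   std::chrono::milliseconds threshold,
                   std::function<void(const std::string&)> sink)
      : what_(std::move(what)),
        threshold_(threshold),
        sink_(std::move(sink)),
        start_(Clock::now()) {}

  ScopedSlowLogger(const ScopedSlowLogger&) = delete;
  ScopedSlowLogger& operator=(const ScopedSlowLogger&) = delete;

  ~ScopedSlowLogger() {
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - start_);
    if (elapsed >= threshold_) {
      sink_(what_ + " took " + std::to_string(elapsed.count()) + " ms");
    }
  }

 private:
  std::string what_;
  std::chrono::milliseconds threshold_;
  std::function<void(const std::string&)> sink_;
  Clock::time_point start_;
};
```

With a 10-second threshold, as in this patch, a guard such as
`ScopedSlowLogger guard("FindBestOp", std::chrono::seconds(10), sink);`
placed at the top of a function would stay silent for fast calls and
report only the slow ones.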
So far, I have only verified manually that the metrics are present and
show relevant information. I'm planning to add an automated test
covering the behavior of these new metrics in [1], to have fewer
conflicts with that patch.
The motivation for this change is a finding that FindBestOp()'s
computational complexity is O(n^2) in the number of replicas per tablet
server (each tablet replica registers about 8 maintenance operations).
Also, BudgetedCompactionPolicy::RunApproximation()'s computational
complexity is O(n^2) in the number of rowsets in max and min keys.
In the wild, there was an instance of a Kudu cluster with a high data
ingest rate where the following stack showed up in every snapshot in the
diagnostic logs for many hours in a row:
0xa11735 kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats()
0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats()
0x1b3de5c kudu::MaintenanceManager::FindBestOp()
0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread()
0x1b86014 kudu::Thread::SuperviseThread()
[1] https://gerrit.cloudera.org/#/c/16937/
Change-Id: If5420afd605f9bd22207af142b49e73336907486
Reviewed-on: http://gerrit.cloudera.org:8080/16938
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
A src/kudu/util/maintenance_manager_metrics.cc
A src/kudu/util/maintenance_manager_metrics.h
8 files changed, 141 insertions(+), 10 deletions(-)
Approvals:
Andrew Wong: Looks good to me, approved
Kudu Jenkins: Verified
--
To view, visit http://gerrit.cloudera.org:8080/16938
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)