Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10787

to look at the new patch set (#3).

Change subject: Add a simple metric for cluster skew
......................................................................

Add a simple metric for cluster skew

This adds a very simple 'cluster_skew' metric to the master that reports
on the difference in number of replicas between the most and least
loaded tablet servers. This information was already computable from the
tablets_num_* metrics available on all the tablet servers, but this
centralizes it in one place and handles counting the correct tablet
states, so it's much easier to consume. This simple metric should be
useful for operators trying to set up simple alerting schemes based on
cluster balance.
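
As a rough illustration of what the metric reports, here is a minimal
Python sketch. It is not the master's C++ implementation; the function
name `cluster_skew` and the server names are hypothetical, and it
assumes the per-server replica counts have already been gathered and
filtered to the relevant tablet states.

```python
def cluster_skew(replica_counts):
    """Return the difference between the largest and smallest replica count.

    replica_counts maps a (hypothetical) tablet server name to the number
    of replicas it hosts. An empty cluster is treated as having zero skew.
    """
    counts = list(replica_counts.values())
    if not counts:
        return 0
    return max(counts) - min(counts)


# Illustrative counts only: most loaded server has 15, least loaded has 9.
example_counts = {"ts-1": 12, "ts-2": 9, "ts-3": 15}
print(cluster_skew(example_counts))  # prints 6
```

An alerting rule would then compare this single number against a
threshold instead of aggregating per-server metrics itself.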

Why not introduce a more comprehensive set of metrics around balance?
Because eventually rebalancing should be tightly integrated with the
master. This metric is just meant as a useful "canary" for when the
rebalancer ought to be run, until a more sophisticated and automated
procedure can be put in place. At that time there will likely be better
metrics exposed to gauge the balance of the cluster and the behavior of
the rebalancer.

I also wrote a quick script to simulate placing replicas on tablet
servers and measure the resulting distribution of skew. The simulations
show that skew is almost certainly 6 or less when replica distribution
is determined solely by the current power-of-two-choices algorithm with
a fixed number of tablet servers. This gives operators some guidance
for setting an alerting threshold: a value of, e.g., 10 is vanishingly
unlikely to arise except through some external force such as unbalanced
re-replication or the addition of a tablet server, so it should suffice
as a threshold.
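
The kind of simulation described above can be sketched as follows. This
is not the contents of max_skew_estimate.py; it is a simplified model
under stated assumptions: replicas are placed one at a time, each
placement samples two distinct tablet servers uniformly at random and
picks the less loaded one, and the set of servers is fixed. The
function names and parameters are hypothetical.

```python
import random


def simulate_skew(num_tservers, num_replicas, rng):
    """Place replicas with power-of-two-choices and return the final skew."""
    load = [0] * num_tservers
    for _ in range(num_replicas):
        # Sample two distinct candidate servers (requires num_tservers >= 2).
        a, b = rng.sample(range(num_tservers), 2)
        # Place the replica on the less loaded candidate; ties go to the first.
        target = a if load[a] <= load[b] else b
        load[target] += 1
    return max(load) - min(load)


def skew_histogram(num_tservers, num_replicas, trials, seed=0):
    """Run several trials and return {skew: number of trials with that skew}."""
    rng = random.Random(seed)
    hist = {}
    for _ in range(trials):
        skew = simulate_skew(num_tservers, num_replicas, rng)
        hist[skew] = hist.get(skew, 0) + 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the change.
    for skew, count in skew_histogram(20, 2000, trials=200).items():
        print(skew, count)
```

Inspecting the upper tail of the resulting histogram is one way to
choose an alerting threshold for a given cluster size.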

Change-Id: I107256de604998cbf9206a8fccb3a43de86f81a8
---
M src/kudu/master/master.cc
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
A src/kudu/scripts/max_skew_estimate.py
4 files changed, 129 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10787/3
--
To view, visit http://gerrit.cloudera.org:8080/10787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I107256de604998cbf9206a8fccb3a43de86f81a8
Gerrit-Change-Number: 10787
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <[email protected]>
