Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10787 )
Change subject: Add a simple metric for cluster skew
......................................................................

Add a simple metric for cluster skew

This adds a very simple 'cluster_skew' metric to the master that reports the difference in the number of replicas between the most and least loaded tablet servers. This information was already computable from the tablets_num_* metrics available on all the tablet servers, but this centralizes it in one place and handles counting the correct tablet states, so it's much easier to consume. This simple metric should be useful for operators trying to set up simple alerting schemes based on cluster balance.

Why not introduce a more comprehensive set of metrics around balance? Because eventually rebalancing should be tightly integrated with the master. This metric is just meant as a useful "canary" for when the rebalancer ought to be run, until a more sophisticated and automated procedure can be put in place. At that time there will likely be better metrics exposed to gauge the balance of the cluster and the behavior of the rebalancer.

I also wrote a quick script to simulate placing replicas on tablet servers and measure the resulting distribution of skew. The results of the simulations show skew is almost certainly 6 or less when replica distribution is determined solely by the current power of two choices algorithm with a fixed number of tablet servers. This can provide some guidance to operators looking to set a threshold for concerning skew: a value of, e.g., 10 should be vanishingly unlikely to result except from some external force like unbalanced re-replication or the addition of a tablet server, so it should suffice as a threshold.
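For illustration, a minimal sketch of the kind of simulation described above. This is not the max_skew_estimate.py script from this change. The function names are hypothetical, and the sketch assumes each replica is placed independently, ignoring that replicas of one tablet go on distinct tablet servers. Skew is computed as described: the replica count on the most loaded server minus the count on the least loaded one.

```python
import random
from collections import Counter


def place_replicas(num_tservers, num_replicas, rng):
    """Place replicas one at a time using power of two choices.

    Assumption: every replica is placed independently. For each replica,
    two distinct tablet servers are sampled uniformly at random and the
    replica goes to the one currently holding fewer replicas.
    Requires num_tservers >= 2.
    """
    counts = [0] * num_tservers
    for _ in range(num_replicas):
        a, b = rng.sample(range(num_tservers), 2)
        target = a if counts[a] <= counts[b] else b
        counts[target] += 1
    return counts


def cluster_skew(counts):
    """Difference in replica count between the most and least loaded server."""
    return max(counts) - min(counts)


def skew_distribution(num_tservers, num_replicas, trials, seed=0):
    """Run several independent simulations and tally the observed skew values."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        counts = place_replicas(num_tservers, num_replicas, rng)
        tally[cluster_skew(counts)] += 1
    return tally


if __name__ == "__main__":
    # Hypothetical cluster size and replica count, chosen only for the demo.
    dist = skew_distribution(num_tservers=20, num_replicas=3000, trials=200)
    for skew in sorted(dist):
        print(f"skew={skew}: {dist[skew]} trials")
```

Running it prints how many trials ended at each skew value, which is one way to sanity-check an alerting threshold against a cluster's size and replica count.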
Change-Id: I107256de604998cbf9206a8fccb3a43de86f81a8
---
M src/kudu/master/master.cc
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
A src/kudu/scripts/max_skew_estimate.py
4 files changed, 129 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10787/1
--
To view, visit http://gerrit.cloudera.org:8080/10787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I107256de604998cbf9206a8fccb3a43de86f81a8
Gerrit-Change-Number: 10787
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <[email protected]>
