-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68706/#review209425
-----------------------------------------------------------

Patch looks great!

Reviews applied: [68706]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Oct. 10, 2018, 5:22 p.m., Xudong Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68706/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2018, 5:22 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9178
>     https://issues.apache.org/jira/browse/MESOS-9178
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During a master failover, the time at which the master is elected is
> considered the start of the failover. As reregistration progresses,
> each percentile metric records the time at which that percentage of
> agents had finished reregistering. Taken together, these percentiles
> represent overall reregistration progress. If reregistration degrades
> towards the end, the high percentiles reflect it. If some agents are
> unreachable during the failover and a given percentile is never
> reached, the initial value of that percentile is not updated.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 00c6ea94bcb73746aef740236632ede123f5b534 
>   src/master/master.hpp ea7e9242b62fe6c2cc0e717f9a9f2f0c1cc0a390 
>   src/master/master.cpp 06d769aeba16586a020729d454f4d00688b78c78 
>   src/master/metrics.hpp e1da18e6ba2737f729e1e30653020538150ae898 
>   src/master/metrics.cpp 56a7eef2d279ad3248092d37d19013d3ac110757 
> 
> 
> Diff: https://reviews.apache.org/r/68706/diff/3/
> 
> 
> Testing
> -------
> 
> Tested in a master with 6 reregistering agents:
> "master/slave_reregistrations": 6,
> 
> In the middle of the reregistration process:
> "master/slaves_100_percent_reregistered_secs": 0,
> "master/slaves_25_percent_reregistered_secs": 2.244662016,
> "master/slaves_50_percent_reregistered_secs": 3.599491072,
> "master/slaves_75_percent_reregistered_secs": 9.53919616,
> "master/slaves_90_percent_reregistered_secs": 0,
> "master/slaves_99_percent_reregistered_secs": 0,
> 
> When all reregistrations finished:
> "master/slaves_100_percent_reregistered_secs": 29.697210112,
> "master/slaves_25_percent_reregistered_secs": 2.244662016,
> "master/slaves_50_percent_reregistered_secs": 3.599491072,
> "master/slaves_75_percent_reregistered_secs": 9.53919616,
> "master/slaves_90_percent_reregistered_secs": 29.697210112,
> "master/slaves_99_percent_reregistered_secs": 29.697210112,
> 
> With 3606 agents, the last 1% takes significant time:
> "master/slave_reregistrations": 3606,
> "master/slave_shutdowns_canceled": 0,
> "master/slave_shutdowns_completed": 0,
> "master/slave_shutdowns_scheduled": 0,
> "master/slave_unreachable_canceled": 0,
> "master/slave_unreachable_completed": 0,
> "master/slave_unreachable_scheduled": 0,
> "master/slaves_100_percent_reregistered_secs": 58.585202944,
> "master/slaves_25_percent_reregistered_secs": 9.966434048,
> "master/slaves_50_percent_reregistered_secs": 20.259571968,
> "master/slaves_75_percent_reregistered_secs": 30.598885888,
> "master/slaves_90_percent_reregistered_secs": 36.396082944,
> "master/slaves_99_percent_reregistered_secs": 39.811022848,
> 
> 
> Thanks,
> 
> Xudong Ni
> 
>

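For readers following the Description in the quoted review request, here is a rough Python sketch of how such percentile timings could be derived. It is not the patch's C++ code. The function name `reregistration_percentiles`, the rounding rule (a percentile counts as reached once `ceil(p% * total)` agents are back), and the timings in the usage lines are all assumptions made for illustration. The rounding rule is at least consistent with the 6-agent test output, where the 90, 99, and 100 percent metrics share one value.

```python
PERCENTILES = (25, 50, 75, 90, 99, 100)


def reregistration_percentiles(elapsed_secs, total_agents, percentiles=PERCENTILES):
    """Map each percentile to the seconds since master election at which
    that share of agents had reregistered.

    elapsed_secs: seconds since election for each agent that has
                  reregistered so far (hypothetical input format).
    total_agents: number of agents expected to reregister.
    Percentiles that have not been reached keep the initial value 0.0.
    """
    ordered = sorted(elapsed_secs)
    result = {}
    for p in percentiles:
        # Assumption: integer ceiling of p% of the expected agents.
        needed = (p * total_agents + 99) // 100
        if 0 < needed <= len(ordered):
            result[p] = ordered[needed - 1]
        else:
            result[p] = 0.0  # not reached yet: initial value stays
    return result


# Hypothetical timings: 5 of 6 agents are back, then the last one arrives.
midway = reregistration_percentiles([1.0, 2.0, 3.0, 5.0, 9.0], 6)
finished = reregistration_percentiles([1.0, 2.0, 3.0, 5.0, 9.0, 30.0], 6)
print(midway)
print(finished)
```

With five of six agents back, the 90, 99, and 100 percent entries stay at 0.0, the same shape as the "middle of the reregistration process" output quoted above. A single slow agent then sets all three at once.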