-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54803/#review159484
-----------------------------------------------------------
src/tests/slave_tests.cpp (lines 2733 - 2736)
<https://reviews.apache.org/r/54803/#comment230487>

It looks to me like

```
Clock::advance(totalTimeout);
Clock::advance(flags.registration_backoff_factor);
```

would be sufficient here. First we advance by `totalTimeout`, which allows two ping timeout intervals to elapse, leading to the agent being removed. If we then advance by `flags.registration_backoff_factor`, we can be sure that the agent will reregister even if it delays its first registration attempt, since that randomized delay is at most the backoff factor. Does that make sense?

- Greg Mann


On Dec. 16, 2016, 7:23 p.m., Alex Clemmer wrote:
> 
> (Updated Dec. 16, 2016, 7:23 p.m.)
> 
> 
> Review request for mesos, Adam B, Andrew Schwartzmeyer, Daniel Pravat, Greg
> Mann, John Kordich, Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-6803
>     https://issues.apache.org/jira/browse/MESOS-6803
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, when `HAS_AUTHENTICATION` is undefined, the Agent uses
> `delay` to register with the Master at a random time in the future,
> to avoid the thundering herd problem after a Master failover.
> The authentication codepath, in contrast, schedules the registration
> immediately.
> 
> In tests that have called `Clock::pause` by the time the slave is supposed
> to register, the authentication codepath will succeed, while the
> no-authentication codepath will hang forever.
> 
> A much more detailed analysis of this situation exists in MESOS-6803.
> 
> This commit resolves this issue for `slave_tests.cpp` by changing the
> tests so that they do not use `Clock::pause` while waiting for Agent
> registration.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp fc6b56c074c71b827a9ee522cd715c0d15ecc7e3
> 
> Diff: https://reviews.apache.org/r/54803/diff/
> 
> 
> Testing
> -------
> 
> Added `delay` to the call to `authenticate` in `Slave::detected`, ran the
> tests to find failing tests in `SlaveTest.*`, fixed them, then ran the
> tests again.
> 
> 
> Thanks,
> 
> Alex Clemmer
