> On Dec. 20, 2016, 12:01 a.m., Greg Mann wrote:
> > I was able to catch one flaky test by running the agent tests in
> > repetition. For the other patches you're working on, I would recommend
> > running the altered tests for a while with `--gtest_repeat=-1
> > --gtest_break_on_failure` to check for flakiness.

Thanks for the tip. This time I verified this solution with:

```
make mesos-tests -j4 && ./src/mesos-tests --gtest_repeat=1000 --gtest_break_on_failure --gtest_filter="SlaveTest.DuplicateTerminalUpdateBeforeAck:SlaveTest.MetricsSlaveLaunchErrors:SlaveTest.StateEndpoint:SlaveTest.PingTimeoutNoPings:SlaveTest.PingTimeoutSomePings:SlaveTest.ReregisterWithStatusUpdateTaskState"
```

- Alex


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54803/#review159647
-----------------------------------------------------------


On Dec. 17, 2016, 11:01 p.m., Alex Clemmer wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54803/
> -----------------------------------------------------------
>
> (Updated Dec. 17, 2016, 11:01 p.m.)
>
>
> Review request for mesos, Adam B, Andrew Schwartzmeyer, Daniel Pravat, Greg
> Mann, John Kordich, Joseph Wu, and Vinod Kone.
>
>
> Bugs: MESOS-6803
>     https://issues.apache.org/jira/browse/MESOS-6803
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Currently, when `HAS_AUTHENTICATION` is undefined, the Agent will
> use `delay` to schedule a random time in the future to register with the
> Master, to avoid the thundering herd problem after a Master failover.
> The authentication codepath, in contrast, schedules the registration
> immediately.
>
> In tests where we have `Clock::pause`'d when we are supposed to be
> registering the slave, the authentication codepath will succeed, while
> the no-authentication codepath will hang forever.
>
> A much more detailed analysis of this situation exists in MESOS-6803.
>
> This commit will resolve this issue for `slave_tests.cpp` by changing
> the tests to not use `Clock::pause` when we are waiting for Agent
> registration.
>
>
> Diffs
> -----
>
>   src/tests/slave_tests.cpp d956a326ef29bf29837e0587a14bae457147cbca
>
> Diff: https://reviews.apache.org/r/54803/diff/
>
>
> Testing
> -------
>
> Added `delay` to the call to `authenticate` in `Slave::detected`, ran tests
> to find failing tests in `SlaveTest.*`, then fixed, then ran again.
>
>
> Thanks,
>
> Alex Clemmer
>
>
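
For anyone following along, the hang described in the review request can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyClock` and `ToyAgent` are hypothetical stand-ins, not libprocess's `Clock` or the real `Slave`, and the 500 ms backoff is an arbitrary illustrative value. It shows that a registration scheduled through a timer never runs while the clock stands still, whereas an immediate call does.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a paused test clock: time only moves when
// advance() is called, so timers never fire on their own.
class ToyClock {
public:
  // Schedule `f` to run `ms` milliseconds after the current toy time.
  void delay(long ms, std::function<void()> f) {
    timers_.emplace(now_ + ms, std::move(f));
  }

  // Move time forward by `ms` and run every timer that is now due.
  void advance(long ms) {
    now_ += ms;
    auto end = timers_.upper_bound(now_);
    // Collect due callbacks first so a callback may schedule new timers.
    std::vector<std::function<void()>> due;
    for (auto it = timers_.begin(); it != end; ++it) {
      due.push_back(std::move(it->second));
    }
    timers_.erase(timers_.begin(), end);
    for (auto& f : due) {
      f();
    }
  }

private:
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
};

// Hypothetical agent with the two registration paths from the description.
struct ToyAgent {
  bool registered = false;

  // Authentication path: registration is triggered immediately.
  void detectedWithAuthentication() { registered = true; }

  // No-authentication path: registration is deferred through a timer.
  void detectedWithoutAuthentication(ToyClock& clock, long backoffMs) {
    clock.delay(backoffMs, [this]() { registered = true; });
  }
};

// Usage example: only the deferred path depends on the clock moving.
inline void pausedClockDemo() {
  ToyClock clock;
  ToyAgent withAuth;
  ToyAgent withoutAuth;

  withAuth.detectedWithAuthentication();
  withoutAuth.detectedWithoutAuthentication(clock, 500);

  assert(withAuth.registered);      // immediate path has completed
  assert(!withoutAuth.registered);  // deferred path is stuck until time moves

  clock.advance(500);
  assert(withoutAuth.registered);   // timer fires once the clock advances
}
```

In this model a test that waits for registration with the clock stopped would wait forever on the deferred path, which is the behaviour the patch avoids by not pausing the clock during Agent registration.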
