> On Dec. 20, 2016, 5:41 p.m., Neil Conway wrote: > > Can we add an expectation for exactly when we expect to receive the offer? > > i.e., another `WillOnce(FutureSatisfy(...))` and then wait for the future > > at the appropriate time. > > > > The thing I don't like about the `WillRepeatedly(...)` pattern is that it > > obscures whether additional offers are actually expected or not. Lots of > > tests have copied this pattern and use it, even when another offer is not > > actually expected. > > Benjamin Bannier wrote: > This test only uses a single offer, and we do not know when or even if a > second offer would arrive since the test does not run with a paused clock, > https://github.com/apache/mesos/blob/1e72605e9892eb4e518442ab9c1fe2a1a1696748/src/tests/fault_tolerance_tests.cpp#L853. > If execution on the thread running in the test body is delayed sufficiently > after the clock is resumed, additional offers might be made concurrently with > that part of the test body. > > It looks to me that as a general rule of thumb, if tests do not pause the > clock, one should always ignore subsequent calls to mock interfaces in order > to avoid this gmock assertion. > > Neil Conway wrote: > Right, although there are plenty of other situations in which _arbitrary_ > delays in tests will cause failures (e.g., exceeding the registry read/write > timeouts, or even causing "sleep 60" dummy tasks to exit). So I'm not sure > that this problem is unique to making offers. > > It would be good to explore pausing the clock by default for all tests > (MESOS-4101). But in the short term, what about making the default allocation > interval much higher for tests? Say 60 seconds. Test code should generally > not be depending on batch allocations anyway (because it needlessly slows > down the test), so this would make such poorly written tests easier to find.
Yes, that makes sense. Would you mind taking this over? You already started fixing this test (it wasn't pretty before, but also not flaky then), but we learnt in the meantime and it would be great if we would fix this test completely. Like I already mentioned to you offline, it would be a good idea to also think a little harder about your approach to comparing doubles (https://github.com/apache/mesos/blob/d2117362349ab4c383045720f77d42b2d9fd6871/src/tests/fault_tolerance_tests.cpp#L881-L888), see https://issues.apache.org/jira/browse/MESOS-4695 for the kind of failures this pattern can cause. - Benjamin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54889/#review159727 ----------------------------------------------------------- On Dec. 20, 2016, 11:15 p.m., Benjamin Bannier wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54889/ > ----------------------------------------------------------- > > (Updated Dec. 20, 2016, 11:15 p.m.) > > > Review request for mesos, Alexander Rukletsov and Neil Conway. > > > Bugs: MESOS-6820 > https://issues.apache.org/jira/browse/MESOS-6820 > > > Repository: mesos > > > Description > ------- > > The test FaultToleranceTest.FrameworkReregister observes a single > offer in order to monitor progress and as a trigger to initiate other > actions later in the test. The change installs expectations for > possible additional offers. This allows for the thread running the > test body to be delayed with respect to the master thread which could > in the meantime make additional offers. > > > Diffs > ----- > > src/tests/fault_tolerance_tests.cpp > 05937a917a2c175aa53b52488febb7cfd8400a13 > > Diff: https://reviews.apache.org/r/54889/diff/ > > > Testing > ------- > > `make check` (OS X, clang-trunk, w/ optimizations, SSL) > > Before this fix `FaultToleranceTest.FrameworkReregister` failed for me > multiple times in ~10k iterations; after this patch it didn't fail in a > single run I stopped after 170k iterations. > > > Thanks, > > Benjamin Bannier > >
