-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54803/#review159647
-----------------------------------------------------------



I was able to catch one flaky test by running the agent tests in repetition. 
For the other patches you're working on, I would recommend running the altered 
tests for a while with `--gtest_repeat=-1 --gtest_break_on_failure` to check 
for flakiness.


src/tests/slave_tests.cpp (lines 203 - 204)
<https://reviews.apache.org/r/54803/#comment230669>

    When assigning, we usually break after the equal sign:
    
    ```
    Try<Owned<cluster::Slave>> slave =
      StartSlave(detector.get(), &containerizer, agentFlags);
    ```



src/tests/slave_tests.cpp (lines 3661 - 3662)
<https://reviews.apache.org/r/54803/#comment230684>

    Line break after equal sign



src/tests/slave_tests.cpp (lines 3691 - 3697)
<https://reviews.apache.org/r/54803/#comment230685>

    Instead of pausing the clock just after the status update arrives, let's 
pause the clock at the beginning of the test and advance it manually.
    
    The only thing depending on the clock above this point in the test is the 
offer being sent by the agent. So if you pause the clock at the beginning of 
the test (and create some `masterFlags` as you've done elsewhere), you could do:
    
    ```
    Future<Nothing> ___statusUpdate = FUTURE_DISPATCH(_, 
&Slave::___statusUpdate);
    
    driver.start();
    
    Clock::advance(masterFlags.allocation_interval);
    
    // Wait until TASK_RUNNING is sent to the master.
    AWAIT_READY(statusUpdateMessage);
    ```



src/tests/slave_tests.cpp (lines 3725 - 3726)
<https://reviews.apache.org/r/54803/#comment230668>

    This await appears to be flaky. After testing a bit, it looks like the 
agent sometimes fails to reregister because the clock gets advanced *before* 
`Slave::detected` calls `delay` to set a timer for the delayed reregistration.
    
    We can resolve this by running `Clock::settle()` here before we advance the 
clock.



src/tests/slave_tests.cpp (line 5626)
<https://reviews.apache.org/r/54803/#comment230686>

    s/batchallocation/batch allocation/


- Greg Mann


On Dec. 17, 2016, 11:01 p.m., Alex Clemmer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54803/
> -----------------------------------------------------------
> 
> (Updated Dec. 17, 2016, 11:01 p.m.)
> 
> 
> Review request for mesos, Adam B, Andrew Schwartzmeyer, Daniel Pravat, Greg 
> Mann, John Kordich, Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-6803
>     https://issues.apache.org/jira/browse/MESOS-6803
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, when `HAS_AUTHENTICATION` is undefined, the Agent will
> use `delay` to schedule a random time in the future to register with the
> Master, to avoid the thundering herd problem after a Master failover.
> The authentication codepath, in contrast, schedules the registration
> immediately.
> 
> In tests where we have `Clock::pause`'d when we are supposed to be
> registering the slave, the authention codepath will succeeed, while
> no-authentication codepath will hang forever.
> 
> A much more detailed analysis of this situation exists in MESOS-6803.
> 
> This commit will resolve this issue for `slave_tests.cpp` by changing
> the tests to not use `Clock::pause` when we are waiting for Agent
> registration.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp d956a326ef29bf29837e0587a14bae457147cbca 
> 
> Diff: https://reviews.apache.org/r/54803/diff/
> 
> 
> Testing
> -------
> 
> Added `delay` to the call to `authenticate` in `Slave::detected`, ran tests 
> to find failing tests in `SlaveTest.*`, then fixed, then ran again.
> 
> 
> Thanks,
> 
> Alex Clemmer
> 
>

Reply via email to