Re: Review Request 69267: Fixed flaky SchedulerTest.MasterFailover.

Mesos Reviewbot Windows Tue, 06 Nov 2018 19:17:32 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69267/#review210363
-----------------------------------------------------------




PASS: Mesos patch 69267 was successfully built and tested.

Reviews applied: `['69267']`

All the build artifacts are available at:
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2574/mesos-review-69267

- Mesos Reviewbot Windows


On Nov. 7, 2018, 1:26 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69267/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2018, 1:26 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Greg Mann.
> 
> 
> Bugs: MESOS-6949
>     https://issues.apache.org/jira/browse/MESOS-6949
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This test was flaky because there is a double-master-detection race
> after the master fails over.  This test uses the Standalone master
> detector, which keeps a single Master PID in memory and always returns
> that one PID as the leader.  This means there is almost no delay
> between failing over the master and detecting a new leader.
> 
> The scheduler in this test tries to send a SUBSCRIBE call to the master
> as soon as the master is detected.  Normally, there will only be two
> total SUBSCRIBE calls during the test, before and after the master
> failover.  However, the test also manually appoints the leader after
> failing over the master.  This step races against the scheduler's own
> retry logic, and can cause a third SUBSCRIBE if the second
> SUBSCRIBE has already started.
> 
> Because the scheduler in this test does not enable checkpointing, the
> third SUBSCRIBE will actively disconnect the framework, causing the
> master to remove the framework.  This removal also prevents the
> framework from ever registering again, and thereby times out the test.
> 
> This patch fixes the test by preventing excess master detection events.
> 
> We could also change the HTTP scheduler driver to ignore these extra
> master detection events when the master in question has not changed.
> 
> 
> Diffs
> -----
> 
>   src/tests/scheduler_tests.cpp 0ee5b77e5a667e37ac13553e15f634b2cb19ea65 
> 
> 
> Diff: https://reviews.apache.org/r/69267/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> GLOG_v=1 src/mesos-tests --gtest_filter="*SchedulerTest.MasterFailover*" --gtest_repeat=-1 --gtest_break_on_failure --verbose
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>
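
For illustration only, the alternative mentioned at the end of the description
(ignoring detection events when the detected master has not changed) could look
roughly like the sketch below. This is not code from the patch or from the
Mesos HTTP scheduler driver. `MasterId`, `DetectionFilter`, and the `subscribe`
callback are hypothetical names, and a plain string stands in for the master
PID, with the empty string meaning "no leader".

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a master PID. An empty string means that no
// leader is currently detected.
using MasterId = std::string;

// Illustrative filter: invokes `subscribe` only when the detected leader
// differs from the previously detected one, so a repeated detection of the
// same master does not produce an extra SUBSCRIBE.
class DetectionFilter
{
public:
  explicit DetectionFilter(std::function<void(const MasterId&)> subscribe)
    : subscribe_(std::move(subscribe)) {}

  // Returns true if the leader changed, false if the event was ignored.
  bool detected(const MasterId& leader)
  {
    if (leader == current_) {
      return false; // Same master as before: drop the duplicate event.
    }

    current_ = leader;

    if (!leader.empty()) {
      subscribe_(leader); // New leader: subscribe exactly once.
    }

    return true;
  }

private:
  std::function<void(const MasterId&)> subscribe_;
  MasterId current_; // Starts empty: no leader detected yet.
};
```

Under these assumptions, the manual appointment in the test and the detector's
own notification would collapse into a single SUBSCRIBE whenever both report
the same master.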
