-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51653/
-----------------------------------------------------------
Review request for mesos and Vinod Kone.
Bugs: MESOS-5965
https://issues.apache.org/jira/browse/MESOS-5965
Repository: mesos
Description
-------
Now that we wait for the agent to be removed from the registry before
stopping the SlaveObserver, it is possible for an agent to fail health
checks multiple times if the registry operation takes longer than
`agent_ping_timeout`.
This commit updates the master logic to handle this by ignoring health
check failures while the registry operation to mark the agent
unreachable is still in progress.
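As a rough illustration of the idea (not the actual code in `src/master/master.cpp`), the sketch below tracks agents whose "mark unreachable" registry operation is still in flight and ignores repeated health check failures for them. The class `Master`, its methods, and the `markingUnreachable` set are hypothetical names chosen for this example, and the registry write is assumed to complete through a separate callback.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical, simplified stand-in for the master's bookkeeping.
// None of these names are taken from the Mesos source tree.
class Master {
public:
  // Invoked each time an agent exceeds its ping timeouts.
  // Returns true if a registry operation was started, or false if the
  // failure was ignored.
  bool onHealthCheckFailure(const std::string& agentId) {
    // Nothing to do if the agent has already been marked unreachable.
    if (unreachable_.count(agentId) > 0) {
      return false;
    }

    // insert() reports whether the ID was newly added. If it was already
    // present, a registry operation is in progress and this failure is
    // a repeat that should be ignored.
    if (!markingUnreachable_.insert(agentId).second) {
      ++ignoredFailures_;
      return false;
    }

    // A real master would start the asynchronous registry write here.
    return true;
  }

  // Invoked when the (assumed asynchronous) registry write finishes.
  void onRegistryOperationCompleted(const std::string& agentId) {
    assert(markingUnreachable_.count(agentId) == 1);
    markingUnreachable_.erase(agentId);
    unreachable_.insert(agentId);
  }

  bool isUnreachable(const std::string& agentId) const {
    return unreachable_.count(agentId) > 0;
  }

  std::size_t ignoredFailures() const { return ignoredFailures_; }

private:
  std::unordered_set<std::string> markingUnreachable_;  // operations in flight
  std::unordered_set<std::string> unreachable_;         // operations completed
  std::size_t ignoredFailures_ = 0;
};
```

In this sketch, a second failure that arrives before `onRegistryOperationCompleted` runs is counted and dropped rather than triggering a second registry write, which is the scenario the new `FailHealthChecksTwice` test is meant to exercise.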
Diffs
-----
src/master/master.cpp b2a19a645528e8fc1fd48f5ac9929d38c9a76b49
src/tests/partition_tests.cpp f3142ad8d50daafcdb70ad9dbb2772f8ba30db00
Diff: https://reviews.apache.org/r/51653/diff/
Testing
-------
`make check` on OS X and Linux.
`./src/mesos-tests --gtest_filter="Strict/PartitionTest.FailHealthChecksTwice/0" --gtest_repeat=1000 --gtest_break_on_failure`
Thanks,
Neil Conway