> On Sept. 12, 2016, 11:01 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 5835
> > <https://reviews.apache.org/r/51653/diff/4/?file=1496870#file1496870line5835>
> >
> >     s/WARNING/INFO/ because this is expected?

I opted for `WARNING` because, although this situation can occur, we expect it to occur quite rarely in practice. So it doesn't _necessarily_ indicate a problem, but if you see it more than once in the logs, it probably bears investigating. By comparison, a lot of the stuff we log at `INFO` is generally not very important for admins to pay attention to.


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51653/#review148619
-----------------------------------------------------------


On Sept. 12, 2016, 4:01 p.m., Neil Conway wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51653/
> -----------------------------------------------------------
>
> (Updated Sept. 12, 2016, 4:01 p.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-5965
>     https://issues.apache.org/jira/browse/MESOS-5965
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Now that we wait for the agent to be removed from the registry before
> stopping the SlaveObserver, it is possible for an agent to fail health
> checks multiple times if the registry operation takes longer than
> `agent_ping_timeout`.
>
> This commit updates the master logic to handle this by ignoring health
> check failures while the registry operation to mark the agent
> unreachable is still in progress.
>
>
> Diffs
> -----
>
>   src/master/master.cpp 1dcce6cd66804990af238176c61aca03bb5c9471
>   src/tests/partition_tests.cpp f3142ad8d50daafcdb70ad9dbb2772f8ba30db00
>
> Diff: https://reviews.apache.org/r/51653/diff/
>
>
> Testing
> -------
>
> make check on OSX and Linux.
>
> `./src/mesos-tests --gtest_filter="Strict/PartitionTest.FailHealthChecksTwice/0" --gtest_repeat=1000 --gtest_break_on_failure`
>
>
> Thanks,
>
> Neil Conway
>
>
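
For readers without the diff open, here is a rough sketch of the behaviour the description refers to: a repeated health check failure is ignored (and logged as a warning) while the operation marking the agent unreachable is still in flight. This is not the code in `src/master/master.cpp`. The class `MasterSketch`, its methods, and the `markingUnreachable` set are hypothetical names made up for illustration, and the sketch writes to `std::cerr` rather than using the master's real logging.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the master's bookkeeping; not the Mesos code.
class MasterSketch {
public:
  // Called each time an agent fails its health check.
  // Returns true if this call started a registry operation, false if the
  // failure was ignored because an operation is already in progress.
  bool healthCheckFailed(const std::string& agentId) {
    if (markingUnreachable.count(agentId) > 0) {
      // Expected to be rare, so it is reported as a warning.
      std::cerr << "WARNING: agent " << agentId
                << " failed another health check while being marked"
                << " unreachable; ignoring" << std::endl;
      return false;
    }

    // Assumption: the real registry update is asynchronous; here we only
    // record that one has been started for this agent.
    markingUnreachable.insert(agentId);
    ++registryOperationsStarted;
    return true;
  }

  // Called when the registry operation for the agent has finished.
  void registryOperationCompleted(const std::string& agentId) {
    assert(markingUnreachable.count(agentId) == 1);
    markingUnreachable.erase(agentId);
  }

  // Number of registry operations started so far.
  int registryOperationsStarted = 0;

private:
  // Agents whose "mark unreachable" operation has not completed yet.
  std::unordered_set<std::string> markingUnreachable;
};
```

In this sketch, a second `healthCheckFailed` call for the same agent before `registryOperationCompleted` returns `false` and does not start another operation, which is roughly the double-failure case that `PartitionTest.FailHealthChecksTwice` is named after.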
