-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70325/
-----------------------------------------------------------
(Updated April 4, 2019, 12:10 a.m.)
Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and Meng
Zhu.
Bugs: MESOS-9635
https://issues.apache.org/jira/browse/MESOS-9635
Repository: mesos
Description
-------
This patch updates the master's framework recovery code to use
the allocator's `addAgentResources()` method rather than
`updateSlave()` when recovering orphan operations. With this
change, the allocator tracks the resources consumed by those
operations as allocated, so they are no longer incorrectly
offered to frameworks while an operation is still pending.
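The effect of tracking allocation can be illustrated with a toy model.
The sketch below is not Mesos code: `ToyAllocator`, `update_total()`, and
`add_with_used()` are hypothetical names, and a single scalar stands in
for an agent's resources. It only shows why recording the resources used
by a pending operation keeps them out of the offerable pool.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAllocator:
    """Hypothetical, simplified allocator for one scalar resource."""
    total: float = 0.0
    # Resources consumed by pending operations, keyed by operation ID.
    allocated: dict = field(default_factory=dict)

    def update_total(self, total):
        # Only the total is updated; nothing is recorded as allocated.
        self.total = total

    def add_with_used(self, total, used):
        # The total and the in-use resources are recorded together.
        self.total += total
        for operation_id, amount in used.items():
            self.allocated[operation_id] = (
                self.allocated.get(operation_id, 0.0) + amount
            )

    def offerable(self):
        # Anything not recorded as allocated may be offered to frameworks.
        return self.total - sum(self.allocated.values())


# A pending orphan operation consumes 40 of the agent's 100 units.
without_tracking = ToyAllocator()
without_tracking.update_total(100.0)

with_tracking = ToyAllocator()
with_tracking.add_with_used(100.0, {"operation-1": 40.0})

print(without_tracking.offerable())  # 100.0: consumed resources look free
print(with_tracking.offerable())     # 60.0: consumed resources are withheld
```

In the first case the 40 units held by the pending operation still appear
offerable, which is the kind of incorrect offer this patch is meant to
prevent.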
Diffs (updated)
-----
src/master/master.cpp cf5caa0893ba1387a1f3a9d129ecd7d974f776bd
Diff: https://reviews.apache.org/r/70325/diff/2/
Changes: https://reviews.apache.org/r/70325/diff/1-2/
Testing
-------
`make check`
To verify the flaky test fix, the following command was executed both before
and after the patches were applied, while `stress -c <num_cores_on_machine>`
was being run:
`bin/mesos-tests.sh --gtest_filter="*AgentPendingOperationAfterMasterFailover*" --gtest_repeat=-1 --gtest_break_on_failure`
Before the patches were applied, the test reliably failed in fewer than
50 repetitions. After the patches were applied, the test could be run for
hundreds of repetitions with no failures.
Thanks,
Greg Mann