-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71698/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler and Meng Zhu.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description
-------

This patch addresses poor performance of `HierarchicalAllocatorProcess::updateAllocation()` for agents with a huge number of non-addable resources in a many-framework case (see MESOS-10015).

Sorter methods for totals tracking that modify `Resources` of an agent in the Sorter are replaced with methods that add/remove resource quantities of an agent as a whole (which was actually the only use case of the old methods). Thus, subtracting/adding `Resources` of a whole agent no longer occurs when updating resources of an agent in a Sorter.

Further, this patch completely removes agent resource tracking logic from the random sorter (which by itself makes no use of them) by implementing cluster totals tracking in the allocator.

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*` (for the DRF sorter):

1.7.x branch:

Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 2.014081646secs

Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.623513239secs

Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.14100063438333mins

Agent resources size: 1600 (400 frameworks)
(killed after several minutes)

1.7.x branch + this patch:

Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 236.706615ms

Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 483.544585ms

Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 1.095224322secs

...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 50.369691741secs

This is a backport of https://reviews.apache.org/r/71646


Diffs
-----

  src/master/allocator/mesos/hierarchical.hpp 1fce68fbdbb36edad0425dbd0d9c818f2cd0870e
  src/master/allocator/mesos/hierarchical.cpp 3e8a8ce728b4cf1f45947f8fb2814c87b6468d91
  src/master/allocator/sorter/drf/sorter.hpp 75f90f331fbe2ec514daa3fe00b0b05ad55932e1
  src/master/allocator/sorter/drf/sorter.cpp 43c97671d692675df6a347e4482126d83d7e3f24
  src/master/allocator/sorter/random/sorter.hpp 2031cb234cc3e29723f07ec7d3a7e8671a26a97f
  src/master/allocator/sorter/random/sorter.cpp 6fcfc41f65bb6401cfb60af88866c2b02920887e
  src/master/allocator/sorter/sorter.hpp 25ad48dff7e624e7d25072958bdd20513ab83d12
  src/tests/sorter_tests.cpp 1e2791f993af2fba592b0e76493864c096a0bb5f


Diff: https://reviews.apache.org/r/71698/diff/1/


Testing
-------

make check

`*BENCHMARK_WithReservationParam.UpdateAllocation*`:

**Before:**

Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.014081646secs
Average UNRESERVE duration: 50.561677ms
Average RESERVE duration: 50.142404ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.623513239secs
Average UNRESERVE duration: 341.008722ms
Average RESERVE duration: 340.166939ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.14100063438333mins
Average UNRESERVE duration: 3.199787095secs
Average RESERVE duration: 3.223214807secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**

Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 236.706615ms
Average UNRESERVE duration: 5.908221ms
Average RESERVE duration: 5.927109ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 483.544585ms
Average UNRESERVE duration: 12.637169ms
Average RESERVE duration: 11.540059ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.095224322secs
Average UNRESERVE duration: 27.261353ms
Average RESERVE duration: 27.499862ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 3.785458686secs
Average UNRESERVE duration: 94.972666ms
Average RESERVE duration: 94.300268ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.614374427secs
Average UNRESERVE duration: 340.791016ms
Average RESERVE duration: 339.927704ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 50.369691741secs
Average UNRESERVE duration: 1.261506421secs
Average RESERVE duration: 1.256978165secs


Thanks,

Andrei Sekretenko
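
As an illustration of the totals tracking outlined in the Description, here is a minimal sketch and not the patch's actual code. `TotalsTracker`, `Quantities`, `addAgent`, `removeAgent` and `total` are hypothetical names, and a plain integer map stands in for Mesos' resource quantity type. The sketch keeps cluster totals as per-name sums and only ever adds or removes an agent's quantities as a whole, which is the approach the Description outlines as I read it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource quantity type:
// one integer amount per resource name (e.g. "cpus", "mem").
using Quantities = std::map<std::string, long long>;

// Hypothetical cluster totals tracker; not the Mesos Sorter or allocator API.
class TotalsTracker
{
public:
  // Adds the quantities of a whole agent to the cluster totals.
  void addAgent(const std::string& agentId, const Quantities& quantities)
  {
    assert(agents.count(agentId) == 0);
    agents[agentId] = quantities;
    for (const auto& entry : quantities) {
      totals[entry.first] += entry.second;
    }
  }

  // Subtracts the quantities of a whole agent from the cluster totals.
  // Unknown agents are ignored.
  void removeAgent(const std::string& agentId)
  {
    auto agent = agents.find(agentId);
    if (agent == agents.end()) {
      return;
    }
    for (const auto& entry : agent->second) {
      auto sum = totals.find(entry.first);
      sum->second -= entry.second;
      if (sum->second == 0) {
        totals.erase(sum); // Drop names that no agent contributes to.
      }
    }
    agents.erase(agent);
  }

  // Returns the cluster-wide amount for a resource name, or 0 if absent.
  long long total(const std::string& name) const
  {
    auto it = totals.find(name);
    return it == totals.end() ? 0 : it->second;
  }

private:
  std::map<std::string, Quantities> agents; // Per-agent quantities.
  Quantities totals;                        // Cluster-wide sums.
};
```

In this sketch the cost of an update depends on the number of distinct resource names on an agent, not on the number of individual reservations. A reserve or unreserve operation leaves an agent's quantities unchanged, so under these assumptions it would need no totals update at all.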