> On Jan. 20, 2016, 9:49 a.m., Qian Zhang wrote:
> > One question: say the allocation interval is 10s, and at time 5s a framework sets a filter for 3s. With this patch, we will expire the filter 10s (max(10, 3)) later, i.e., at time 15s. Then at time 10s (the next allocation cycle), the allocator will not allocate any resources to the framework due to the 10s filter, which is good and is the issue we intend to fix. Then at time 12s a new slave joins, and at this moment the allocator will again not allocate any resources to the framework due to the 10s filter, even though the new slave may have the resources the framework needs. So my question is whether this is reasonable behavior: do we filter too much for the framework in this case?
> 
> Qian Zhang wrote:
>     In this case, do we need to cancel the filter once it has taken effect one time and has lasted long enough?
> 
> Alexander Rukletsov wrote:
>     Your concern is valid, and we may indeed filter too much. I wonder how probable your scenario is in real-world setups.
>     
>     Our intention is "filter for X seconds, but at least for one allocation touching the filtered agent". What we have here is more of a hack, and I'd rather remove `std::max()` in favor of a proper fix, which is allocating on resource recovery (MESOS-3078). Does the TODO I left in the code explain it?
> 
> Alexander Rukletsov wrote:
>     To clarify Qian's concern and my answer: filters are set on a per-agent basis, so a new agent joining the cluster won't be filtered by any existing filters. However, we may indeed filter for longer than the framework asked, but I think being precise about the filter duration is less important than making the refused resources available to other frameworks.
Yes, I agree it does make sense. Thanks for the clarification!


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42355/#review115315
-----------------------------------------------------------


On Jan. 20, 2016, 7:32 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42355/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 7:32 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-4302
>     https://issues.apache.org/jira/browse/MESOS-4302
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Without the timeout, we rely on filter expiration only. This guarantees that filter removal is scheduled after `allocate()` if the allocator is backlogged, provided the default parameters are used. Additionally, we ensure the filter timeout is at least as long as the allocation interval.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 48acde69b1a2f305b568a7e322a58708063dd30a
>   src/tests/hierarchical_allocator_tests.cpp 9362dd306497ba01e0f387c3862456cdcac6f863
> 
> Diff: https://reviews.apache.org/r/42355/diff/
> 
> 
> Testing
> -------
> 
> On Mac OS 10.10.4:
> 
> `make check`
> 
> `GTEST_FILTER="HierarchicalAllocatorTest.FilterTimeout" ./bin/mesos-tests.sh --gtest_repeat=100 --gtest_break_on_failure` passes with the patch and fails without it.
> 
> `GTEST_FILTER="HierarchicalAllocatorTest.*" ./bin/mesos-tests.sh --gtest_repeat=100 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Alexander Rukletsov
>