Re: Review Request 42355: Removed the timeout from the filter.

Qian Zhang Wed, 20 Jan 2016 19:50:15 -0800


> On Jan. 20, 2016, 9:49 a.m., Qian Zhang wrote:
> > One question: say the allocation interval is 10s. At time 5s, a framework 
> > sets a filter with a 3s timeout, so with this patch we will expire the 
> > filter 10s (max(10, 3)) later, i.e., at time 15s. At time 10s (the next 
> > allocation cycle), the allocator will not allocate any resources to the 
> > framework because of the 10s filter, which is good and is the issue that 
> > we intend to fix. Then at time 12s a new slave joins. At this moment the 
> > allocator will again not allocate any resources to the framework because 
> > of the 10s filter, even though the new slave may have the resources the 
> > framework needs. So my question is whether this is reasonable behavior: do 
> > we filter too much for the framework in this case?
> 
> Qian Zhang wrote:
>     In this case, do we need to cancel the filter once it has taken effect 
> at least once and has lasted long enough?
> 
> Alexander Rukletsov wrote:
>     Your concern is valid and we indeed may filter too much. I wonder how 
> probable your scenario is in real-world setups.
>     
>     Our intention is "filter for X seconds but at least for one allocation 
> touching the filtered agent". What we have here is more of a hack and I'd 
> rather remove `std::max()` in favor of a proper fix, which is allocating on 
> resource recovery (MESOS-3078). Does the TODO I left in the code explain it?
> 
> Alexander Rukletsov wrote:
>     To clarify Qian's concern and my answer: filters are set on a per-agent 
> basis, so a new agent joining the cluster won't be filtered by any existing 
> filters. However, we may indeed filter longer than a framework asked for, 
> but I think being precise about the filter duration is less important than 
> making the refused resources available to other frameworks.
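
The timeline discussed above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code from the patch: `FrameworkFilters`, `refuse()` and `filtered()` are hypothetical names, times are plain integer seconds, and the only behavior taken from the thread is that the effective timeout is `std::max()` of the allocation interval and the requested timeout, and that filters are kept per agent.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one framework's filter state. Keys are agent
// IDs; values are the time (in seconds) at which the filter on that agent
// expires. None of these names come from the Mesos code base.
struct FrameworkFilters
{
  std::map<std::string, int> expiry;

  // Installs a filter on a single agent. The effective timeout is clamped
  // so that it is never shorter than the allocation interval.
  void refuse(const std::string& agent, int now, int requested, int interval)
  {
    expiry[agent] = now + std::max(interval, requested);
  }

  // An agent is filtered only if it has its own, still unexpired filter.
  bool filtered(const std::string& agent, int now) const
  {
    auto it = expiry.find(agent);
    return it != expiry.end() && now < it->second;
  }
};
```

With a 10s interval, calling `refuse("agent-A", 5, 3, 10)` stores an expiry of 15, so `filtered("agent-A", 10)` is true at the next allocation cycle. An agent that joins at time 12 has no entry in the map, so `filtered("agent-B", 12)` is false, which matches the clarification that a new agent is not affected by existing filters.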


Yes, I agree it does make sense, thanks for the clarification!


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42355/#review115315
-----------------------------------------------------------


On Jan. 20, 2016, 7:32 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42355/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 7:32 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-4302
>     https://issues.apache.org/jira/browse/MESOS-4302
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Without the timeout, we rely on filter expiration only. This guarantees
> that filter removal is scheduled after `allocate()` if the allocator is
> backlogged, provided the default parameters are used. Additionally, we
> ensure the filter timeout is at least as big as the allocation interval.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> 48acde69b1a2f305b568a7e322a58708063dd30a 
>   src/tests/hierarchical_allocator_tests.cpp 
> 9362dd306497ba01e0f387c3862456cdcac6f863 
> 
> Diff: https://reviews.apache.org/r/42355/diff/
> 
> 
> Testing
> -------
> 
> On Mac OS 10.10.4:
> 
> `make check`
> 
> `GTEST_FILTER="HierarchicalAllocatorTest.FilterTimeout" ./bin/mesos-tests.sh 
> --gtest_repeat=100 --gtest_break_on_failure` passes with the patch and fails 
> without.
> 
> `GTEST_FILTER="HierarchicalAllocatorTest.*" ./bin/mesos-tests.sh 
> --gtest_repeat=100 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>
