> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote: > > Adding to the previous partial review after discussion. > > > > ## Invariant checking and documentation > > > > An overall comment is that I think since we keep track of shared resource > > across Master, allocator and sorter, each for a different aspect (allocated > > vs. used vs. available, etc.) and with different semantics and expactations > > on uniquesness/number of copies etc., we should very explicitly document > > these variables and method arguments and check their invariants. > > > > Specifically, the following diagram helped me understand the relationship > > between these variables. > > > > Sorter > > | > > |- allocations: hashmap<std::string, Allocation> > > | > > |- resources: hashmap<SlaveID, > > Resources> > > |- scalarQuantities > > > > `Allocation.resources` expects 1 copy of each shared resource because it's > > tracking whether a client has been allocated this resource and doesn't care > > about how many times it's been allocated or used by the client. Therefore > > distinct shared resources is an invariant that we should check in > > `allocated()`. > > > > Allocator > > | > > |- slaves: hashmap<SlaveID, Slave> > > | > > |- allocated: Resources > > > > `Slave.allocated` expects 1 copy of each shared resource **per client** and > > it's grouped into one Resources object. > > > > Master > > | > > |- Slave > > | | > > | |- usedResources: hashmap<FrameworkID, Resources> > > | > > |- Framework > > | > > |- usedResources: hashmap<SlaveID, Resources> > > > > `Slave.usedResources` and `Framework.usedResources` expect 1 copy of each > > shared resource **per task** as they are tracking "usage".
Yes, I will update the comments to reflect scope of shared count (ie. copies) in the context of the `Resources` objects used in master, allocator and sorter. So, now with the current version: 1) Sorter=> allocations (resources, scalarQuantities): 1 copy of each shared resource per allocation (that has not been recovered) by a client. 2) Allocator=> slaves[].allocated: 1 copy of each shared resource per allocation (that has not been recovered) by a framework. 3) Master=> usedResources (in Slave and Framework): 1 copy of each shared resource per task as they are tracking "usage". > On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote: > > src/master/master.hpp, line 274 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438751#file1438751line274> > > > > Chatted offline. We should refactor this into a > > `Master::recoverResources()` call to encapsulate this logic and call > > `allocator->recoverResources()` inside with adjusted arguments. Wrapped `Allocator::recoverResources()` within `Master::recoverResources()`. > On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote: > > src/master/allocator/mesos/hierarchical.cpp, lines 450-459 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line450> > > > > Seems like in `slaves[slaveId].allocated` we should be maintaining one > > copy of each shared resource **per framework** but this if condition here > > assumes one copy of each shared resource across framework. Yes, this was done when we maintained atmost a single copy of a shared resource. Now with 1 copy of shared resource per framework, we need to just sum up all resources. > On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote: > > src/tests/master_validation_tests.cpp, lines 630-632 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438755#file1438755line630> > > > > Here we are testing that can't destroy a nonshared persistent volume > > `disk2` because it's in use? > > > > Perhaps it's still reasonable to test it (even thought it can't happen > > right now) but it's not relevant to this test? Ok. Since it is not a use case that happens in the `DESTROY` callflow, I think we can remove it. > On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote: > > src/tests/sorter_tests.cpp, lines 520-521 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438756#file1438756line520> > > > > Even if a and b both use 1/10 of the persistent volume, shouldn't they > > each get 0.1 share? Yes, the shares is true based on when we were evenly distributing the shares across frameworks to which the shared resources were allocated to. I did not update the test in this commit so you do not see it. I will fix this to match the current behavior. > On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote: > > src/tests/sorter_tests.cpp, lines 499-500 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438756#file1438756line499> > > > > This is not a realistic example. "id1" is already a volume, you > > shouldn't claim 1/10 of a volume. (I don't see us validating this though) Yes, a typo in the total which is now fixed. With regards to validating this case, it is captured within `internal::validateResourceUsage()` called in: ``` Option<Error> validate(const TaskInfo& task, Framework* framework, Slave* slave, const Resources& offered) ``` We do not go through that flow in this specific test; however, the real offer-accept cycle goes through that though. - Anindya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45961/#review141757 ----------------------------------------------------------- On July 17, 2016, 6:28 p.m., Anindya Sinha wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45961/ > ----------------------------------------------------------- > > (Updated July 17, 2016, 6:28 p.m.) > > > Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael > Park, and Jiang Yan Xu. > > > Bugs: MESOS-4431 > https://issues.apache.org/jira/browse/MESOS-4431 > > > Repository: mesos > > > Description > ------- > > o Each shared resource is accouted via its share count. This count is > updated based on the resource operations (such as add and subtract) > in various scenarios such as task launch and terminate at multiple > modules such as master, allocator, sorter, etc. > o Only allow DESTROY if there are no running or pending tasks using > the volume. However, if the volume is in a pending offer to one or > more frameworks, resind that offer before processing the DESTROY. > > > Diffs > ----- > > src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 > src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d > src/master/allocator/mesos/hierarchical.hpp > b72ba16277a3210e4d309b616d185a10e2029a66 > src/master/allocator/mesos/hierarchical.cpp > 7d4064535a20b93950f5a95eef1ad3f0d37d305b > src/master/allocator/sorter/drf/sorter.hpp > bc6bfb2d5d3b32d55be055a0514861b4e7d889bb > src/master/allocator/sorter/drf/sorter.cpp > ac85b327fc33d34246788e6a8c8bf5a486c61434 > src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 > src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 > src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 > src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 > src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 > src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd > src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 > src/tests/master_validation_tests.cpp > 9eb82a569f7c48caa96d4d54a93b199ccac74385 > src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 > src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a > > Diff: https://reviews.apache.org/r/45961/diff/ > > > Testing > ------- > > Tests successful. > > > Thanks, > > Anindya Sinha > >