> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > Adding to the previous partial review after discussion.
> > 
> > ## Invariant checking and documentation
> > 
> > An overall comment is that I think since we keep track of shared resource 
> > across Master, allocator and sorter, each for a different aspect (allocated 
> > vs. used vs. available, etc.) and with different semantics and expactations 
> > on uniquesness/number of copies etc., we should very explicitly document 
> > these variables and method arguments and check their invariants.
> > 
> > Specifically, the following diagram helped me understand the relationship 
> > between these variables.
> > 
> > Sorter
> > |
> > |- allocations: hashmap<std::string, Allocation>
> >                                      |
> >                                      |- resources: hashmap<SlaveID, 
> > Resources>
> >                                      |- scalarQuantities 
> > 
> > `Allocation.resources` expects 1 copy of each shared resource because it's 
> > tracking whether a client has been allocated this resource and doesn't care 
> > about how many times it's been allocated or used by the client. Therefore 
> > distinct shared resources is an invariant that we should check in 
> > `allocated()`.
> > 
> > Allocator
> > |
> > |- slaves: hashmap<SlaveID, Slave>
> >                             |
> >                             |- allocated: Resources
> >                             
> > `Slave.allocated` expects 1 copy of each shared resource **per client** and 
> > it's grouped into one Resources object.
> > 
> > Master
> > |
> > |- Slave
> > |  |
> > |  |- usedResources: hashmap<FrameworkID, Resources>
> > |
> > |- Framework
> >    |
> >    |- usedResources: hashmap<SlaveID, Resources>
> >    
> > `Slave.usedResources` and `Framework.usedResources` expect 1 copy of each 
> > shared resource **per task** as they are tracking "usage".

Yes, I will update the comments to reflect scope of shared count (ie. copies) 
in the context of the `Resources` objects used in master, allocator and sorter. 
So, now with the current version:

1) Sorter=> allocations (resources, scalarQuantities): 1 copy of each shared 
resource per allocation (that has not been recovered) by a client.
2) Allocator=> slaves[].allocated: 1 copy of each shared resource per 
allocation (that has not been recovered) by a framework.
3) Master=> usedResources (in Slave and Framework): 1 copy of each shared 
resource per task as they are tracking "usage".


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 274
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438751#file1438751line274>
> >
> >     Chatted offline. We should refactor this into a 
> > `Master::recoverResources()` call to encapsulate this logic and call 
> > `allocator->recoverResources()` inside with adjusted arguments.

Wrapped `Allocator::recoverResources()` within `Master::recoverResources()`.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 450-459
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line450>
> >
> >     Seems like in `slaves[slaveId].allocated` we should be maintaining one 
> > copy of each shared resource **per framework** but this if condition here 
> > assumes one copy of each shared resource across framework.

Yes, this was done when we maintained atmost a single copy of a shared 
resource. Now with 1 copy of shared resource per framework, we need to just sum 
up all resources.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/master_validation_tests.cpp, lines 630-632
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438755#file1438755line630>
> >
> >     Here we are testing that can't destroy a  nonshared persistent volume 
> > `disk2` because it's in use?
> >     
> >     Perhaps it's still reasonable to test it (even thought it can't happen 
> > right now) but it's not relevant to this test?

Ok. Since it is not a use case that happens in the `DESTROY` callflow, I think 
we can remove it.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 520-521
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438756#file1438756line520>
> >
> >     Even if a and b both use 1/10 of the persistent volume, shouldn't they 
> > each get 0.1 share?

Yes, the shares is true based on when we were evenly distributing the shares 
across frameworks to which the shared resources were allocated to. I did not 
update the test in this commit so you do not see it. I will fix this to match 
the current behavior.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 499-500
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438756#file1438756line499>
> >
> >     This is not a realistic example. "id1" is already a volume, you 
> > shouldn't claim 1/10 of a volume. (I don't see us validating this though)

Yes, a typo in the total which is now fixed. With regards to validating this 
case, it is captured within `internal::validateResourceUsage()` called in:

```
Option<Error> validate(const TaskInfo& task, Framework* framework, Slave* 
slave, const Resources& offered)
```

We do not go through that flow in this specific test; however, the real 
offer-accept cycle goes through that though.


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141757
-----------------------------------------------------------


On July 17, 2016, 6:28 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 17, 2016, 6:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp 
> b72ba16277a3210e4d309b616d185a10e2029a66 
>   src/master/allocator/mesos/hierarchical.cpp 
> 7d4064535a20b93950f5a95eef1ad3f0d37d305b 
>   src/master/allocator/sorter/drf/sorter.hpp 
> bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
>   src/master/allocator/sorter/drf/sorter.cpp 
> ac85b327fc33d34246788e6a8c8bf5a486c61434 
>   src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 
>   src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 
>   src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 
>   src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Reply via email to