> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > Looks like the SHARE and UNSHARE operations are not in this patch?

Yes that is correct.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources.cpp, line 1328
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365141#file1365141line1328>
> >
> >     Should we require persistent volumes to be unshared before we destroy 
> > them?
> >     
> >     Here the concern is that we are clearing things in DESTROY that are not 
> > added by its pairing operation CREATE.

Since SHARE and UNSHARE apis are not supported yet, the only way to mark a 
resource as shared is in CREATE so clearing it in DESTROY seems correct.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources_utils.cpp, lines 71-74
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365143#file1365143line71>
> >
> >     Do we need `Resources::isShared(resource)`?
> >     
> >     If we don't need that method, here we can actually just 
> > `stripped.clear_shared()`.

I will remove the check for `if (Resources::isShareable(resource))` [now 
`Resources::isShared(resource)`] and always do a `stripped.clear_shared()`. I 
would keep the `Resources::isShared()` method since that is the pattern being 
used for `Resources::shared()` and `Resources::nonShared()`, which is how it is 
done for revocable resources as well.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > include/mesos/resources.hpp, lines 424-430
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365139#file1365139line424>
> >
> >     See my comment at the definition.

Refactored this within sorter.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources.cpp, lines 1464-1496
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365141#file1365141line1464>
> >
> >     This concept of weightedGet tightly couples with the sorter logic which 
> > is not used elsewhere so it shouldn't be pulled out.
> >     
> >     Here to have something that's more easily explainable in a generic way, 
> > perhaps we can introduce something like
> >     
> >     
> >     ```
> >     // Returns the count of the target Resource in the Resources.
> >     size_t Resources::count(const Resource& target) const
> >     {
> >       // Search the list of resources in the collection.
> >       // For shared resources, return the Resource_::sharedCount but 
> > because this
> >       // is strictly an internal optimzation, the caller just needs to 
> > refer to this
> >       // as "the count".
> >       // For nonshared resources, given how we collapse addable resources, 
> > the result
> >       // is at most 1.
> >     }
> >     ```
> >     
> >     I feel API is easiler to explain in a general sense for the callers and 
> > the implementation is easy to explain for readers of the cpp file without 
> > going to through the DRF logic.
> >     
> >     Chatted with Anindya offline and we are still debating whether we can 
> > avoid even having `count` altogether but I am writing down my thoughts here.
> >     
> >     As to whether we are leaking internal information, I don't feel like 
> > this is the case: we are not asking the caller to carefully manipulating 
> > the internal state in order to use share resource arithmetics correctly but 
> > rather this provides additional information for logic that doesn't deal 
> > with resource arithmetic. Therefore I don't think we are breaking 
> > abstractions but merely exposing a const `count` method.

Based on our f2f discussion, we decided to add `Resources::count()` which 
returns the shared count for shared resources, and 1 for non-shared resources. 
This API is being used within the sorter in calculation of resource share (in 
DRF).


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources_utils.cpp, lines 88-119
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365143#file1365143line88>
> >
> >     In the most recent iteration we can just -= from the call sites right?

And hence this function is removed.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 252-254
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365146#file1365146line252>
> >
> >     What does "included in non shareable resources already." mean?

Removed the erraneous comment.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 394-402
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365146#file1365146line394>
> >
> >     `allocations[(*it).name].scalarQuantities` has only nonshared resources.

Refactored this to bring the calculation of total scalars within sorter. So, we 
no longer need to add a vector of scalars for each client.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 427-432
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365146#file1365146line427>
> >
> >     Since `scalarQuantities` doesn't include any shared resources, this is 
> > in fact the allocation for nonshared resources.
> >     
> >     To add shared resource allocation on top of this one way discussed is 
> > to use `contains`, something like:
> >     
> >     ```
> >     foreachpair (const SlaveID& slaveId, 
> >                  const Resources& resources, 
> >                  allocations[name].resources) {
> >       foreach (const Resource& resource, resources.shared()) {
> >         unsigned count = 0;
> >         foreach (const Client& client, clients) {
> >           if (allocations[client.name].resources.contains(slaveId) &&
> >               
> > allocations[client.name].resources[slaveId].contains(resource)) {
> >             ++count;
> >           }
> >         }
> >         CHECK(count > 0);
> >         allocation += resource.scalar() / count;
> >       }
> >     }
> >     ```
> >     
> >     It feels like it's not optimal that we need to go through all slaves 
> > here if very few/no slaves have shared resources on them. Alternatively if 
> > we maintain a separate `hashmap<SlaveID, Resources> sharedResources` in 
> > `DRFSorter::Allocation` and `DRFSorter::Total` and add a `count` method 
> > (described in another comment) in Resources we could do this:
> >     
> >     ```
> >     foreachpair (const SlaveID& slaveId, 
> >                  const Resources& resources, 
> >                  allocations[name].shareResources) {
> >       foreach (const Resource& resource, resources) {
> >         size_t count = total_.shareResources[slaveId].count(resource);
> >         CHECK(count > 0);
> >         allocation += resource.scalar() / count;
> >       }
> >     }
> >     ```
> >     
> >     This requires `allocated` and `unallocated` to update 
> > `total_.shareResources` though. What do you think?

Added `hashmap<SlaveID, Resources> sharedResources` in sorter (but not within 
Total or Allocation). This hashmap keeps track of the shared resources for a 
specific sorter. We add a shared resource into `sharedResources` in 
`allocated()` [if already not present], and remove a shared resource from 
`sharedResources` in `unallocated()`. This hashmap would contain the correct 
number of clients using a shared resource, which is retrieved via the 
`Resources::count()` api.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3912-3915
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365148#file1365148line3912>
> >
> >     It's not intuitive that you have to -= on part of the resources and += 
> > on others.
> >     
> >     Plus `_offeredResources` is what the master has offered to the 
> > framework and in this method we are subtracting what the task is using from 
> > the offered resources.
> >     
> >     Here let's keep the semantics unchanged.
> >     
> >     If so, then if a task is using a shared volume from an offer, it should 
> > still be taken out of the resources the same way as nonshared resources 
> > (i.e., -=).
> >     
> >     Note that when you call `Framework::removeOffer()` and 
> > `Slave::removeOffer`, shared resources operators should make sure 
> > `Framework::offeredResources`, `Framework::totalOfferedResources` and 
> > `Slave::offeredResources` are adjusted correctly for shared resources.

Refactored this such that `_offeredResources` either denotes a shared resource 
from the offer was not used (ie. `_offeredResources.contains(sharedResource)` 
is true), or a shared resource from the offer was used (ie. 
`_offeredResources.contains(sharedResource)` is false). We do not track the 
actual number of tasks using this shared resoure here (which is tracked in 
`Slave::usedResources`, since `Slave::addTask()` and `Slave::removeTask()` 
updates that and tracked in `hashmap<FrameworkID, Resources> usedResources`).


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, lines 163-183
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365147#file1365147line163>
> >
> >     We don't need this in the most recent revision right?

Yes.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3955-3964
> > <https://reviews.apache.org/r/45961/diff/2/?file=1365148#file1365148line3955>
> >
> >     Here I think this semantics is more straightforward:
> >     
> >     Return the shared resource to the allocator only when the frameowrk is 
> > no longer using a shared resource. i.e., 
> >     
> >     ```
> >       Resources recovered = _offeredResources.nonShared();
> >       foreach (const Resource& resource, _offeredResources.shared()) {
> >         if (!framework.totalUsedResources.contains(resource) &&
> >             !framework.totalOfferedResources.contains(resource) {
> >           recovered += resource;
> >         }
> >       }
> >     ```

Added a function `Resources filterUsedResources(const FrameworkID& frameworkId, 
const Resources& resources)` to achieve accounting based on shared resources 
used.


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review129838
-----------------------------------------------------------


On May 22, 2016, 7:11 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated May 22, 2016, 7:11 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Each shared resource is accouted via its share count. This count is
> updated based on the resource operations (such as add and subtract)
> in various scenarios such as task launch and terminate at multiple
> modules such as master, allocator, sorter, etc.
> Allow DESTROY for shared volumes only if share count is 0.
> Since shared resources are available to multiple frameworks of the
> same role simultaneously, we normalize it with a weight equivalent
> to the number of frameworks to which this shared resource has been
> allocated to in the sorter.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 
> 8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 05d5205d29ad74e01e07c508d88b6f8371541513 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 4306973277b9d32356eed31ceabac09fb2a03e6c 
>   src/master/http.cpp b36b439a1fa07c52146deff2b90728f92676ade3 
>   src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 
>   src/master/master.cpp 0c05938de3e4eaeea2187559e81559f85318fa73 
>   src/master/validation.hpp f29f9753c75e5abc3402ed23381edcea26517ab3 
>   src/master/validation.cpp f490b899758bdac9676a6f6939918efa6ac52781 
>   src/tests/master_validation_tests.cpp 
> ca4442aa1ef0087a7d058d1b3aa430a1dbc16960 
>   src/tests/sorter_tests.cpp eb207a36c18198588cd8a98b3860a66c2ff7a641 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Reply via email to