> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > Looks like the SHARE and UNSHARE operations are not in this patch?
Yes that is correct. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/common/resources.cpp, line 1328 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365141#file1365141line1328> > > > > Should we require persistent volumes to be unshared before we destroy > > them? > > > > Here the concern is that we are clearing things in DESTROY that are not > > added by its pairing operation CREATE. Since SHARE and UNSHARE apis are not supported yet, the only way to mark a resource as shared is in CREATE so clearing it in DESTROY seems correct. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/common/resources_utils.cpp, lines 71-74 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365143#file1365143line71> > > > > Do we need `Resources::isShared(resource)`? > > > > If we don't need that method, here we can actually just > > `stripped.clear_shared()`. I will remove the check for `if (Resources::isShareable(resource))` [now `Resources::isShared(resource)`] and always do a `stripped.clear_shared()`. I would keep the `Resources::isShared()` method since that is the pattern being used for `Resources::shared()` and `Resources::nonShared()`, which is how it is done for revocable resources as well. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > include/mesos/resources.hpp, lines 424-430 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365139#file1365139line424> > > > > See my comment at the definition. Refactored this within sorter. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/common/resources.cpp, lines 1464-1496 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365141#file1365141line1464> > > > > This concept of weightedGet tightly couples with the sorter logic which > > is not used elsewhere so it shouldn't be pulled out. > > > > Here to have something that's more easily explainable in a generic way, > > perhaps we can introduce something like > > > > > > ``` > > // Returns the count of the target Resource in the Resources. > > size_t Resources::count(const Resource& target) const > > { > > // Search the list of resources in the collection. > > // For shared resources, return the Resource_::sharedCount but > > because this > > // is strictly an internal optimzation, the caller just needs to > > refer to this > > // as "the count". > > // For nonshared resources, given how we collapse addable resources, > > the result > > // is at most 1. > > } > > ``` > > > > I feel API is easiler to explain in a general sense for the callers and > > the implementation is easy to explain for readers of the cpp file without > > going to through the DRF logic. > > > > Chatted with Anindya offline and we are still debating whether we can > > avoid even having `count` altogether but I am writing down my thoughts here. > > > > As to whether we are leaking internal information, I don't feel like > > this is the case: we are not asking the caller to carefully manipulating > > the internal state in order to use share resource arithmetics correctly but > > rather this provides additional information for logic that doesn't deal > > with resource arithmetic. Therefore I don't think we are breaking > > abstractions but merely exposing a const `count` method. Based on our f2f discussion, we decided to add `Resources::count()` which returns the shared count for shared resources, and 1 for non-shared resources. This API is being used within the sorter in calculation of resource share (in DRF). > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/common/resources_utils.cpp, lines 88-119 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365143#file1365143line88> > > > > In the most recent iteration we can just -= from the call sites right? And hence this function is removed. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/master/allocator/sorter/drf/sorter.cpp, lines 252-254 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365146#file1365146line252> > > > > What does "included in non shareable resources already." mean? Removed the erraneous comment. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/master/allocator/sorter/drf/sorter.cpp, lines 394-402 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365146#file1365146line394> > > > > `allocations[(*it).name].scalarQuantities` has only nonshared resources. Refactored this to bring the calculation of total scalars within sorter. So, we no longer need to add a vector of scalars for each client. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/master/allocator/sorter/drf/sorter.cpp, lines 427-432 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365146#file1365146line427> > > > > Since `scalarQuantities` doesn't include any shared resources, this is > > in fact the allocation for nonshared resources. > > > > To add shared resource allocation on top of this one way discussed is > > to use `contains`, something like: > > > > ``` > > foreachpair (const SlaveID& slaveId, > > const Resources& resources, > > allocations[name].resources) { > > foreach (const Resource& resource, resources.shared()) { > > unsigned count = 0; > > foreach (const Client& client, clients) { > > if (allocations[client.name].resources.contains(slaveId) && > > > > allocations[client.name].resources[slaveId].contains(resource)) { > > ++count; > > } > > } > > CHECK(count > 0); > > allocation += resource.scalar() / count; > > } > > } > > ``` > > > > It feels like it's not optimal that we need to go through all slaves > > here if very few/no slaves have shared resources on them. Alternatively if > > we maintain a separate `hashmap<SlaveID, Resources> sharedResources` in > > `DRFSorter::Allocation` and `DRFSorter::Total` and add a `count` method > > (described in another comment) in Resources we could do this: > > > > ``` > > foreachpair (const SlaveID& slaveId, > > const Resources& resources, > > allocations[name].shareResources) { > > foreach (const Resource& resource, resources) { > > size_t count = total_.shareResources[slaveId].count(resource); > > CHECK(count > 0); > > allocation += resource.scalar() / count; > > } > > } > > ``` > > > > This requires `allocated` and `unallocated` to update > > `total_.shareResources` though. What do you think? Added `hashmap<SlaveID, Resources> sharedResources` in sorter (but not within Total or Allocation). This hashmap keeps track of the shared resources for a specific sorter. We add a shared resource into `sharedResources` in `allocated()` [if already not present], and remove a shared resource from `sharedResources` in `unallocated()`. This hashmap would contain the correct number of clients using a shared resource, which is retrieved via the `Resources::count()` api. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3912-3915 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365148#file1365148line3912> > > > > It's not intuitive that you have to -= on part of the resources and += > > on others. > > > > Plus `_offeredResources` is what the master has offered to the > > framework and in this method we are subtracting what the task is using from > > the offered resources. > > > > Here let's keep the semantics unchanged. > > > > If so, then if a task is using a shared volume from an offer, it should > > still be taken out of the resources the same way as nonshared resources > > (i.e., -=). > > > > Note that when you call `Framework::removeOffer()` and > > `Slave::removeOffer`, shared resources operators should make sure > > `Framework::offeredResources`, `Framework::totalOfferedResources` and > > `Slave::offeredResources` are adjusted correctly for shared resources. Refactored this such that `_offeredResources` either denotes a shared resource from the offer was not used (ie. `_offeredResources.contains(sharedResource)` is true), or a shared resource from the offer was used (ie. `_offeredResources.contains(sharedResource)` is false). We do not track the actual number of tasks using this shared resoure here (which is tracked in `Slave::usedResources`, since `Slave::addTask()` and `Slave::removeTask()` updates that and tracked in `hashmap<FrameworkID, Resources> usedResources`). > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/master/master.hpp, lines 163-183 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365147#file1365147line163> > > > > We don't need this in the most recent revision right? Yes. > On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3955-3964 > > <https://reviews.apache.org/r/45961/diff/2/?file=1365148#file1365148line3955> > > > > Here I think this semantics is more straightforward: > > > > Return the shared resource to the allocator only when the frameowrk is > > no longer using a shared resource. i.e., > > > > ``` > > Resources recovered = _offeredResources.nonShared(); > > foreach (const Resource& resource, _offeredResources.shared()) { > > if (!framework.totalUsedResources.contains(resource) && > > !framework.totalOfferedResources.contains(resource) { > > recovered += resource; > > } > > } > > ``` Added a function `Resources filterUsedResources(const FrameworkID& frameworkId, const Resources& resources)` to achieve accounting based on shared resources used. - Anindya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45961/#review129838 ----------------------------------------------------------- On May 22, 2016, 7:11 a.m., Anindya Sinha wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45961/ > ----------------------------------------------------------- > > (Updated May 22, 2016, 7:11 a.m.) > > > Review request for mesos and Jiang Yan Xu. > > > Bugs: MESOS-4431 > https://issues.apache.org/jira/browse/MESOS-4431 > > > Repository: mesos > > > Description > ------- > > Each shared resource is accouted via its share count. This count is > updated based on the resource operations (such as add and subtract) > in various scenarios such as task launch and terminate at multiple > modules such as master, allocator, sorter, etc. > Allow DESTROY for shared volumes only if share count is 0. > Since shared resources are available to multiple frameworks of the > same role simultaneously, we normalize it with a weight equivalent > to the number of frameworks to which this shared resource has been > allocated to in the sorter. > > > Diffs > ----- > > src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 > src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d > src/master/allocator/mesos/hierarchical.cpp > 8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 > src/master/allocator/sorter/drf/sorter.hpp > 05d5205d29ad74e01e07c508d88b6f8371541513 > src/master/allocator/sorter/drf/sorter.cpp > 4306973277b9d32356eed31ceabac09fb2a03e6c > src/master/http.cpp b36b439a1fa07c52146deff2b90728f92676ade3 > src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 > src/master/master.cpp 0c05938de3e4eaeea2187559e81559f85318fa73 > src/master/validation.hpp f29f9753c75e5abc3402ed23381edcea26517ab3 > src/master/validation.cpp f490b899758bdac9676a6f6939918efa6ac52781 > src/tests/master_validation_tests.cpp > ca4442aa1ef0087a7d058d1b3aa430a1dbc16960 > src/tests/sorter_tests.cpp eb207a36c18198588cd8a98b3860a66c2ff7a641 > src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a > > Diff: https://reviews.apache.org/r/45961/diff/ > > > Testing > ------- > > Tests successful. > > > Thanks, > > Anindya Sinha > >
