> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3499-3502 > > <https://reviews.apache.org/r/45961/diff/4-7/?file=1416585#file1416585line3499> > > > > `task.has_executor()` doesn't always lead to additional resources being > > used/charged against the framework. See `Master::addTask()`. It will suck a > > lot if we have to duplicate that logic here though so we might have to > > consider doing the pending task refactor first...
Refactored `Master::addTask()` and moved calculation of resources for the task into a separate function `Master::totalTaskResources()`, and used this function to determine the resources for pending tasks (in `Slave::pendingResources`). > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3913-3916 > > <https://reviews.apache.org/r/45961/diff/4-7/?file=1416585#file1416585line3913> > > > > Need to check whether this executor actually consumes resources. Please refer to earlier comment regarding `Master::totalTaskResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 4291-4301 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line4291> > > > > Ditto about `task.has_executor()`. Please refer to earlier comment regarding `Master::totalTaskResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, line 3497 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3497> > > > > Checking `task.has_executor()` is insufficient as a condition to add > > the executor's resources because they may not consume additional resources. Please refer to earlier comment regarding `Master::totalTaskResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, line 3911 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3911> > > > > Ditto to my comment above, we don't know if the task's executor > > actually consumes resources by just checking `task.has_executor()`. Please refer to earlier comment regarding `Master::totalTaskResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3601-3602 > > <https://reviews.apache.org/r/45961/diff/4-7/?file=1416585#file1416585line3601> > > > > Wrapped `Allocator::recoverResources()` into a new function `Master::recoverResources()` with the same arguments. Inside `Master::recoverResources()`, we check for `slave == nullptr` to determine what resources are recoverable, and call `Allocator::recoverResources()` thereafter. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.hpp, line 274 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438751#file1438751line274> > > > > Add some comment to help explain why we need this method: > > > > ``` > > // Return a subset of `resources` offered to `frameworkId` that can > > // be recovered. Right now we filter out shared resources that are > > // still used by `frameworkId`. > > // NOTE: A framework can reuse a shared resource to launch multiple > > // tasks and the allocator only recovers a shared resource allocated > > // to the framework if it's not used by any task of the framework. > > ``` Refactored this within `Master::recoverResources()` and added comments within that function. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, line 3411 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3411> > > > > Shouldn't this go through `recoverable()` as well? > > > > It'll be helpful if add some comments on the `recoverResources` to > > spell out what exactly is expected. Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3599-3600 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3599> > > > > At least we need a comment on why it's safe to recover all > > `offeredResources` if `slave == nullptr`: if slave is nullptr, no > > `usedResources` is on it so all resources should be recoverable. > > > > But honestly this looks odd and we need to invoke this conditional > > operator in multiple places. Adding the same comment to all places this is > > called is also very redundant. > > > > Perhaps `recoverable` can be moved to Master and hide this detail about > > `slave == nullptr`? Perhaps we should just wrap around > > `allocator->recoverResources` in Master to insert this logic. Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 3643-3644 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3643> > > > > Ditto. Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/master.cpp, lines 4147-4148 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line4147> > > > > Ditto to comments above about the same code. Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/validation.hpp, lines 157-158 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438753#file1438753line157> > > > > Don't we always have the two additional arugments, even thought they > > could be empty (but not None)? The reason is to satisfy a couple of (6 in total) test cases which tests the validation without involvement of framework's view of used and pending resources. On second thought, I updated the test cases and made these 2 args not have option. So, now we have: ``` Option<Error> validate( const Offer::Operation::Destroy& destroy, const Resources& checkpointedResources, const hashmap<FrameworkID, Resources>& usedResources, const hashmap<FrameworkID, Resources>& pendingResources); ``` > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/allocator/mesos/hierarchical.cpp, lines 1386-1400 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line1386> > > > > IIUC the comment still means "this is done to make sure shared > > resources are counted at most once in the framework's allocation". > > > > The last revision had: > > > > ``` > > slaves[slaveId].allocated -= resources.shared(); > > slaves[slaveId].allocated += resources; > > ``` > > > > How is this longer version different? The earlier code forces `slaves[slaveId].allocated` to have atmost a single copy of a shared resource. This version ensures that `slaves[slaveId].allocated` contains shared resources that are counted at most once for each framework that resource is allocated to. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/allocator/mesos/hierarchical.cpp, lines 1596-1606 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line1596> > > > > Ditto to my comment on the same code above. Same comment as in the prior comment. The earlier code forces `slaves[slaveId].allocated` to have atmost a single copy of a shared resource. This version ensures that `slaves[slaveId].allocated` contains shared resources that are counted at most once for each framework that resource is allocated to. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/allocator/mesos/hierarchical.cpp, lines 1331-1340 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line1331> > > > > Given this is the 1st stage of the allocate cycle there's no chance a > > shared resource could be in `offeredSharedResources` already? As discussed offline, we would need this snippet to address scenarios where we have multiple frameworks of the same role. > On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote: > > src/master/allocator/sorter/drf/sorter.cpp, lines 163-170 > > <https://reviews.apache.org/r/45961/diff/7/?file=1438749#file1438749line163> > > > > So `DRFSorter::allocated` makes sure there's no duplicate shared > > resources in `allocations` so the caller can call it without pruning out > > the redundant copies. However the `unallocated` counterpart doesn't do the > > same? Ideally usage of the pair of API should be symmetric. Moved responsibility of pruning of already allocated shared resources out of the sorter and into the allocator. - Anindya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45961/#review141305 ----------------------------------------------------------- On July 17, 2016, 6:28 p.m., Anindya Sinha wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45961/ > ----------------------------------------------------------- > > (Updated July 17, 2016, 6:28 p.m.) > > > Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael > Park, and Jiang Yan Xu. > > > Bugs: MESOS-4431 > https://issues.apache.org/jira/browse/MESOS-4431 > > > Repository: mesos > > > Description > ------- > > o Each shared resource is accouted via its share count. This count is > updated based on the resource operations (such as add and subtract) > in various scenarios such as task launch and terminate at multiple > modules such as master, allocator, sorter, etc. > o Only allow DESTROY if there are no running or pending tasks using > the volume. However, if the volume is in a pending offer to one or > more frameworks, resind that offer before processing the DESTROY. > > > Diffs > ----- > > src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 > src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d > src/master/allocator/mesos/hierarchical.hpp > b72ba16277a3210e4d309b616d185a10e2029a66 > src/master/allocator/mesos/hierarchical.cpp > 7d4064535a20b93950f5a95eef1ad3f0d37d305b > src/master/allocator/sorter/drf/sorter.hpp > bc6bfb2d5d3b32d55be055a0514861b4e7d889bb > src/master/allocator/sorter/drf/sorter.cpp > ac85b327fc33d34246788e6a8c8bf5a486c61434 > src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 > src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 > src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 > src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 > src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 > src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd > src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 > src/tests/master_validation_tests.cpp > 9eb82a569f7c48caa96d4d54a93b199ccac74385 > src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 > src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a > > Diff: https://reviews.apache.org/r/45961/diff/ > > > Testing > ------- > > Tests successful. > > > Thanks, > > Anindya Sinha > >