> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/common/resources.cpp, line 1773
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390645#file1390645line1773>
> >
> > Make sure we add this to the tests.
> >
> > So now the shared persistent volume could look like
> > `disk(alice, hdfs-p, {foo: bar, foo})[hadoop:/hdfs:/data:rw]<SHARED>:1<6>`
> >
> > Would `<SHARED>` be a little redundant given that we have `<6>`?
I think having the `<SHARED>` makes it clearer, so I would want to keep it when logging the `Resource` object. Test case `ResourcesTest.PrintingSharedResources` added in https://reviews.apache.org/r/45964/.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.hpp, line 171
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390648#file1390648line171>
> >
> > Is `nonSharedScalarQuantities` a better name?

I would prefer not to rename this existing member variable, since that would mean changes in a whole bunch of places in the sorter. I think a comment should suffice.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.hpp, line 181
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390648#file1390648line181>
> >
> > I feel that we can find a better place/name for this field.
> > `shareResources` doesn't say if it's the allocation or the total pool.
> >
> > I feel like it should be in `total_` as it reflects a state of the
> > total pool. If we leave it here then perhaps give it a clearer name?
> > What do you think?

I would rename it to `allocatedSharedResources`. I think putting it in `total_` gives an inaccurate meaning, since what we store here is the allocated shared resources (with n copies).


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 898-901
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390647#file1390647line898>
> >
> > Can't quite understand this comment as to providing an answer for the
> > use of `.nonShared()` on `resources`?

We always keep one copy of a specific shared resource in `slaves[slaveId].allocated`. Assume 2 offers go out to 2 different frameworks with the same shared resource.
Let us say Framework 1 returns it: we subtract the shared resource from `slaves[slaveId].allocated`, so the shared resource is removed from `slaves[slaveId].allocated` (since its shared count is always 1 if present). When Framework 2 also returns it, the `CHECK(slaves[slaveId].allocated.contains(resources))` would fail. That is why we only check against non-shared resources.

However, I think the following would be a better approach. We can move to a model of keeping n copies of a shared resource in `slaves[slaveId].allocated` (where n is the number of frameworks that have been offered this shared resource). As a result, we can update the slave's allocated list without worrying about the shared count dropping to 0 prematurely for shared resources.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 1321
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390647#file1390647line1321>
> >
> > If a framework explicitly turns down the shared resources we should
> > not send them back again right?

If we check for shared resources in `isFiltered()`, we can filter correctly when the shared resource being offered in the current offer cycle was returned by the framework in the previous offer cycle (i.e. the `OfferFilter` created in `recoverResources()` includes the shared resources). However, suppose we offered a shared resource in the previous offer cycle and the framework launched a task in the `ACCEPT` call using that shared resource (or, alternatively, the framework returned the shared resource but already has 1 or more tasks running from a previous accept cycle using it). In that case, the `OfferFilter` would not contain the shared resource.
In the next offer cycle, the call to `isFiltered()` would include the shared resource (since the shared resource is always added), causing `isFiltered()` to return `false`, which means the offer would be sent to this framework every second (instead of after `refuse_seconds`). In other words, a framework using a shared resource in a task would receive offers too frequently for as long as that task is running, which does not seem correct.

On further discussion: since we send a shared resource only once per offer cycle, I think we should be fine with filtering on all resources, since only 1 framework gets offered it every second and the remaining ones do not.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1426-1428
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390647#file1390647line1426>
> >
> > Thinking about this, we should discuss what the best behavior is for
> > now:
> >
> > In the current design two offers from the same agent can already be
> > created due to the two-stage allocation algorithm. However, the same
> > resource is never sent in two offers.
> >
> > Here we have two options for shared resources for stage 2:
> >
> > 1. Replenish 'available' with a (potential) 2nd copy of the same shared
> >    resources, or
> > 2. Don't add the 2nd copy if it's already in an offer in this cycle but
> >    DO add another copy if it's not in an offer.
> >
> > Eventually we may be sending the same shared resources to all eligible
> > frameworks anyways but for now it feels that 2) is more in line with the
> > current behavior: one resource in one offer in one offer cycle.
> >
> > Thoughts?

To be consistent with the fact that the same resource is not offered more than once in the same offer cycle, we shall do the same with shared resources.
If a shared resource has been offered in this offer cycle, we will not offer that same shared resource to any other framework within the same cycle.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 448-449
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390649#file1390649line448>
> >
> > Here the contrast between `total_.scalarQuantities` (includes shared
> > resources) and `allocations[name].scalarQuantities` makes a good argument
> > for renaming `allocations[name].scalarQuantities` as
> > `allocations[name].nonSharedScalarQuantities` so it's clear that it's
> > different from the way `total_.scalarQuantities` is calculated.

As in the previous comment, I would prefer not to rename this existing member variable, since that would mean changes in a whole bunch of places in the sorter. Also, even if we do want to rename an existing member variable, that should be a separate commit (not the same commit as the shared resources functionality).


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3917
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390652#file1390652line3917>
> >
> > The interesting scenario: if the framework receives one shared
> > resource and launches 3 tasks each using it, here `_offeredResources` has
> > one copy of it and `-= addTask` is going to take it out after the first
> > task is launched.
> >
> > The next task in this sequence then fails the validation due to the
> > resource in taskinfo not in `_offeredResources`.
> >
> > Should the next task fail?

I think we should allow multiple tasks to be launched in the same accept cycle. For validation, we will add the view of shared resources from the original offer to ensure validations for subsequent tasks succeed in the same accept cycle.
> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 295
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390651#file1390651line295>
> >
> > Here `const Resource&` is used and we have lost the count which means
> > `updated -= resource` may not remove the shared resource.

Shared resources extracted via `resources.shared()` always have a count of 1 (since the offer always has at most 1 copy of the shared resource); otherwise the shared resource cannot be in `resources`. So `updated -= resource` essentially translates to `updated -= Resource_(resource)`. Since `resource.has_shared()` is `true`, `Resource_(resource)` morphs into a `Resource_` object with a `sharedCount` of 1. So we should be OK here.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3960
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390652#file1390652line3960>
> >
> > Not just `used` and not just task launches in previous cycles right?
> >
> > If the Accept has three copies of the shared resources and two tasks
> > use them, the remaining one cannot be recovered due to tasks launched now.

We now send only 1 copy of a specific shared resource across multiple offers in a single offer cycle, so the Accept can have at most a single copy of any shared resource. This condition would not arise.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 289
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390651#file1390651line289>
> >
> > Multiple frameworks can use the same shared resource so passing in one
> > frameworkId is not sufficient?

I think it is sufficient (and not all frameworks are needed) since we want to update the allocations at the slave level and framework level based on this framework. To address this issue, we now maintain n copies of shared resources in the slave's allocated list (but only 1 copy in the role/framework's allocated list).
So when a framework returns a shared resource in an offer cycle and that shared resource is not used in any task launched by that specific framework, we remove it from the slave's allocated list as well as from the framework's allocated list. Since the slave's allocated list contains n copies (if the same shared resource is being offered to n frameworks), it now holds n-1 copies; and since the framework always has exactly 1 copy, the shared resource is removed from the framework's list of allocated resources.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 288
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390651#file1390651line288>
> >
> > Besides used resources, what about offered resources and resources in
> > pending tasks? Besides, the method name sounded like it's filtering things
> > in but from the context in `_accept()` it's filtering things out...

Modified the name of the method to `recoverable()`.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, lines 292-302
> > <https://reviews.apache.org/r/45961/diff/3/?file=1390651#file1390651line292>
> >
> > How about:
> >
> > ```
> > // Return the subset of the specified resources on this agent that can
> > // be recovered back to the allocator.
> > // The criteria for recoverable resources:
> > // - Non-shared resources.
> > // - Shared resources that do not also exist in 'totalUsedResources'
> > //   or 'offeredResources'.
> > Resources recoverable(const Resources& resources)
> > {
> >   Resources recoverable = resources.nonShared();
> >
> >   foreach (const Resource& resource, resources.shared()) {
> >     if (!totalUsedResources.contains(resource) &&
> >         !offeredResources.contains(resource)) {
> >       recoverable += resource;
> >     }
> >   }
> >
> >   return recoverable;
> > }
> > ```
> >
> > It looks like we'll have to maintain a `totalUsedResources` in `Slave`
> > though.
> >
> > Another issue is that: when the task is being authorized, its resources
> > are not in either `usedResources` or `offeredResources`. We have to
> > address this too...

This method should act on this specific framework only. For example, if this framework returned the shared resource but some other framework has launched a task using it (or has been offered it), we should not remove the shared resource from the resources passed to `recoverResources()`; otherwise, the framework's allocated list would continue to hold this shared resource, which is incorrect as the DRF share would not be accurate.

As in the previous comment, we would maintain n copies in the slave's allocated list, and `recoverable()` acts on this specific framework only to check for any launched tasks using this shared resource. Based on that, we recover this resource from the slave's allocated list (which goes to n-1 copies) and from the framework's allocated list (from which it is removed).


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review136970
-----------------------------------------------------------


On June 23, 2016, 1:27 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated June 23, 2016, 1:27 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Each shared resource is accounted via its share count. This count is
> updated based on the resource operations (such as add and subtract)
> in various scenarios such as task launch and terminate at multiple
> modules such as master, allocator, sorter, etc.
> Only allow DESTROY if no task is using the volume, or it has not been
> offered to any framework.
> Since shared resources are available to multiple frameworks of the
> same role simultaneously, we normalize them in the sorter with a
> weight equivalent to the number of frameworks to which the shared
> resource has been allocated.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d
>   src/master/allocator/mesos/hierarchical.cpp 5b2331b3f99252fd16f2514123006dd4ba7f1c0d
>   src/master/allocator/sorter/drf/sorter.hpp 35273b5dcf39938125a112c5e56b5a8a542157db
>   src/master/allocator/sorter/drf/sorter.cpp 27d56f274c41bbabe6f2abbbcebd2c4eff52693e
>   src/master/http.cpp 7daaf12a4086635bbc5aba5e3375c95e8899ac6e
>   src/master/master.hpp fe57878dc59637459d5c5cdae0be2aa159133fa4
>   src/master/master.cpp 8def7156f4a05b39590321ce7743f7385a68bed0
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31
>   src/master/validation.cpp 9120b71fc7725bdf7094aac6619d8aadcc352df5
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385
>   src/tests/sorter_tests.cpp 6fc58829892dc0223140f1b47593a3e5853cace5
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>