-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review136970
-----------------------------------------------------------



A high-level comment: when we make significant changes to the allocator, we 
have to benchmark the performance implications both for clusters that don't 
use this feature and for clusters that do.


src/common/resources.cpp (line 1225)
<https://reviews.apache.org/r/45961/#comment202089>

    A blank line after `}`.



src/common/resources.cpp (line 1266)
<https://reviews.apache.org/r/45961/#comment202090>

    A blank line after `}`.



src/common/resources.cpp (line 1773)
<https://reviews.apache.org/r/45961/#comment202088>

    Make sure we add this to the tests.
    
    So now the shared persistent volume could look like `disk(alice, hdfs-p, 
{foo: bar, foo})[hadoop:/hdfs:/data:rw]<SHARED>:1<6>`
    
    Would `<SHARED>` be a little redundant given that we have `<6>`?



src/master/allocator/mesos/hierarchical.cpp (lines 898 - 901)
<https://reviews.apache.org/r/45961/#comment202518>

    I can't quite parse this comment — is it meant to explain why 
`.nonShared()` is used on `resources`?



src/master/allocator/mesos/hierarchical.cpp (line 1282)
<https://reviews.apache.org/r/45961/#comment202525>

    "the difference in non-shared resources between total and allocated plus 
all shared resources on the slave"



src/master/allocator/mesos/hierarchical.cpp (lines 1283 - 1284)
<https://reviews.apache.org/r/45961/#comment202521>

    To clarify "we always have one copy of any shared resource", how about:
    
    ```
    Since shared resources are offerable even when they are in use, in stage 1 
of each allocation cycle we make one copy of the shared resources available 
regardless of the past allocations.
    ```



src/master/allocator/mesos/hierarchical.cpp (line 1321)
<https://reviews.apache.org/r/45961/#comment202526>

    If a framework explicitly turns down the shared resources, we should not 
send them back again, right?



src/master/allocator/mesos/hierarchical.cpp (lines 1426 - 1428)
<https://reviews.apache.org/r/45961/#comment202540>

    Thinking about this, we should discuss what the best behavior is for now:
    
    In the current design two offers from the same agent can already be created 
due to the two-stage allocation algorithm. However, the same resources are 
never sent in two offers.
    
    Here we have two options for shared resources for stage 2:
    
    1. Replenish 'available' with a (potential) 2nd copy of the same shared 
resources, or
    2. Don't add the 2nd copy if it's already in an offer in this cycle but DO 
add another copy if it's not in an offer.
    
    Eventually we may be sending the same shared resources to all eligible 
frameworks anyway, but for now it feels that 2) is more in line with the 
current behavior: one resource in one offer per offer cycle.
    
    Thoughts?
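    To make option 2 concrete, here's a toy sketch (the names and the 
string-based model are made up for illustration; these are not the actual 
allocator types):
    
    ```
    #include <set>
    #include <string>
    #include <vector>
    
    // Option 2 sketch: in stage 2, replenish `available` with one copy of
    // each shared resource on the agent, but skip any resource that already
    // went out in an offer earlier in this allocation cycle.
    std::vector<std::string> replenishShared(
        const std::vector<std::string>& sharedOnAgent,
        const std::set<std::string>& offeredThisCycle)
    {
      std::vector<std::string> available;
      for (const std::string& resource : sharedOnAgent) {
        if (offeredThisCycle.count(resource) == 0) {
          // Not in any offer yet in this cycle: add one copy.
          available.push_back(resource);
        }
      }
      return available;
    }
    ```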



src/master/allocator/sorter/drf/sorter.hpp (line 157)
<https://reviews.apache.org/r/45961/#comment202556>

    Here also add:
    
    ```
    NOTE: Sharedness info is also stripped out when resource identities are 
omitted, because sharedness inherently refers to the identities of resources, 
not quantities.
    ```



src/master/allocator/sorter/drf/sorter.hpp (lines 164 - 165)
<https://reviews.apache.org/r/45961/#comment202782>

    At the end of the first sentence, add `(for non-shared resources only)`.



src/master/allocator/sorter/drf/sorter.hpp (lines 167 - 169)
<https://reviews.apache.org/r/45961/#comment202780>

    So far there is no place that coherently lays out how DRF works with 
shared resources. We can describe the algorithm in detail in 
`calculateShare()` and refer to it here.
    
    Here we can emphasize how we structure the data members to make the 
algorithm work:
    
    ```
    We do not aggregate shared resources here since the client's "normalized 
allocation" of a shared resource can only be derived based on its state in 
`total_.sharedResources` when `calculateShare()` is run. See `calculateShare()` 
for comments on the algorithm.
    ```



src/master/allocator/sorter/drf/sorter.hpp (line 171)
<https://reviews.apache.org/r/45961/#comment202668>

    Is `nonSharedScalarQuantities` a better name?



src/master/allocator/sorter/drf/sorter.hpp (lines 177 - 180)
<https://reviews.apache.org/r/45961/#comment202594>

    



src/master/allocator/sorter/drf/sorter.hpp (line 181)
<https://reviews.apache.org/r/45961/#comment202553>

    I feel that we can find a better place/name for this field. 
`sharedResources` doesn't say whether it's the allocation or the total pool.
    
    I feel like it should be in `total_` since it reflects the state of the 
total pool. If we leave it here, then perhaps give it a clearer name? What do 
you think?



src/master/allocator/sorter/drf/sorter.cpp (lines 206 - 207)
<https://reviews.apache.org/r/45961/#comment202699>

    If we agree to put `sharedResources` in `total_`, should we move this to 
the "Update the totals" section?



src/master/allocator/sorter/drf/sorter.cpp (lines 284 - 293)
<https://reviews.apache.org/r/45961/#comment202765>

    Group operations on `sharedResources` together.



src/master/allocator/sorter/drf/sorter.cpp (lines 448 - 449)
<https://reviews.apache.org/r/45961/#comment202700>

    Here the contrast between `total_.scalarQuantities` (which includes shared 
resources) and `allocations[name].scalarQuantities` makes a good argument for 
renaming `allocations[name].scalarQuantities` to 
`allocations[name].nonSharedScalarQuantities`, so it's clear that it's computed 
differently from `total_.scalarQuantities`.



src/master/allocator/sorter/drf/sorter.cpp (lines 455 - 456)
<https://reviews.apache.org/r/45961/#comment202703>

    Here's the place we should expand on the algorithm:
    
    ```
    A shared resource can be allocated to multiple clients simultaneously. We 
split its quantity evenly across clients that it's allocated to when 
calculating one client's normalized allocation of it.
    ```
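    
    As a toy illustration of the even split (the function name is made up; 
the real logic would live in `calculateShare()`):
    
    ```
    // A shared resource of quantity `quantity` that is allocated to
    // `clientCount` clients contributes quantity / clientCount to each
    // client's normalized allocation.
    double normalizedAllocation(double quantity, int clientCount)
    {
      return clientCount > 0 ? quantity / clientCount : 0.0;
    }
    ```
    
    E.g., a shared disk of size 6 allocated to 3 clients would count as 2 
toward each client's allocation.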



src/master/allocator/sorter/drf/sorter.cpp (lines 457 - 467)
<https://reviews.apache.org/r/45961/#comment202764>

    For this we need to
    
    1. Run the existing benchmarks to see how much performance degradation 
there is for clusters (with large numbers of agents or frameworks) that do not 
use shared resources. Ideally the performance impact for such clusters should 
be negligible.
    2. Create a benchmark that uses shared resources in a realistic way.
    
    In implementing this patch we do have the option of storing shared 
resources separately in a few places (e.g., storing shared resources outside 
of `allocations[name].resources`), which can help with performance, but the 
trade-off is that it may increase code complexity. Benchmarks can help us 
address these trade-offs.



src/master/allocator/sorter/drf/sorter.cpp (line 461)
<https://reviews.apache.org/r/45961/#comment202704>

    If the name is the same as the scalar, we should `CHECK_EQ(resource.type(), 
Value::SCALAR)` right?



src/master/allocator/sorter/drf/sorter.cpp (line 463)
<https://reviews.apache.org/r/45961/#comment202701>

    Use `CHECK_GT`.



src/master/master.hpp (line 288)
<https://reviews.apache.org/r/45961/#comment202183>

    Besides used resources, what about offered resources and resources in 
pending tasks? Also, the method name sounds like it's filtering things in, but 
from the context in `_accept()` it's filtering things out...



src/master/master.hpp (line 289)
<https://reviews.apache.org/r/45961/#comment202186>

    Multiple frameworks can use the same shared resource, so passing in a 
single frameworkId is not sufficient?



src/master/master.hpp (lines 292 - 302)
<https://reviews.apache.org/r/45961/#comment202189>

    How about:
    
    ```
    // Return the subset of the specified resources on this agent that can
    // be recovered back to the allocator.
    // The criteria for recoverable resources:
    // - Non-shared resources.
    // - Shared resources that do not also exist in 'totalUsedResources'
    //   or 'offeredResources'.
    Resources recoverable(const Resources& resources)
    {
      Resources recoverable = resources.nonShared();
      foreach (const Resource& resource, resources.shared()) {
        if (!totalUsedResources.contains(resource) &&
            !offeredResources.contains(resource)) {
          recoverable += resource;
        }
      }
      
      return recoverable;
    }
    ```
    
    It looks like we'll have to maintain a `totalUsedResources` in `Slave` 
though. 
    
    Another issue: when a task is being authorized, its resources are in 
neither `usedResources` nor `offeredResources`. We have to address this too...



src/master/master.hpp (line 295)
<https://reviews.apache.org/r/45961/#comment202146>

    Here `const Resource&` is used and we have lost the count, which means 
`updated -= resource` may not remove the shared resource.
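    
    A toy model of the concern (a plain `std::map` standing in for shared 
resources with share counts; not the actual `Resources` implementation):
    
    ```
    #include <map>
    #include <string>
    
    // Shared resources carry a share count, but a single resource obtained
    // by iteration carries no count, so subtracting it only decrements the
    // count by one; the resource stays present until the count hits zero.
    using SharedCounts = std::map<std::string, int>;  // name -> share count
    
    void subtractOne(SharedCounts& total, const std::string& name)
    {
      auto it = total.find(name);
      if (it != total.end() && --it->second == 0) {
        total.erase(it);
      }
    }
    ```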



src/master/master.cpp (lines 3342 - 3355)
<https://reviews.apache.org/r/45961/#comment202142>

    This is to make sure `offeredResources` is not updated when validation 
fails? The following avoids duplicating two lines:
    
    ```
            if (error.isSome()) {
              allocator->recoverResources(
                  offer->framework_id(),
                  offer->slave_id(),
                  offer->resources(),
                  None());
            } else {
              slaveId = offer->slave_id();
              offeredResources += offer->resources();
            }
            
            removeOffer(offer);
            continue;
    ```
    
    We still have to address the `offer->resources()` thing in the above 
snippet though.



src/master/master.cpp (line 3347)
<https://reviews.apache.org/r/45961/#comment202181>

    So here and in other places that call `allocator->recoverResources()`, it 
seems we need to filter the resources to get the recoverable ones, the same 
way we do in `_accept()`, right?



src/master/master.cpp (line 3915)
<https://reviews.apache.org/r/45961/#comment202218>

    The interesting scenario: if the framework receives one copy of the shared 
resource and launches 3 tasks each using it, `_offeredResources` here has one 
copy of it, and `-= addTask` is going to take that copy out after the first 
task is launched.
    
    The next task in this sequence then fails validation because the resource 
in its `TaskInfo` is not in `_offeredResources`.
    
    Should the next task fail?



src/master/master.cpp (line 3958)
<https://reviews.apache.org/r/45961/#comment202217>

    Not just `used`, and not just task launches in previous cycles, right?
    
    If the Accept has three copies of the shared resource and two tasks use 
them, the remaining copy cannot be recovered because of the tasks launched in 
this cycle.



src/master/validation.cpp (lines 820 - 850)
<https://reviews.apache.org/r/45961/#comment202215>

    One question about this:
    
    When a DESTROY comes in and the same shared resource is either in 
`offeredResources` or in pending tasks, should we destroy it?
    
    If we do, then we need to fail the pending tasks that are being authorized 
and remove the affected offers.
    
    It's probably more straightforward if we just fail the DESTROY?


- Jiang Yan Xu


On June 13, 2016, 12:20 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 12:20 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Each shared resource is accounted for via its share count. This count is
> updated based on resource operations (such as add and subtract) in
> various scenarios, such as task launch and terminate, across multiple
> modules such as the master, allocator, and sorter.
> Allow DESTROY for shared volumes only if the share count is 0.
> Since shared resources are available to multiple frameworks of the
> same role simultaneously, we normalize them in the sorter with a weight
> equivalent to the number of frameworks to which the shared resource
> has been allocated.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 
> 8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 88bd9233e345ff679051703a40f3be47d7a86e1a 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 65d473a5da0d846214c930c14d333040b2085b13 
>   src/master/http.cpp cd089cf6afd654b07608548f441a0e225df6425e 
>   src/master/master.hpp 2c45dab291a153b42809ab12e4252bf58559feeb 
>   src/master/master.cpp dd80367768a1edbb6ff94d40819b9755a8b8135f 
>   src/master/validation.hpp e1271bbaebaac52a078eedfdc2299c7c6f4474f8 
>   src/master/validation.cpp 7b9c2281b2ab1295211f0dd385cd77947fbd63be 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp eb207a36c18198588cd8a98b3860a66c2ff7a641 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>
