-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141757
-----------------------------------------------------------



Adding to the previous partial review after discussion.

## Invariant checking and documentation

An overall comment is that I think since we keep track of shared resource 
across Master, allocator and sorter, each for a different aspect (allocated vs. 
used vs. available, etc.) and with different semantics and expactations on 
uniquesness/number of copies etc., we should very explicitly document these 
variables and method arguments and check their invariants.

Specifically, the following diagram helped me understand the relationship 
between these variables.

Sorter
|
|- allocations: hashmap<std::string, Allocation>
                                     |
                                     |- resources: hashmap<SlaveID, Resources>
                                     |- scalarQuantities 

`Allocation.resources` expects 1 copy of each shared resource because it's 
tracking whether a client has been allocated this resource and doesn't care 
about how many times it's been allocated or used by the client. Therefore 
distinct shared resources is an invariant that we should check in `allocated()`.

Allocator
|
|- slaves: hashmap<SlaveID, Slave>
                            |
                            |- allocated: Resources
                            
`Slave.allocated` expects 1 copy of each shared resource **per client** and 
it's grouped into one Resources object.

Master
|
|- Slave
|  |
|  |- usedResources: hashmap<FrameworkID, Resources>
|
|- Framework
   |
   |- usedResources: hashmap<SlaveID, Resources>
   
`Slave.usedResources` and `Framework.usedResources` expect 1 copy of each 
shared resource **per task** as they are tracking "usage".


src/master/allocator/mesos/hierarchical.cpp (lines 450 - 459)
<https://reviews.apache.org/r/45961/#comment207146>

    Seems like in `slaves[slaveId].allocated` we should be maintaining one copy 
of each shared resource **per framework** but this if condition here assumes 
one copy of each shared resource across framework.



src/master/allocator/mesos/hierarchical.cpp (lines 1386 - 1400)
<https://reviews.apache.org/r/45961/#comment207150>

    Ignore my previous comment on this. It wasn't clear to me what invariant 
for `slaves[slaveId].allocated` is. Now I have looked again, we define shared 
resources in `slaves[slaveId].allocated` as: We maintain one copy of shared 
resource in `slaves[slaveId].allocated` **for each framework it is allocated 
to**. 
    
    The first question is why do we do this? I can understand that this way if 
a shared resource is allocated to N frameworks, when it's recovered from all N 
frameworks it disappears from `slaves[slaveId].allocated`.
    
    We should document this where this variable is defined and check this 
invariant wherever appropriate.
    
    And in here, is this equivalent to the block of code?
    
    ```
    slaves[slaveId].allocated += resources - allocation.shared();
    ```



src/master/allocator/mesos/hierarchical.cpp (lines 1406 - 1409)
<https://reviews.apache.org/r/45961/#comment207160>

    Since `allocated` above already excludes shared resources already allocated 
to this framework, why couldn't be just pass it along to 
`frameworkSorters[role]->add()|allocated()`?
    
    For `roleSorter->allocated()` and `quotaRoleSorter->allocated()` we can use 
`resources - slaves[slaveId].allocated.shared()` right?



src/master/allocator/mesos/hierarchical.cpp (lines 1570 - 1571)
<https://reviews.apache.org/r/45961/#comment207164>

    To expand on this: I understand that if shared resources are offered out in 
the 1st stage, it'll be exluded from `remainingClusterResources` so by 
excluding shared resources from `scalarQuantity` we are comparing apples to 
apples. 
    
    On the other hand it's possible for some shared resources to be not offered 
in the 1st stage so it'll remain in `remainingClusterResources`, if we  exclude 
these shared resources from `scalarQuantity` we could be over-allocating 
nonshared resources.
    
    Perhaps we should be consistently checking nonshared resources to prevent 
over-allocation in stage 2. i.e.,
    
    ```
    // We exlude shared resources from over-allocation check because shared 
resources are alwas allocatable.
    if (!remainingClusterResources.nonShared().contains(
        (allocatedStage2 + scalarQuantity).nonShared())) {
      continue;
    }
    ```



src/master/allocator/sorter/drf/sorter.hpp (line 168)
<https://reviews.apache.org/r/45961/#comment207137>

    Some comments about shared resources would be nice.
    
    ```
    // We maintain one copy of each shared resource allocated to a client here. 
A shared resource may be offered to the same client multiple times but here we 
are only concerned with whether it's in the client's allocation or not. 
    ```



src/master/allocator/sorter/drf/sorter.hpp (line 171)
<https://reviews.apache.org/r/45961/#comment207127>

    Add to this comment about omitting sharedness.



src/master/allocator/sorter/drf/sorter.cpp (lines 163 - 170)
<https://reviews.apache.org/r/45961/#comment207140>

    Adding to it, I wonder if it would be cleaner to require the caller not to 
call `allocated()` with shared resources already allocated to it.
    
    If so we should document this in the API and also check the invariant here:
    
    ```
    const Resources allocatedShared = 
      allocations[name].resources[slaveId].shared();
    foreach (const Resource& resource, resources.shared()) {
      CHECK(!allocatedShared.contains(resource));
    }
    ```
    
    If the invariant holds, then for `scalarQuantities` the below is trivially 
guranteed:
    
    ```
    allocations[name].scalarQuantities +=
      resources.createStrippedScalarQuantity();
    ```



src/master/master.hpp (line 274)
<https://reviews.apache.org/r/45961/#comment207215>

    Chatted offline. We should refactor this into a 
`Master::recoverResources()` call to encapsulate this logic and call 
`allocator->recoverResources()` inside with adjusted arguments.



src/master/master.hpp (line 343)
<https://reviews.apache.org/r/45961/#comment207217>

    As discussed, it takes some work to account for  `pendingResources` 
correctly when we could alternatively add task to `usedResources` and remove 
the task if validation fails. If we choose do it later as a separate review 
let's make sure we don't change too much and make it hard to refactor later and 
at least add a TODO here.
    
    ```
    TODO: In stead of tracking `pendingResources` separately, consider adding 
tasks (and consider it as STAGING when the accept call arrives and removing it 
if it fails authorization. This way we only need to keep track of 
`usedResources`.
    ```



src/master/validation.hpp (lines 157 - 158)
<https://reviews.apache.org/r/45961/#comment207252>

    Seems like we only need to change 4 lines in master_validation_tests.cpp 
due to adding the additional arguments as non-optional. Sounds reasonable to me 
to fix them.



src/tests/master_validation_tests.cpp (line 600)
<https://reviews.apache.org/r/45961/#comment207270>

    s/is done only if shared count is 0/is only valid when the volumes are no 
longer in use/.



src/tests/master_validation_tests.cpp (lines 605 - 607)
<https://reviews.apache.org/r/45961/#comment207261>

    Don't use duplicated persistent ID here?



src/tests/master_validation_tests.cpp (lines 630 - 632)
<https://reviews.apache.org/r/45961/#comment207269>

    Here we are testing that can't destroy a  nonshared persistent volume 
`disk2` because it's in use?
    
    Perhaps it's still reasonable to test it (even thought it can't happen 
right now) but it's not relevant to this test?



src/tests/sorter_tests.cpp (lines 493 - 494)
<https://reviews.apache.org/r/45961/#comment207278>

    The same volume is used a couple of times, use a variable instead of 
creating it each time?



src/tests/sorter_tests.cpp (lines 499 - 500)
<https://reviews.apache.org/r/45961/#comment207273>

    This is not a realistic example. "id1" is already a volume, you shouldn't 
claim 1/10 of a volume. (I don't see us validating this though)



src/tests/sorter_tests.cpp (lines 520 - 521)
<https://reviews.apache.org/r/45961/#comment207276>

    Even if a and b both use 1/10 of the persistent volume, shouldn't they each 
get 0.1 share?



src/tests/sorter_tests.cpp (line 541)
<https://reviews.apache.org/r/45961/#comment207280>

    Where's the volume gone?
    
    One blank line above.


- Jiang Yan Xu


On July 7, 2016, 8:44 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 7, 2016, 8:44 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 
> c1e00039606164599e25ff5f76245e4d35ec3112 
>   src/master/allocator/sorter/drf/sorter.hpp 
> e29feebd70277c79f7c3f6fb233e7a36501cf220 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 7df4dd641b21ea0705368861bf4679fed1ef078d 
>   src/master/http.cpp bff1fd53462bfec19e4a025c49a00e2855faf4f3 
>   src/master/master.hpp 60efd22280c62801b080365986fe9773737ca563 
>   src/master/master.cpp 79e3d78ba45060bc2f2532fdc3d105d1cc888d0f 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp 0a921347586808863e615ca3dcc23fae92b629f5 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Reply via email to