-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42222/#review114628
-----------------------------------------------------------


Thanks Alex! I ended up going over the structure of the recover code and left 
some higher level comments. There also appears to be a bug that will crash the 
master that I marked as an issue :)

Have we convinced ourselves that we want to act on partial state when quota is 
*not* set? It seems to complicate things here, and acting on partial state w/o 
quota has some issues as well?


src/master/allocator/mesos/hierarchical.cpp (line 149)
<https://reviews.apache.org/r/42222/#comment175526>

    It now seems odd to start the allocation loop within initialize rather than 
within recover, we'll do the following:
    
    intialize -> unpauses, starts allocation loop but the allocation loop 
should do nothing
    recover -> pauses allocations, waits for condition to be met, resumes 
allocation.
    
    It seems a simpler way to think about this would be to just start 
allocating after the recovery condition is met. This would consist of (1) 
started = true* and (2) the initial delay to 'batch' that starts the delay loop.
    
    * Although keeping the pause concept may be useful if we want to support 
pausing/resuming allocation from endpoints.



src/master/allocator/mesos/hierarchical.cpp (lines 153 - 163)
<https://reviews.apache.org/r/42222/#comment175480>

    It seems a little odd to document at the function level when this interface 
function is already documented:
    
    ```
      /**
       * Informs the allocator of the recovered state from the master.
       *
       * Because it is hard to define recovery for a running allocator, this
       * method should be called after `initialize()`, but before actual
       * allocation starts (i.e. `addSlave()` is called).
       *
       * TODO(alexr): Consider extending the signature with expected
       * frameworks count once it is available upon the master failover.
       *
       * @param quotas A (possibly empty) collection of quotas, keyed by
       *     their role, known to the master.
       */
      virtual void recover(
          const int expectedAgentCount,
          const hashmap<std::string, Quota>& quotas) = 0;
    ```
    
    How about moving it in to the body after the checks, with some slight 
rephrasing:
    
    ```
    void HierarchicalAllocatorProcess::recover(
        const int _expectedAgentCount,
        const hashmap<string, Quota>& quotas)
    {
      // Recovery should start before actual allocation starts.
      CHECK(initialized);
      CHECK_EQ(0u, slaves.size());
      CHECK_EQ(0, quotaRoleSorter->count());
      CHECK(_expectedAgentCount >= 0);
    
      // If there is no quota, recovery is a no-op. Otherwise, we need
      // to delay allocations while agents are re-registering because 
      // otherwise we perform allocations on a partial view of resources!
      // We would consequently perform unnecessary allocations to satisfy
      // quota constraints, which can over-allocate non-revocable resources
      // to roles using quota. Then, frameworks in roles without quota can
      // be unnecessarily deprived of resources. We may also be unable to
      // satisfy all of the quota constraints. Repeated master failovers
      // exacerbate the issue.
      
      if (quotas.empty()) {
        VLOG(1) << "Skipping recovery of hierarchical allocator: "
                << "nothing to recover";
    
        return;
      }
      
      ...
      
    }
    ```
    
    It's great that we went through the exercise of writing this comment 
because it seems we should consider whether we want to avoid acting on partial 
state even when no quota is set.



src/master/allocator/mesos/hierarchical.cpp (line 165)
<https://reviews.apache.org/r/42222/#comment175485>

    I wonder if we can use a better word than "expected" here; something that 
conveys that this represents the count of agents that were registered prior to 
failover / persisted in the registry. For example, the following would be 
really clear:
    
    ```
    void HierarchicalAllocatorProcess::recover(
        const registry::Slaves& recoveredSlaves,
        const hashmap<string, Quota>& quotas)
    ```



src/master/allocator/mesos/hierarchical.cpp (line 168)
<https://reviews.apache.org/r/42222/#comment175510>

    Should this comment be moved down?
    
    I'm also not quite sure what it's expressing, is it a re-hash of the 
lifecycle documentation?
    
    ```
       * Because it is hard to define recovery for a running allocator, this
       * method should be called after `initialize()`, but before actual
       * allocation starts (i.e. `addSlave()` is called).
    ```
    
    For example:
    
    ```
    CHECK(initialized);
    
    // Recovery should be called after initialize and before other calls.
    CHECK_EQ(0u, slaves.size());
    CHECK_EQ(0, quotaRoleSorter->count());
    ```
    
    Could we extend our existing 'initialized' state boolean into an enum to 
express UNINITIALIZED, INITIALIZED, RECOVERED? As far as I can tell, the 
existing methods with `CHECK(initialized)` are also interested in checking that 
recover was called?



src/master/allocator/mesos/hierarchical.cpp (lines 171 - 172)
<https://reviews.apache.org/r/42222/#comment175481>

    Not for this review, but looks like we should update `Sorter::count` to 
return a size_t.
    
    The following CHECK implies that `_expectedAgentCount` should just be a 
size_t?
    
    ```
    CHECK(_expectedAgentCount >= 0);
    ```
    
    With a size_t we don't have to do this kind of check :)



src/master/allocator/mesos/hierarchical.cpp (lines 174 - 199)
<https://reviews.apache.org/r/42222/#comment175483>

    As a terminology thing, it seems a little strange to say that we're 
skipping recovery. Rather, we always recover the allocator, but in certain 
cases recovery might do different things. Still, not sure that we want to have 
the differing behavior depending on whether quota is set. For example, we could 
do the following:
    
    ```
    void HierarchicalAllocatorProcess::recover(
        const registry::Slaves& slaves,
        const hashmap<string, Quota>& quotas)
    {
      CHECK_EQ(INITIALIZED, state);
     
      recoveredSlaveCount = slaves.slaves().size();
     
      foreachpair (const string& role, const Quota& quota, quotas) {
        setQuota(role, quota.info);
      }
      
      // We'll start allocating once a sufficient percentage
      // of agents re-register (or the timeout elapses) to
      // avoid acting on partial state (which can lead to
      // a number of issues!).
      
      Duration recoveryTimeout = ALLOCATOR_RECOVERY_TIMEOUT;
     
      LOG(INFO) << "Allocation will start within " << recoveryTimeout
                << " once " << ALLOCATOR_RECOVERY_AGENT_THRESHOLD_PERCENT << "%"
                << " of the " << slaves.slaves().size()
                << " recovered slaves re-register";
    
      delay(recoveryTimeout, self(), &Self::_recover);
    
      state = RECOVERING;
    }
    ```
    
    We'll need to either check the threshold condition in addSlave (as is 
currently done) to trigger the completion of recovery, have some abstraction 
that satisfies a future when a threshold is met. The latter is tricky if we 
want to finish recovery when slaves.size() reaches the threshold (gauge-like 
abstraction but we have to know when to check the gauge) rather than when the 
count of addSlave calls reaches the threshold (counter-like abstraction, 
easier).



src/master/allocator/mesos/hierarchical.cpp (lines 205 - 206)
<https://reviews.apache.org/r/42222/#comment175525>

    It looks like if we trip the `resume` call in `addSlave`, this delayed 
resume will crash the master due to the `CHECK(paused)` that currently resides 
in `resume`.
    
    Something I'm missing?



src/master/allocator/mesos/hierarchical.cpp (line 468)
<https://reviews.apache.org/r/42222/#comment175533>

    Hm.. did you need the static cast here?


- Ben Mahler


On Jan. 12, 2016, 11:06 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42222/
> -----------------------------------------------------------
> 
> (Updated Jan. 12, 2016, 11:06 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Joris Van Remoortere.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> df8bccaf2b8cfc0cb5ca18d4867371ae7a84c12f 
> 
> Diff: https://reviews.apache.org/r/42222/diff/
> 
> 
> Testing
> -------
> 
> None: not a functional change.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Reply via email to