Re: Review Request 36617: Improved task reconciliation documentation.

Ben Mahler Mon, 27 Jul 2015 11:35:58 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36617/#review93147
-----------------------------------------------------------


Ship it!


Thanks Jan! I've made the updates from the feedback and will get this committed 
shortly.


docs/reconciliation.md (lines 43 - 47)
<https://reviews.apache.org/r/36617/#comment147407>

    Let's clarify what we mean by "current" here. That is, "non-terminal".



docs/reconciliation.md (line 83)
<https://reviews.apache.org/r/36617/#comment147408>

    Hm.. what kind of failure? Let's clarify that this is relevant to master 
failover.



docs/reconciliation.md (lines 83 - 91)
<https://reviews.apache.org/r/36617/#comment147413>

    Thanks Jan! It might be a bit clearer if we move this below where we say 
that the algorithm uses retries, given this is why the retries are needed.
    
    Also, might be helpful to point out that this time is bounded by the 
--slave_reregister_timeout flag.
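
    To make the retry point concrete, here is a minimal sketch of a retry 
loop with truncated exponential backoff. It is an illustration only, not the 
algorithm text from the doc: `send_reconcile` and `poll_updates` are 
hypothetical callbacks the scheduler would supply, and the delay values are 
arbitrary placeholders rather than anything derived from 
--slave_reregister_timeout.

```python
import time


def reconcile_with_retries(pending, send_reconcile, poll_updates,
                           base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Re-request reconciliation until every pending task has an update.

    pending:        iterable of task ids still awaiting a status update.
    send_reconcile: hypothetical callback taking a sorted list of task ids.
    poll_updates:   hypothetical callback returning the task ids for which
                    a status update arrived since the last call.
    Returns the number of reconciliation requests that were sent.
    """
    pending = set(pending)
    delay = base_delay
    attempts = 0
    while pending:
        # Ask again only about the tasks that are still unaccounted for.
        send_reconcile(sorted(pending))
        attempts += 1
        sleep(delay)
        pending -= set(poll_updates())
        # Double the wait each round, capped at max_delay.
        delay = min(delay * 2, max_delay)
    return attempts
```

    A real scheduler would likely drive this from its status update callback 
instead of blocking, and would cap the number of attempts; the loop form is 
only meant to show why retries cover slaves that have not re-registered yet.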



docs/reconciliation.md (lines 93 - 95)
<https://reviews.apache.org/r/36617/#comment147417>

    It seems a bit odd to prescribe that frameworks have to persist task 
information here; they are free not to. Perhaps we need a document 
that describes some recommendations on framework implementation?
    
    That document could point to reconciliation as one aspect of 
implementation, and could also talk about persistence as its own topic (e.g. 
write-ahead storage, how to achieve high throughput, what are the implications 
of no persistence in the scheduler? what are the implications of 
non-replicated storage? etc.).
    
    Write-ahead storage means that with a strict registry the scheduler only 
needs to perform explicit reconciliation (although implicit would be prudent as 
a defense). I'm inclined to not mention storage here though because the 
registry is non-strict by default (so everyone should be doing implicit 
reconciliation).
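
    As a rough illustration of the two request shapes, here is a minimal 
sketch. The helper names are hypothetical, and the set of terminal states is 
my assumption of the task states at the time of writing; check it against 
mesos.proto before relying on it.

```python
# Assumed terminal task states; anything else is treated as non-terminal.
TERMINAL_STATES = frozenset({
    "TASK_FINISHED",
    "TASK_FAILED",
    "TASK_KILLED",
    "TASK_LOST",
    "TASK_ERROR",
})


def explicit_request(known_tasks):
    """Return the ids of non-terminal tasks to list in an explicit request.

    known_tasks: dict mapping task id to the last state the scheduler saw.
    """
    return sorted(task_id for task_id, state in known_tasks.items()
                  if state not in TERMINAL_STATES)


def implicit_request():
    """An implicit request lists no tasks.

    The master then replies with the latest state of every task it
    currently knows about for the framework.
    """
    return []
```

    For example, a scheduler that last saw t1 as TASK_RUNNING and t2 as 
TASK_FINISHED would list only t1 in its explicit request, while the implicit 
request stays empty regardless of what the scheduler has recorded.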



docs/reconciliation.md (line 97)
<https://reviews.apache.org/r/36617/#comment147415>

    Why move this above?



docs/reconciliation.md (lines 98 - 99)
<https://reviews.apache.org/r/36617/#comment147414>

    This should be already captured by "non-terminal", right?



docs/reconciliation.md (line 100)
<https://reviews.apache.org/r/36617/#comment147416>

    Well, that's certainly one way to handle it, but it seems a bit odd to 
prescribe that here. For example, a scheduler could recover the task as well.



docs/reconciliation.md (lines 119 - 123)
<https://reviews.apache.org/r/36617/#comment147410>

    Hm.. I'm a bit confused by this addition, what time period are you 
referring to?
    
    One of the critical reasons for periodic reconciliation is that by default 
we don't use a strict registry. With a non-strict registry, the master does not 
enforce slave removal across master failovers. I'll add a note about this; as a 
result, tasks may resurrect from a lost state (hence the need to discover them). 
We should probably also move this up out of the notes since it's required by 
default.


- Ben Mahler


On July 23, 2015, 12:01 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36617/
> -----------------------------------------------------------
> 
> (Updated July 23, 2015, 12:01 p.m.)
> 
> 
> Review request for mesos and Joerg Schad.
> 
> 
> Bugs: MESOS-3127
>     https://issues.apache.org/jira/browse/MESOS-3127
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Improved task reconciliation documentation.
> 
> 
> Diffs
> -----
> 
>   docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0 
> 
> Diff: https://reviews.apache.org/r/36617/diff/
> 
> 
> Testing
> -------
> 
> https://gist.github.com/nfnt/73532d62fe39d27ff33d
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>
