-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36617/#review93147
-----------------------------------------------------------
Ship it!

Thanks Jan! I've made the updates from the feedback and will get this committed shortly.


docs/reconciliation.md (lines 43 - 47)
<https://reviews.apache.org/r/36617/#comment147407>

Let's clarify what we mean by "current" here. That is, "non-terminal".


docs/reconciliation.md (line 83)
<https://reviews.apache.org/r/36617/#comment147408>

Hm.. what kind of failure? Let's clarify that this is relevant to master failover.


docs/reconciliation.md (lines 83 - 91)
<https://reviews.apache.org/r/36617/#comment147413>

Thanks Jan! It might be clearer if we move this below the part where we say that the algorithm uses retries, since this is why the retries are needed. It might also be helpful to point out that this time is bounded by the --slave_reregister_timeout flag.


docs/reconciliation.md (lines 93 - 95)
<https://reviews.apache.org/r/36617/#comment147417>

It seems a bit odd to prescribe here that frameworks have to persist task information; they are also free not to. Perhaps we need a document that describes some recommendations on framework implementation? That document could point to reconciliation as one aspect of implementation, and could also cover persistence as its own topic (e.g. write-ahead storage, how to achieve high throughput, the implications of having no persistence in the scheduler, the implications of non-replicated storage, etc.).

Write-ahead storage means that, with a strict registry, the scheduler only needs to perform explicit reconciliation (although implicit reconciliation would be prudent as a defense). I'm inclined not to mention storage here, though, because the registry is non-strict by default (so everyone should be doing implicit reconciliation).


docs/reconciliation.md (line 97)
<https://reviews.apache.org/r/36617/#comment147415>

Why move this above?


docs/reconciliation.md (lines 98 - 99)
<https://reviews.apache.org/r/36617/#comment147414>

This should already be captured by "non-terminal", right?
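For illustration, a minimal Python sketch of the retry loop behind explicit reconciliation, assuming the algorithm has roughly this shape: note a start time, ask about the remaining non-terminal tasks, wait with truncated exponential backoff, drop every task that has received an update since the start, and repeat until nothing remains. The callables (`reconcile`, `wait_for_updates`) and the `SimulatedMaster` class are hypothetical stand-ins, not the Mesos scheduler API. In the simulation, a request for a task is simply dropped until that task's agent has re-registered (a window bounded in Mesos by --slave_reregister_timeout), which is why the retries are needed after a master failover.

```python
import time
from typing import Callable, Dict, Iterable, Set


def reconcile_explicitly(
    non_terminal: Iterable[str],
    reconcile: Callable[[Set[str]], None],             # hypothetical: sends a request
    wait_for_updates: Callable[[float], Dict[str, float]],  # hypothetical: task -> arrival time
    clock: Callable[[], float] = time.monotonic,
    initial_backoff: float = 1.0,
    max_backoff: float = 60.0,                          # assumed cap for the backoff
) -> int:
    """Retry until every non-terminal task has an update that arrived after
    the start of the algorithm. Returns the number of rounds needed."""
    start = clock()
    remaining = set(non_terminal)
    last_arrival: Dict[str, float] = {}
    backoff = initial_backoff
    rounds = 0
    while remaining:
        reconcile(set(remaining))  # only ask about tasks still unaccounted for
        for task_id, arrived_at in wait_for_updates(backoff).items():
            last_arrival[task_id] = arrived_at
        # Keep tasks that have not received an update since `start`.
        remaining = {t for t in remaining
                     if last_arrival.get(t, float("-inf")) < start}
        backoff = min(backoff * 2, max_backoff)  # truncated exponential backoff
        rounds += 1
    return rounds


class SimulatedMaster:
    """Hypothetical master simulation (not a Mesos API). A task can only be
    answered once its agent has re-registered at time ready_at[task];
    requests that cannot be answered yet are dropped."""

    def __init__(self, ready_at: Dict[str, float]) -> None:
        self.ready_at = ready_at
        self.clock = 0.0
        self.pending: Set[str] = set()
        self.requests = 0

    def now(self) -> float:
        return self.clock

    def reconcile(self, task_ids: Set[str]) -> None:
        self.requests += 1
        self.pending = set(task_ids)

    def wait_for_updates(self, timeout: float) -> Dict[str, float]:
        self.clock += timeout  # simulated time passes instead of sleeping
        answered = {t: self.clock for t in self.pending
                    if self.ready_at[t] <= self.clock}
        self.pending.clear()   # unanswered requests are lost; caller must retry
        return answered


master = SimulatedMaster({"task-1": 0.0, "task-2": 5.0, "task-3": 20.0})
rounds = reconcile_explicitly(["task-1", "task-2", "task-3"],
                              master.reconcile, master.wait_for_updates,
                              clock=master.now)
print(rounds, master.now())  # prints: 5 31.0
```

Implicit reconciliation (a request with an empty task list, to which the master replies with the latest state of the tasks it knows about) is not covered by this sketch; it is a single request rather than a retry loop over a known set of tasks.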


docs/reconciliation.md (line 100)
<https://reviews.apache.org/r/36617/#comment147416>

Well, that's certainly one way to handle it, but it seems a bit odd to prescribe that here. For example, a scheduler could also recover the task.


docs/reconciliation.md (lines 119 - 123)
<https://reviews.apache.org/r/36617/#comment147410>

Hm.. I'm a bit confused by this addition: what time period are you referring to?

One of the critical reasons for periodic reconciliation is that by default we don't use a strict registry. With a non-strict registry, the master does not enforce slave removal across master failovers; as a result, tasks may resurrect from a lost state (hence the need to discover them). I'll add a note about this. We should probably also move this up out of the notes, since it's required by default.


- Ben Mahler


On July 23, 2015, 12:01 p.m., Jan Schlicht wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36617/
> -----------------------------------------------------------
>
> (Updated July 23, 2015, 12:01 p.m.)
>
>
> Review request for mesos and Joerg Schad.
>
>
> Bugs: MESOS-3127
>     https://issues.apache.org/jira/browse/MESOS-3127
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Improved task reconciliation documentation.
>
>
> Diffs
> -----
>
>   docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
>
> Diff: https://reviews.apache.org/r/36617/diff/
>
>
> Testing
> -------
>
> https://gist.github.com/nfnt/73532d62fe39d27ff33d
>
>
> Thanks,
>
> Jan Schlicht
>
>
