Thanks for writing this up, Ben! I have a couple of suggestions for additional details that would be helpful to explain.
First, could you go a little more in-depth about how this process works for terminated tasks? For example, how does reconciliation behave for tasks running on a slave that has become disconnected from the master? An overview of the various timeouts involved would also be really awesome.

Second, what happens when a framework attempts to reconcile a task that is completely unknown to Mesos? An example scenario could be that a task died, the terminal status update was ACKed, but the scheduler failed over before this information could be persisted. What task status (if any) does Mesos respond with?

--
Connor Doyle
http://mesosphere.io

On Oct 15, 2014, at 14:05, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:

> Hi all,
>
> I've sent a review out for a document describing reconciliation, you can see
> the draft here:
> https://gist.github.com/bmahler/18409fc4f052df43f403
>
> Would love to gather high level feedback on it from framework developers.
> Feel free to reply here, or on the review:
> https://reviews.apache.org/r/26669/
>
> Thanks!
> Ben