Connor,

Frameworks should only be reconciling tasks that they believe are non-terminal (though, of course, there is a race). If you attempt to reconcile a terminal or unknown task, you will currently receive TASK_LOST. However, you will receive the actual terminal update before you receive this TASK_LOST.
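In practice, this means a scheduler can treat any update for a task it has already recorded as terminal as redundant. That includes a TASK_LOST that reconciliation sends after the real FINISHED/FAILED/KILLED. The sketch below is a minimal illustration of that bookkeeping and assumes nothing about a particular Mesos binding. `TaskState`, `TaskTracker` and `on_status_update` are hypothetical names, not part of the Mesos scheduler API, and the state set is trimmed to the states mentioned in this thread plus FAILED/KILLED.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical subset of task states, for illustration only.
    STAGING = "STAGING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    KILLED = "KILLED"
    LOST = "LOST"


TERMINAL = {TaskState.FINISHED, TaskState.FAILED, TaskState.KILLED, TaskState.LOST}


class TaskTracker:
    """Hypothetical scheduler-side record of the latest known state per task."""

    def __init__(self) -> None:
        self.states: dict[str, TaskState] = {}

    def on_status_update(self, task_id: str, state: TaskState) -> bool:
        """Apply an update; return True only if it changed what we know."""
        current = self.states.get(task_id)
        if current in TERMINAL:
            # The first terminal update wins. Retried copies of it, and a
            # TASK_LOST produced by reconciling a finished task, add nothing.
            return False
        if current == state:
            # Duplicate of a non-terminal update (e.g. a retried RUNNING).
            return False
        self.states[task_id] = state
        return True


if __name__ == "__main__":
    tracker = TaskTracker()
    stream = [TaskState.STAGING, TaskState.RUNNING, TaskState.RUNNING,
              TaskState.FINISHED, TaskState.FINISHED, TaskState.LOST]
    for s in stream:
        tracker.on_status_update("task-1", s)
    print(tracker.states["task-1"])  # the recorded state stays FINISHED
```

Because the stream is ordered, the tracker never has to compare non-terminal states against each other. It only needs to stop accepting changes once a terminal state has been recorded.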
When considering reconciliation and status update retries, the status update stream will look like the following:

[STAGING, STAGING, ..., RUNNING, RUNNING, ..., FINISHED, FINISHED, ..., LOST, LOST, ...]

The invariant here is that the stream is ordered, and may contain duplicates within the order. With reconciliation, you may now receive LOST updates at any point after you receive the initial terminal update.

Given that a more formalized description of status update invariants is needed, I think a separate document is warranted. :)

On Thu, Oct 16, 2014 at 6:04 PM, Connor Doyle <[email protected]> wrote:

> Thanks for writing this up Ben! I have a couple suggestions about
> additional details that could be helpful to explain.
>
> First, could you go a little more in-depth about how this process works
> for terminated tasks? For example, how does reconciliation behave for tasks
> running on a slave that has become disconnected from the master? An
> overview of the various timeouts involved would also be really awesome.
>
> Second, what happens when a framework attempts to reconcile a task that is
> completely unknown to Mesos? An example scenario could be that a task died,
> the terminal status update was ACKed, but the scheduler failed over before
> this information could be persisted. What task status (if any) does Mesos
> respond with?
> --
> Connor Doyle
> http://mesosphere.io
>
>
> On Oct 15, 2014, at 14:05, Benjamin Mahler <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I've sent a review out for a document describing reconciliation, you can
> see the draft here:
> > https://gist.github.com/bmahler/18409fc4f052df43f403
> >
> > Would love to gather high level feedback on it from framework
> developers. Feel free to reply here, or on the review:
> > https://reviews.apache.org/r/26669/
> >
> > Thanks!
> > Ben
> >

