>
> I'm not sure these two cases are any different. The TASK_INVALID_OFFER
> would model a terminal state for the task. Afterwards, one still has to
> generate a new "TaskInfo" in so far as the TaskID should not be re-used
> across launch requests.

I was expecting to reuse the TaskID. If it can't be reused then, agreed, I
do not see a difference between the two cases.
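
(For reference, a rough sketch of the distinction I had in mind;
TASK_INVALID_OFFER is hypothetical, and "store" stands in for my
framework's persistence layer:)

    // Sketch only: TASK_INVALID_OFFER does not exist in Mesos today.
    // Assumes the Mesos C++ scheduler API (#include <mesos/scheduler.hpp>).
    void MyScheduler::statusUpdate(SchedulerDriver* driver,
                                   const TaskStatus& status)
    {
      switch (status.state()) {
        case TASK_LOST:
          store.markLost(status.task_id());           // persist terminal state
          store.createResubmission(status.task_id()); // new TaskInfo, new TaskID
          break;                                      // relaunch on next offer
        // case TASK_INVALID_OFFER:                   // hypothetical state
        //   store.markPending(status.task_id());     // Launched -> Pending
        //   break;                                    // same task, next offer
        default:
          break;
      }
    }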

> Reconciliation currently occurs in three cases, from the master's
> perspective:
>   (1) If the slave is unknown, we send TASK_LOST.
>   (2) If the task is missing on the slave, we send TASK_LOST.
>   (3) If the task state differs, we send the latest state.
> In the absence of bugs or data loss, (1) is the only one that is strictly
> necessary for correctness. In your case, (1) or (2) would result in the
> master sending back TASK_LOST since the task (or possibly the slave) is no
> longer present from the Master's perspective.


My understanding is that the call from my framework to
SchedulerDriver.reconcileTasks() ends up in Mesos' master.cpp
reconcileTasks() at line 2142 (Mesos 0.18). In there, there is no else case
for "if (slave != NULL)", so a TASK_LOST is never sent from there. Status
updates are sent only for tasks and slaves that Mesos currently knows
about and whose state differs. I have seen this happen in my testing: if I
ask for reconciliation of a task that doesn't exist, there is no answer.
The "doesn't exist" case occurs when my framework loses a previously sent
terminal status update for the task.

Is it possible that your description above refers to the reconciliation
that happens when a framework registers with Mesos? My question, in
contrast, is about when an already registered framework calls for
reconciliation. Or is the code location I refer to above incorrect?
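
(For reference, here is a rough paraphrase, from memory, of the control
flow I am describing; a sketch, not the actual 0.18 source:)

    // Paraphrase of Master::reconcileTasks() as I read it in 0.18.
    foreach (const TaskStatus& status, statuses) {
      Slave* slave = getSlave(status.slave_id());
      if (slave != NULL) {                  // slave currently known
        Task* task = slave->getTask(frameworkId, status.task_id());
        if (task != NULL && task->state() != status.state()) {
          send(framework, ...);             // re-send latest known state
        }
        // No else: a task unknown on a known slave gets no reply.
      }
      // No else: an unknown slave gets no reply either, never TASK_LOST.
    }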

> I do agree that it would be nice if we provided a mechanism to reconcile
> these scenarios as well, given that bugs can and will occur! What are the
> operational causes you were referring to? I've filed MESOS-1379
> <https://issues.apache.org/jira/browse/MESOS-1379> for this.
> Barring bugs or data loss, if the framework persists its intent before
> launching a task, then the set of tasks in the framework will always be a
> superset of the tasks in the Master/Slaves.

Thanks for filing that. Data loss is what I had in mind. Say a framework
restarts after its persistence store crashes hard and has to be rebuilt
from a backup/replica that may be a bit behind. It would then be unaware
of some of the newer tasks.
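
(I assume the superset invariant you describe relies on a write-ahead
pattern along these lines; "store" and its methods are hypothetical, and
the invariant breaks exactly when the store is restored from a stale
replica:)

    // Persist intent first, launch second.
    Try<Nothing> persisted = store.saveIntent(task);  // write-ahead record
    if (persisted.isError()) {
      return;  // never launch a task the framework could forget
    }
    driver->launchTasks(offer.id(), {task});          // now safe to launch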

On Thu, May 15, 2014 at 4:28 PM, Benjamin Mahler
<benjamin.mah...@gmail.com> wrote:

> Thanks for providing more details!
>
> I'm not sure these two cases are any different. The TASK_INVALID_OFFER
> would model a terminal state for the task. Afterwards, one still has to
> generate a new "TaskInfo" in so far as the TaskID should not be re-used
> across launch requests.
>
>> *For example, what if reconciliation is requested on a task that completed
>> a long time ago? For which Mesos may have already sent a status of
>> completion and/or lost, but my framework somehow lost that. Hopefully, this
>> and other possible cases are addressed. *
>
>
> Reconciliation currently occurs in three cases, from the master's
> perspective:
>   (1) If the slave is unknown, we send TASK_LOST.
>   (2) If the task is missing on the slave, we send TASK_LOST.
>   (3) If the task state differs, we send the latest state.
>
> In the absence of bugs or data loss, (1) is the only one that is strictly
> necessary for correctness. In your case, (1) or (2) would result in the
> master sending back TASK_LOST since the task (or possibly the slave) is no
> longer present from the Master's perspective.
>
>> *What about tasks that Mesos is running for my framework, but my framework
>> lost track of them (there could be some operational causes for this, even
>> if we assume my code is bug free)? How are frameworks handling such a
>> scenario?*
>
>
> I do agree that it would be nice if we provided a mechanism to reconcile
> these scenarios as well, given that bugs can and will occur! What are the
> operational causes you were referring to? I've filed MESOS-1379
> <https://issues.apache.org/jira/browse/MESOS-1379> for this.
>
> Barring bugs or data loss, if the framework persists its intent before
> launching a task, then the set of tasks in the framework will always be a
> superset of the tasks in the Master/Slaves.
>
> On Wed, May 14, 2014 at 11:04 PM, Sharma Podila <spod...@netflix.com> wrote:
>
>> TASK_LOST is a good thing. I expect to deal with it now and in the
>> future. I was trying to distinguish this:
>>
>>    - case TASK_LOST:
>>       - persist state update to TASK_LOST
>>       - create new task submission request
>>       - schedule with next available offer
>>    - case TASK_INVALID_OFFER:
>>       - persist state update to PENDING (i.e., from Launched back to
>>       Pending)
>>       - schedule with next available offer
>>
>> The difference is "create new task submission request". Although this
>> would be undesirable, and an additional call into the persistence store, I
>> can see that this is an unlikely event, in which case introducing
>> complexity to differentiate the two cases may not be a critical need. As I
>> was saying, "strictly speaking" there's a difference.
>>
>>> are you still planning to do this out-of-band reconciliation when Mesos
>>> provides complete reconciliation (thanks to the Registrar)? Mesos will
>>> ensure that the situation you describe is not possible (in 0.19.0
>>> optionally, and in 0.20.0 by default).
>>
>>
>> It would be nice to not have to do it. Depends on what complete
>> reconciliation entails. For example, what if reconciliation is requested on
>> a task that completed a long time ago? For which Mesos may have already
>> sent a status of completion and/or lost, but my framework somehow lost
>> that. Hopefully, this and other possible cases are addressed.
>>
>> This brings up another question: Reconciliation addresses tasks that my
>> framework knows about. What about tasks that Mesos is running for my
>> framework, but my framework lost track of them (there could be some
>> operational causes for this, even if we assume my code is bug free)? How
>> are frameworks handling such a scenario?
>>
>>
>>
>> On Wed, May 14, 2014 at 4:05 PM, Benjamin Mahler <
>> benjamin.mah...@gmail.com> wrote:
>>
>>>> Whereas, a TASK_LOST will make me (unnecessarily, in this case) try to
>>>> ensure that the task is actually lost, not running away on the slave that
>>>> got disconnected from Mesos master. Not all environments may need the
>>>> distinction, but at least some do.
>>>
>>>
>>> To be clear, are you still planning to do this out-of-band
>>> reconciliation when Mesos provides complete reconciliation (thanks to the
>>> Registrar)? Mesos will ensure that the situation you describe is not
>>> possible (in 0.19.0 optionally, and in 0.20.0 by default).
>>>
>>> Taking a step back, you will always have to deal with TASK_LOST as a
>>> status *regardless* of what the true status of the task was; this is the
>>> reality of failures in a distributed system. For example, let's say the
>>> Master fails right before we could send you the TASK_INVALID_OFFER update,
>>> or your framework fails right before it could persist the
>>> TASK_INVALID_OFFER update. In both cases, you will need to reconcile with
>>> the Master, and it will be TASK_LOST.
>>>
>>> Likewise, let's say your task reached TASK_FINISHED on the slave, but the slave fails
>>> permanently before the update could reach the Master. Then when you
>>> reconcile this with the Master it will be TASK_LOST.
>>>
>>> For these reasons, we haven't yet found much value in providing more
>>> precise task states for various conditions.
>>>
>>>
>>> On Tue, May 13, 2014 at 10:10 AM, Sharma Podila <spod...@netflix.com> wrote:
>>>
>>>> Thanks for confirming that, Adam.
>>>>
>>>>> , but it would be a good Mesos FAQ topic.
>>>>
>>>> I was thinking it might be good to also add this to the documentation in
>>>> the code, either in mesos.proto or MesosSchedulerDriver (mesos.proto
>>>> already refers to the latter for failover, at the FrameworkID definition).
>>>>
>>>>> If you were to try to persist the 'ephemeral' offers to another
>>>>> framework instance, and call launchTasks with one of the old offers, the
>>>>> master will respond with TASK_LOST ("Task launched with invalid offers"),
>>>>> since the master no longer knows about that offer
>>>>
>>>> Strictly speaking, shouldn't this produce some kind of an 'invalid
>>>> offer' response instead of the task being lost? A TASK_LOST response is handled
>>>> differently in my scheduler, for example, compared to what I'd do for an
>>>> invalid offer response. An invalid offer would just be a simple discard
>>>> offer and retry of launch with a more recent offer. Whereas, a TASK_LOST
>>>> will make me (unnecessarily, in this case) try to ensure that the task is
>>>> actually lost, not running away on the slave that got disconnected from
>>>> Mesos master. Not all environments may need the distinction, but at least
>>>> some do.
>>>>
>>>>
>>>>
>>>> On Mon, May 12, 2014 at 11:12 PM, Adam Bordelon <a...@mesosphere.io> wrote:
>>>>
>>>>> Correct, Sharma. I don't think this is documented anywhere yet, but it
>>>>> would be a good Mesos FAQ topic.
>>>>> When the master notices that the framework has exited or is
>>>>> deactivated, it disables the framework in the allocator so no new offers
>>>>> will be made to that framework, and removes any outstanding offers (but
>>>>> does not send a RescindResourceOfferMessage to the framework, since the
>>>>> framework is presumably failing over). When a framework reregisters, it is
>>>>> reactivated in the allocator and will start receiving new offers again.
>>>>> If you were to try to persist the 'ephemeral' offers to another
>>>>> framework instance, and call launchTasks with one of the old offers, the
>>>>> master will respond with TASK_LOST ("Task launched with invalid offers"),
>>>>> since the master no longer knows about that offer. So don't bother trying.
>>>>> :)
>>>>> Already running tasks (used offers) continue running, unless the
>>>>> framework failover timeout is exceeded.
>>>>>
>>>>>
>>>>> On Mon, May 12, 2014 at 5:38 PM, Sharma Podila <spod...@netflix.com> wrote:
>>>>>
>>>>>> My understanding is that when a framework fails over (either new
>>>>>> instance starts after previous one fails, or the same instance restarts),
>>>>>> Mesos master would automatically cancel any unused offers it had given to
>>>>>> the previous framework instance. This is a good thing. Can someone confirm
>>>>>> this to be the case? Is such an expectation documented somewhere? I did
>>>>>> look at master.cpp and I hope I interpreted it right.
>>>>>>
>>>>>> Effectively then, the offers are 'ephemeral' and don't need to be
>>>>>> persisted by the framework scheduler to pass along to another of its
>>>>>> instances that may fail over as the leader.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Sharma
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
