Re: RFC: Partition Awareness

Megha Sharma Wed, 21 Jun 2017 10:17:02 -0700

Thank you all for the feedback.
To summarize: not killing tasks for non-partition-aware frameworks means that 
schedulers will see a higher volume of non-terminal updates for tasks for which 
they have already received a TASK_LOST, but nothing that they are not already 
seeing today. So this shouldn’t be a breaking change for frameworks, and it will 
make the partition awareness logic simpler. I will update MESOS-7215 
<https://issues.apache.org/jira/browse/MESOS-7215> with the details once the 
design is ready.
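
For illustration only, here is a minimal sketch of the scheduler-side case
described above: a non-terminal update arriving for a task that was already
reported as TASK_LOST. The TaskTracker class, its method, and the returned
action strings are hypothetical and are not part of any Mesos API; only the
task state names come from Mesos. What to do with the duplicate (keep the
original or keep the replacement) is a framework policy decision that the
sketch leaves open.

```python
from dataclasses import dataclass, field
from typing import Dict

# Real Mesos task state names; everything else in this sketch is hypothetical.
NON_TERMINAL_STATES = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING", "TASK_KILLING"}


@dataclass
class TaskTracker:
    """Hypothetical scheduler bookkeeping: last known state per task ID."""
    last_state: Dict[str, str] = field(default_factory=dict)

    def on_status_update(self, task_id: str, state: str) -> str:
        """Record the update and return the action the scheduler should take."""
        previous = self.last_state.get(task_id)
        self.last_state[task_id] = state

        if state == "TASK_LOST":
            # Most non-partition-aware frameworks reschedule at this point.
            return "relaunch"

        if previous == "TASK_LOST" and state in NON_TERMINAL_STATES:
            # The agent re-registered and the task is still alive, so the
            # framework may now have two copies of the same work.
            return "resolve_duplicate"

        return "record"
```

A framework that already tolerates this sequence (for example, because of the
non-strict registry) would take the "resolve_duplicate" branch more often under
the proposed change, but would not need a new code path.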


Thanks
Megha Sharma

On Jun 1, 2017, at 2:56 PM, Vinod Kone <vinodk...@apache.org> wrote:

On Thu, Jun 1, 2017 at 2:22 PM, Benjamin Mahler <bmah...@apache.org> wrote:

If I understood correctly, the proposal is to not kill the tasks for
non-partition aware frameworks? That seems like a pretty big change for
frameworks that are not partition aware and expect the old killing
semantics.


Adding to what Neil said, I think most (if not all) non-PA frameworks
would've already rescheduled the task after seeing a TASK_LOST. The
difference is that previously such tasks could come back to TASK_RUNNING iff
the master failed over and the non-strict registry (the default) was in use.
Now, we are saying tasks can come back to TASK_RUNNING irrespective of master
failover. The assumption/hope is that this shouldn't break existing frameworks
in a catastrophic way.

> On Jun 1, 2017, at 2:30 PM, Neil Conway <neil.con...@gmail.com> wrote:
> 
> Hi Ben,
> 
> The argument for changing the semantics is that correct frameworks
> should _always_ have accounted for the possibility that TASK_LOST
> tasks would go back to running (due to the non-strict registry
> semantics). The proposed change would just increase the probability of
> this behavior occurring. From a certain POV, this change would
> actually make it easier to write correct frameworks because the
> TASK_LOST scenario will be less of a corner case :)
> 
> Implementing the task-killing behavior is a bit tricky, because the
> task might continue to run on the agent for a considerable period of
> time. During that time, we can either:
> 
> (a) omit the being-killed task from the master's memory (current
> behavior). That means that any resources used by the task appear to be
> unused, so there might be a concurrent task launch that attempts to
> use them and fails.
> 
> (b) track the being-killed task in the master's memory. This ensures
> the task's resources are not re-offered until the task is actually
> terminated. The concern here is that this "being-killed" task is in a
> weird state -- what task status should it have? When it finally dies,
> we don't want to report a terminal status update back to frameworks
> (for backward compatibility).
> 
> Neither of those approaches seemed ideal, hence we are wondering
> whether we really need to implement this backward compatibility
> behavior in the first place.
> 
> Neil
> 
> On Thu, Jun 1, 2017 at 2:22 PM, Benjamin Mahler <bmah...@apache.org> wrote:
>> If I understood correctly, the proposal is to not kill the tasks for
>> non-partition aware frameworks? That seems like a pretty big change for
>> frameworks that are not partition aware and expect the old killing
>> semantics.
>> 
>> It seems like we should just directly fix the issue, do you have a sense of
>> what the difficulty is there? Is it the re-use of the existing framework
>> shutdown message to kill the tasks that makes this problematic?
>> 
>> On Fri, May 26, 2017 at 3:19 PM, Megha Sharma <mshar...@apple.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> We are working on fixing a potential issue (MESOS-7215) with partition
>>> awareness that occurs when an unreachable agent, with tasks for
>>> non-partition-aware frameworks, attempts to re-register with the master.
>>> Before support for partition-aware frameworks was introduced in
>>> Mesos 1.1.0 (MESOS-5344), if an agent partitioned from the master attempted
>>> to re-register, it would be shut down and all the tasks on the agent
>>> would be terminated. With this feature, partitioned agents are no
>>> longer shut down by the master when they re-register, but to keep the old
>>> behavior the tasks on these agents are still shut down if the corresponding
>>> framework didn’t opt in to partition awareness.
>>> 
>>> One possible solution to the issue described in MESOS-7215 is to change
>>> the master’s behavior so that it does not kill the tasks of
>>> non-partition-aware frameworks when an unreachable agent re-registers with
>>> the master. When an agent goes unreachable, i.e. fails the master’s health
>>> check ping max_agent_ping_timeouts times, the master sends TASK_LOST status
>>> updates for all the tasks on this agent that were launched by
>>> non-partition-aware frameworks. So, if such tasks are no longer killed by
>>> the master, then upon agent re-registration the frameworks will see
>>> non-terminal status updates for tasks for which they already received a
>>> TASK_LOST.
>>> This change will hopefully not break any schedulers, since the same thing
>>> could already happen with the non-strict registry, and schedulers are
>>> expected to be resilient enough to handle this scenario.
>>> 
>>> We would like feedback from the community on the proposed solution to
>>> ensure that this change doesn’t break schedulers or cause any side effects
>>> for them. Looking forward to any feedback or comments.
>>> 
>>> Many Thanks
>>> Megha
>>> 
>>> 
>>
