Thank you all for the feedback. To summarize: not killing tasks for non-Partition Aware frameworks will make schedulers see a higher volume of non-terminal updates for tasks for which they have already received a TASK_LOST, but nothing that they cannot already see today. So this shouldn’t be a breaking change for frameworks, and it will make the partition awareness logic simpler. I will update MESOS-7215 <https://issues.apache.org/jira/browse/MESOS-7215> with the details once the design is ready.

Thanks
Megha Sharma

On Jun 1, 2017, at 2:56 PM, Vinod Kone <vinodk...@apache.org> wrote:

On Thu, Jun 1, 2017 at 2:22 PM, Benjamin Mahler <bmah...@apache.org> wrote:

If I understood correctly, the proposal is to not kill the tasks for non-partition aware frameworks? That seems like a pretty big change for frameworks that are not partition aware and expect the old killing semantics.

Adding to what Neil said, I think most (if not all) non-PA frameworks would've already rescheduled the task after seeing a TASK_LOST. The difference is that previously such tasks could come back to TASK_RUNNING iff the master fails over and the non-strict registry (default) is used. Now, we are saying tasks can come back to TASK_RUNNING irrespective of master failover. The assumption/hope is that this shouldn't break existing frameworks in a catastrophic way.

> On Jun 1, 2017, at 2:30 PM, Neil Conway <neil.con...@gmail.com> wrote:
>
> Hi Ben,
>
> The argument for changing the semantics is that correct frameworks
> should _always_ have accounted for the possibility that TASK_LOST
> tasks would go back to running (due to the non-strict registry
> semantics). The proposed change would just increase the probability of
> this behavior occurring. From a certain POV, this change would
> actually make it easier to write correct frameworks because the
> TASK_LOST scenario will be less of a corner case :)
>
> Implementing the task-killing behavior is a bit tricky, because the
> task might continue to run on the agent for a considerable period of
> time. During that time, we can either:
>
> (a) omit the being-killed task from the master's memory (current
> behavior). That means that any resources used by the task appear to be
> unused, so there might be a concurrent task launch that attempts to
> use them and fails.
>
> (b) track the being-killed task in the master's memory. This ensures
> the task's resources are not re-offered until the task is actually
> terminated. The concern here is that this "being-killed" task is in a
> weird state -- what task status should it have? When it finally dies,
> we don't want to report a terminal status update back to frameworks
> (for backward compatibility).
>
> Neither of those approaches seemed ideal, hence we are wondering
> whether we really need to implement this backward compatibility
> behavior in the first place.
>
> Neil
>
> On Thu, Jun 1, 2017 at 2:22 PM, Benjamin Mahler <bmah...@apache.org> wrote:
>> If I understood correctly, the proposal is to not kill the tasks for
>> non-partition aware frameworks? That seems like a pretty big change for
>> frameworks that are not partition aware and expect the old killing
>> semantics.
>>
>> It seems like we should just directly fix the issue, do you have a sense of
>> what the difficulty is there? Is it the re-use of the existing framework
>> shutdown message to kill the tasks that makes this problematic?
>>
>> On Fri, May 26, 2017 at 3:19 PM, Megha Sharma <mshar...@apple.com> wrote:
>>>
>>> Hi All,
>>>
>>> We are working on fixing a potential issue, MESOS-7215, with partition
>>> awareness, which happens when an unreachable agent, with tasks for
>>> non-Partition Aware frameworks, attempts to re-register with the master.
>>> Before the support for partition-aware frameworks, which was introduced in
>>> Mesos 1.1.0 (MESOS-5344), if an agent partitioned from the master attempted
>>> to re-register, then it would be shut down and all the tasks on the agent
>>> would be terminated. With this feature, the partitioned agents were no
>>> longer shut down by the master when they re-registered, but to keep the old
>>> behavior the tasks on these agents were still shut down if the corresponding
>>> framework didn’t opt in to partition awareness.
>>>
>>> One of the possible solutions to address the issue mentioned in MESOS-7215
>>> is to change the master’s behavior to not kill the tasks for non-Partition
>>> Aware frameworks when an unreachable agent re-registers with the master.
>>> When an agent goes unreachable, i.e. fails the master's health check ping
>>> for max_agent_ping_timeouts, the master sends TASK_LOST status updates for
>>> all the tasks on this agent which have been launched by non-Partition Aware
>>> frameworks. So, if such tasks are no longer killed by the master, then upon
>>> agent re-registration the frameworks will see non-terminal status updates
>>> for tasks for which they already received a TASK_LOST.
>>> This change will hopefully not break any schedulers, since it could have
>>> happened in the past with the non-strict registry as well, and schedulers
>>> are expected to be resilient enough to handle this scenario.
>>>
>>> For the proposed solution we wanted to get feedback from the community to
>>> ensure that this change doesn’t break or cause any side effects for the
>>> schedulers. Looking forward to any feedback/comments.
>>>
>>> Many Thanks
>>> Megha
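
For illustration, the scheduler-side resilience discussed in this thread (a task that was reported TASK_LOST later sending a non-terminal update) can be sketched as below. This is a minimal Python sketch under stated assumptions, not Mesos API code: `TaskTracker`, `on_status_update`, and the returned action strings are hypothetical names, and only the task state names come from Mesos. Killing the revived task is just one possible policy, assumed here because a non-partition-aware framework has typically already rescheduled it.

```python
from dataclasses import dataclass, field

# Mesos task states that are non-terminal.
NON_TERMINAL_STATES = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING", "TASK_KILLING"}


@dataclass
class TaskTracker:
    """Hypothetical scheduler-side bookkeeping; not part of any Mesos API."""

    states: dict = field(default_factory=dict)   # task_id -> last known state
    replaced: set = field(default_factory=set)   # task_ids already rescheduled
    to_kill: list = field(default_factory=list)  # revived duplicates to clean up

    def on_status_update(self, task_id: str, state: str) -> str:
        """Record a status update and return the action the scheduler should take."""
        previous = self.states.get(task_id)
        self.states[task_id] = state

        if state == "TASK_LOST":
            # A non-partition-aware framework treats TASK_LOST as final,
            # so it reschedules the work (at most once per task).
            if task_id not in self.replaced:
                self.replaced.add(task_id)
                return "reschedule"
            return "ignore"

        if previous == "TASK_LOST" and state in NON_TERMINAL_STATES:
            # The task came back after we gave up on it. Assumed policy:
            # a replacement is already running, so mark this one for killing.
            self.to_kill.append(task_id)
            return "kill_duplicate"

        return "record"


tracker = TaskTracker()
for update in [("t1", "TASK_RUNNING"), ("t1", "TASK_LOST"), ("t1", "TASK_RUNNING")]:
    print(update, "->", tracker.on_status_update(*update))
```

A framework that prefers to keep the revived task could instead cancel its replacement; the point is only that the handler must not assume TASK_LOST is the last update it will ever see for a task.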