Re: RFC: Partition Awareness

2017-10-05 James Peach
> On Jun 21, 2017, at 10:16 AM, Megha Sharma wrote:
>
> Thank you all for the feedback.
> To summarize, not killing tasks for non-Partition Aware frameworks will make
> the schedulers see a higher volume of non-terminal updates for tasks for
> which they have already

Re: RFC: Partition Awareness

2017-06-21 Megha Sharma
Thank you all for the feedback. To summarize, not killing tasks for non-Partition Aware frameworks will make the schedulers see a higher volume of non-terminal updates for tasks for which they have already received a TASK_LOST but nothing new that they are not seeing today. So, this shouldn’t

Re: RFC: Partition Awareness

2017-06-01 Vinod Kone
On Thu, Jun 1, 2017 at 2:22 PM, Benjamin Mahler wrote:
> If I understood correctly, the proposal is to not kill the tasks for
> non-partition aware frameworks? That seems like a pretty big change for
> frameworks that are not partition aware and expect the old killing

Re: RFC: Partition Awareness

2017-06-01 Neil Conway
Hi Ben, The argument for changing the semantics is that correct frameworks should _always_ have accounted for the possibility that TASK_LOST tasks would go back to running (due to the non-strict registry semantics). The proposed change would just increase the probability of this behavior
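Neil's argument implies that even a framework that is not partition aware must cope with a task reporting TASK_RUNNING after it was already reported TASK_LOST. Below is a minimal sketch of that scheduler-side bookkeeping, assuming the scheduler sees status updates as (task ID, state string) pairs. The class `LostTaskTracker`, the `replace_lost_tasks` policy flag, and the returned action strings are hypothetical and not part of the Mesos API; only the state names mirror Mesos `TaskState` values.

```python
"""Hypothetical scheduler-side bookkeeping for tasks that reappear after TASK_LOST."""

# Terminal states other than TASK_LOST: the task is definitely gone.
# (Names mirror Mesos TaskState values; this set is not exhaustive.)
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_ERROR"}


class LostTaskTracker:
    """Tracks tasks reported TASK_LOST that may later report running again."""

    def __init__(self, replace_lost_tasks: bool) -> None:
        # Hypothetical policy knob: launch a replacement as soon as a task is lost?
        self.replace_lost_tasks = replace_lost_tasks
        self.running: set[str] = set()
        self.lost: set[str] = set()
        self.replaced: set[str] = set()  # lost tasks that already have a replacement

    def on_status_update(self, task_id: str, state: str) -> str:
        """Record one status update and return the action the scheduler should take."""
        if state == "TASK_LOST":
            self.running.discard(task_id)
            self.lost.add(task_id)
            if self.replace_lost_tasks:
                self.replaced.add(task_id)
                return "launch_replacement"
            return "wait"

        if state in TERMINAL_STATES:
            # The task has really finished; drop all bookkeeping for it.
            self.running.discard(task_id)
            self.lost.discard(task_id)
            self.replaced.discard(task_id)
            return "forget"

        # Non-terminal update (e.g. TASK_RUNNING), possibly after the agent re-registered.
        if task_id in self.replaced:
            # A replacement was already launched, so the returning original is a duplicate.
            self.lost.discard(task_id)
            return "kill"
        if task_id in self.lost:
            # No replacement exists: adopt the task again instead of killing it.
            self.lost.discard(task_id)
            self.running.add(task_id)
            return "readopt"
        self.running.add(task_id)
        return "ok"


if __name__ == "__main__":
    tracker = LostTaskTracker(replace_lost_tasks=True)
    for state in ["TASK_RUNNING", "TASK_LOST", "TASK_RUNNING", "TASK_KILLED"]:
        print(state, "->", tracker.on_status_update("task-1", state))
```

Whether to kill or re-adopt a returning task is a framework policy choice; this sketch kills it only when a replacement was already launched, so that two copies of the same work do not keep running.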

Re: RFC: Partition Awareness

2017-06-01 Benjamin Mahler
If I understood correctly, the proposal is to not kill the tasks for non-partition aware frameworks? That seems like a pretty big change for frameworks that are not partition aware and expect the old killing semantics. It seems like we should just directly fix the issue, do you have a sense of

RFC: Partition Awareness

2017-05-26 Megha Sharma
Hi All, We are working on fixing a potential issue MESOS-7215 with partition awareness which happens when an unreachable agent, with tasks for non-Partition Aware frameworks, attempts to re-register with the master. Before the support for