> On Aug. 22, 2017, 11:42 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp
> > Lines 4451 (patched)
> > <https://reviews.apache.org/r/61645/diff/2/?file=1801801#file1801801line4501>
> >
> >     Can you add a note/TODO here that if the terminal update for the 
> > pending task is retried by the status update manager, it might potentially 
> > get dropped if the framework gets removed? Ideally, we should transition 
> > the pending task to terminated tasks instead of removing it from the map 
> > and forgetting about it.

Added a TODO that we should track it like we do within the executor struct, but 
didn't mention that issue since `Slave::forward` seems to be ok with a removed 
framework.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61645/#review183502
-----------------------------------------------------------


On Aug. 23, 2017, 9:29 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61645/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2017, 9:29 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-7783 and MESOS-7863
>     https://issues.apache.org/jira/browse/MESOS-7783
>     https://issues.apache.org/jira/browse/MESOS-7863
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Per the description of MESOS-7863, there is currently an assumption
> that when a pending task is killed, the framework will be stored in
> the agent when the launch proceeds for the killed task. When this
> assumption does not hold, the TASK_KILLED update will be dropped
> due to the frameowrk being unknown when the launch proceeds. This
> assumption doesn't hold in two cases:
> 
>   (1) Another pending task was killed and we removed the framework
>       in 'Slave::run' thinking it was idle, because pending tasks
>       was empty (we remove from pending tasks when processing the
>       kill). (MESOS-7783 is an example instance of this).
> 
>   (2) The last executor terminated without tasks to send terminal
>       updates for, or the last terminated executor received its
>       last acknowledgement. At this point, we remove the framework
>       thinking there were no pending tasks if the task was killed
>       (removed from pending).
> 
> The fix applied here is to send the status updates from the kill
> path rather than the launch path, to be consistent with how we kill
> tasks queued within the Executor struct. We ensure that the task
> is removed synchronously within the kill path to prevent its launch.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0e07a1af733003bb897cbebb7c1f64437063353d 
>   src/slave/slave.cpp 50d2a10cd68f6611efd4e691e5325e6e0c06f33a 
>   src/tests/slave_tests.cpp a8b46a450c4f3986139390aeed5544e7f9091781 
> 
> 
> Diff: https://reviews.apache.org/r/61645/diff/3/
> 
> 
> Testing
> -------
> 
> Added a test in a subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>

Reply via email to