-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61645/#review183502
-----------------------------------------------------------




src/slave/slave.hpp
Lines 883 (patched)
<https://reviews.apache.org/r/61645/#comment259514>

    s/the/a pending/ ?



src/slave/slave.hpp
Lines 884 (patched)
<https://reviews.apache.org/r/61645/#comment259515>

    s/getPendingTaskGroup/getTaskGroupForPendingTask/



src/slave/slave.cpp
Line 1796 (original), 1796 (patched)
<https://reviews.apache.org/r/61645/#comment259516>

    Update the outdated comment.



src/slave/slave.hpp
Lines 879 (patched)
<https://reviews.apache.org/r/61645/#comment259554>

    Can you add a comment here that adding a taskgroup also adds the 
corresponding tasks into `pendingTasks` ?



src/slave/slave.cpp
Line 1909 (original), 1912 (patched)
<https://reviews.apache.org/r/61645/#comment259557>

    s/Present/Pending/ ?



src/slave/slave.cpp
Lines 1923 (patched)
<https://reviews.apache.org/r/61645/#comment259558>

    can you log the `taskOrTaskgroup` for easier debugging?



src/slave/slave.cpp
Line 2070 (original), 2051 (patched)
<https://reviews.apache.org/r/61645/#comment259556>

    s/Present/Pending/ ?



src/slave/slave.cpp
Lines 2062 (patched)
<https://reviews.apache.org/r/61645/#comment259560>

    can you log the `taskOrTaskgroup` for easier debugging?



src/slave/slave.cpp
Lines 4451 (patched)
<https://reviews.apache.org/r/61645/#comment259567>

    Can you add a note/TODO here that if the terminal update for the pending 
task is retried by the status update manager, it might potentially get dropped 
if the framework gets removed? Ideally, we should transition the pending task 
to terminated tasks instead of removing it from the map and forgetting about it.



src/slave/slave.cpp
Lines 4459 (patched)
<https://reviews.apache.org/r/61645/#comment259568>

    Ideally if the executor is available, you should do a checkpointing update; 
if not available you can do a non-checkpointing update.



src/slave/slave.cpp
Lines 7512 (patched)
<https://reviews.apache.org/r/61645/#comment259563>

    s/the task/the pending task/



src/slave/slave.cpp
Lines 7545-7570 (patched)
<https://reviews.apache.org/r/61645/#comment259569>

    This is a bit hard to read but am not sure how to improve it yet.


- Vinod Kone


On Aug. 22, 2017, 8:19 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61645/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2017, 8:19 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-7783 and MESOS-7863
>     https://issues.apache.org/jira/browse/MESOS-7783
>     https://issues.apache.org/jira/browse/MESOS-7863
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Per the description of MESOS-7863, there is currently an assumption
> that when a pending task is killed, the framework will be stored in
> the agent when the launch proceeds for the killed task. When this
> assumption does not hold, the TASK_KILLED update will be dropped
> due to the frameowrk being unknown when the launch proceeds. This
> assumption doesn't hold in two cases:
> 
>   (1) Another pending task was killed and we removed the framework
>       in 'Slave::run' thinking it was idle, because pending tasks
>       was empty (we remove from pending tasks when processing the
>       kill). (MESOS-7783 is an example instance of this).
> 
>   (2) The last executor terminated without tasks to send terminal
>       updates for, or the last terminated executor received its
>       last acknowledgement. At this point, we remove the framework
>       thinking there were no pending tasks if the task was killed
>       (removed from pending).
> 
> The fix applied here is to send the status updates from the kill
> path rather than the launch path, to be consistent with how we kill
> tasks queued within the Executor struct. We ensure that the task
> is removed synchronously within the kill path to prevent its launch.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0e07a1af733003bb897cbebb7c1f64437063353d 
>   src/slave/slave.cpp 50d2a10cd68f6611efd4e691e5325e6e0c06f33a 
>   src/tests/slave_tests.cpp a8b46a450c4f3986139390aeed5544e7f9091781 
> 
> 
> Diff: https://reviews.apache.org/r/61645/diff/2/
> 
> 
> Testing
> -------
> 
> Added a test in a subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>

Reply via email to