-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194794
-----------------------------------------------------------




src/master/master.cpp
Lines 10037-10056 (original), 10039-10062 (patched)
<https://reviews.apache.org/r/64940/#comment273834>

    I think we shouldn't create a TASK_UNREACHABLE status update and call 
`updateTask` or `forward` for a terminal task at all. . Also, `forward` sends 
TASK_UNREACHABLE update for terminal task to the framework which looks 
incorrect.
    
    Ideally, we want terminal but unacknowledged tasks to still be marked 
unreachable in some way, either via task state being TASK_UNREACHABLE or task 
being present in `unreachableTasks`. This allows, for example, the WebUI to not 
show sandbox links for unreachable tasks irrespective of whether they were 
terminal or not before going unreachable. 
    
    But doing this is tricky for various reasons:
    
    --> `updateTask()` doesn't allow a terminal state to be transitioned to 
TASK_UNREACHABLE. Right now when we call `updateTask` for a terminal task, it 
adds TASK_UNREACHABLE status to `Task.statuses` and also sends it to operator 
API stream subscribers which looks incorrect. The fact that `updateTask` 
internally deals with already terminal tasks is a bad design decision in 
retrospect. I think the callers shouldn't call it for terminal tasks instead.
    
    --> It's not clear to our users what a `completed` task means. The 
intention was for this to hold a cache of terminal and acknowledged tasks for 
storing recent history. The users of the WebUI probably equate "Completed 
Tasks" to terminal tasks irrespective of their acknowledgement status, which is 
why it is confusing for them to see terminal but unacknowledged tasks in the 
"Active tasks" section in the WebUI.
    
    --> When a framework reconciles the state of a task on an unreachable 
agent, master replies with TASK_UNREACHABLE irrespective of whether the task 
was in a non-terminal state or terminal but un-acknowledged state or terminal 
and acknowledged state when the agent went unreachable.  
    
    I think the direction we want to go towards is
    
    --> Completed tasks should consist of terminal unacknowledged and terminal 
acknowled tasks, likely in two different data structures.
    --> Unreachable tasks should consist of all non-complete tasks on an 
unreachable agent.  All the tasks in this map should be in TASK_UNREACHABLE 
state.
    
    Given all the above is a very involved change, I would recommend keeping 
what you have here but with a giant TODO (your current comment in #10058 
doesn't go into enough detail about the complexity here) that talks about the 
above stuff. Your change at least keeps the parity with the (broken) semantics 
that we have in 1.4 and earlier so that's a bit better.



src/master/master.cpp
Line 10273 (original), 10279-10283 (patched)
<https://reviews.apache.org/r/64940/#comment273836>

    I would kill this CHECK in favor of the CHECK that already exists in the 
`Framework::removeTask`. AFAICT `Master::removeTask` shouldn't care about it.



src/tests/master_tests.cpp
Lines 7601 (patched)
<https://reviews.apache.org/r/64940/#comment273838>

    How about:
    
    // This test verifies that when the master removes an agent any 
unacknowledged but terminal tasks
    // on the agent are moved to `completed` but keep their original terminal 
state.



src/tests/master_tests.cpp
Lines 7630 (patched)
<https://reviews.apache.org/r/64940/#comment273843>

    `the task`, which task? are both the tasks using the same executor?



src/tests/master_tests.cpp
Lines 7630-7638 (patched)
<https://reviews.apache.org/r/64940/#comment273879>

    This seems a bit complicated. You could alternatively just start the 
scheduler driver with explicit acknowledgements and just not acknowledge 
terminal updates when you receive them.



src/tests/master_tests.cpp
Lines 7645-7653 (patched)
<https://reviews.apache.org/r/64940/#comment273841>

    Why are you launching 2 tasks instead of just one? You just want to test 
the unacknowledged terminal task case right? Doesn't really matter if it is 
TASK_FINISHED or TASK_LOST right?



src/tests/master_tests.cpp
Lines 7692-7711 (patched)
<https://reviews.apache.org/r/64940/#comment273845>

    Are you verifying that the latest state of the task has been preserved? I 
would recommend hitting "/tasks" endpoint instead of "state-summary" to be more 
direct.


- Vinod Kone


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
>     https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/3/
> 
> 
> Testing
> -------
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>

Reply via email to