James Peach created MESOS-8405:
----------------------------------

             Summary: Update master task loss handling.
                 Key: MESOS-8405
                 URL: https://issues.apache.org/jira/browse/MESOS-8405
             Project: Mesos
          Issue Type: Bug
            Reporter: James Peach


From [~agentvindo.dev] in [r/64940|https://reviews.apache.org/r/64940/]:

{quote}
Ideally, we want terminal but unacknowledged tasks to still be marked 
unreachable in some way, either via task state being TASK_UNREACHABLE or task 
being present in unreachableTasks. This allows, for example, the WebUI to not 
show sandbox links for unreachable tasks irrespective of whether they were 
terminal or not before going unreachable. 

But doing this is tricky for various reasons:

--> updateTask() doesn't allow a terminal state to be transitioned to 
TASK_UNREACHABLE. Right now when we call updateTask() for a terminal task, it 
adds a TASK_UNREACHABLE status to Task.statuses and also sends it to operator 
API stream subscribers, which looks incorrect. The fact that updateTask() 
internally deals with already-terminal tasks is a bad design decision in 
retrospect. I think the callers should instead not call it for terminal tasks.

--> It's not clear to our users what a completed task means. The intention was 
for this to hold a cache of terminal and acknowledged tasks for storing recent 
history. The users of the WebUI probably equate "Completed Tasks" to terminal 
tasks irrespective of their acknowledgement status, which is why it is 
confusing for them to see terminal but unacknowledged tasks in the "Active 
tasks" section in the WebUI.

--> When a framework reconciles the state of a task on an unreachable agent, 
the master replies with TASK_UNREACHABLE irrespective of whether the task was 
in a non-terminal state, a terminal but unacknowledged state, or a terminal 
and acknowledged state when the agent went unreachable.

I think the direction we want to go in is:

--> Completed tasks should consist of terminal unacknowledged and terminal 
acknowledged tasks, likely in two different data structures.
--> Unreachable tasks should consist of all non-complete tasks on an 
unreachable agent.  All the tasks in this map should be in TASK_UNREACHABLE 
state.
{quote}
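
The direction proposed in the quote above could be sketched roughly as 
follows. This is only an illustration in Python, not the Mesos master's C++ 
code: the names Task, AgentTasks and partition_on_unreachable are 
hypothetical, and TERMINAL_STATES is an assumed subset of the terminal task 
states. Terminal tasks are never transitioned, which mirrors the suggestion 
that callers should not invoke updateTask() for them.

```python
from dataclasses import dataclass, field

# Assumed subset of terminal task states, for illustration only.
TERMINAL_STATES = frozenset(
    {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_ERROR"}
)


@dataclass
class Task:
    """Hypothetical stand-in for a task record tracked by the master."""
    task_id: str
    state: str
    acknowledged: bool = False


@dataclass
class AgentTasks:
    """Hypothetical per-agent bookkeeping after the agent goes unreachable."""
    completed_unacknowledged: dict = field(default_factory=dict)
    completed_acknowledged: dict = field(default_factory=dict)
    unreachable: dict = field(default_factory=dict)


def partition_on_unreachable(tasks):
    """Split an agent's tasks into the three proposed structures."""
    result = AgentTasks()
    for task in tasks:
        if task.state in TERMINAL_STATES:
            # Terminal tasks keep their state; they only differ by
            # whether the status update has been acknowledged.
            if task.acknowledged:
                result.completed_acknowledged[task.task_id] = task
            else:
                result.completed_unacknowledged[task.task_id] = task
        else:
            # Every non-complete task ends up in TASK_UNREACHABLE.
            task.state = "TASK_UNREACHABLE"
            result.unreachable[task.task_id] = task
    return result
```

With this split, a UI could list both completed maps under "Completed Tasks" 
and treat everything in the unreachable map uniformly.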



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
