> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, line 6039
> > <https://reviews.apache.org/r/39873/diff/2/?file=1114216#file1114216line6039>
> >
> >     Why not have the same logic here to be defensive? Or do you intend to 
> > guard against it? Just seems weird to allow it in one of the cases but not 
> > the other.
> 
> Vinod Kone wrote:
>     i didn't do it in the master's case because it would be a bug in the code 
> if the master is trying to call updateTask() for an already terminated task! 
> we have to do it for updates from slave because we don't control the 
> executor's updates. i could make it a CHECK for the master case though.

Yeah I know it's a bug in the if it happens :)

Also, if you leave as it without a guard or the same logic, what happens when 
we introduce the bug in the master that transitions to another terminal state? 
I'm fine with either the guard (adding a CHECK) or using the same logic.


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 6004-6013
> > <https://reviews.apache.org/r/39873/diff/2/?file=1114216#file1114216line6004>
> >
> >     Just looking at the patch that introduced this bug, why are we removing 
> > the out-of-order update prevention? Don't see any mention of this in the 
> > stuff around https://issues.apache.org/jira/browse/MESOS-2864.
> 
> Vinod Kone wrote:
>     There are couple reasons for this
>     
>     1) it's hard to figure out if an update is an out of order update. for 
> example, if TASK_STARTING, TASK_RUNNING and TASK_FINISHED are all enqued on 
> the slave, the master might receive non-terminal (TASK_RUNNING) and terminal 
> (TASK_FINISHED) after it transitioned the task to terminal state 
> (TASK_STARTING update w/ latest state as TASK_FINISHED).
>     
>     2) when updateTask() is called, typically an update is being forwarded to 
> the scheduler. so while it might be ok to not transition the (latest) state 
> of the task, it is important to set the task.status_update_state and 
> task.status_update_uuid correctly. so we shouldn't short-circuit the function 
> without setting these.
>     
>     makes sense?

Isn't the reason that the check was just wrong? Or did we break it when we 
started storing the latest state? It looks wrong to me here, since the 
task->state is the latest state: https://reviews.apache.org/r/38051/diff/6#0


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105027
-----------------------------------------------------------


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
>     https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>

Reply via email to