Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-05 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/
---

(Updated Nov. 5, 2015, 4:26 p.m.)


Review request for mesos and Ben Mahler.


Changes
---

updated comments per benm.


Bugs: MESOS-2864
https://issues.apache.org/jira/browse/MESOS-2864


Repository: mesos


Description
---

The master now doesn't change the latest state of a task if it has already 
terminated. But it still updates the status update state and uuid.


Diffs (updated)
-

  src/master/master.cpp cbae27e7a4059a72bc69e152ec8adaf4ef725965 

Diff: https://reviews.apache.org/r/39873/diff/


Testing
---

make check

Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of the 
time without this fix.


Thanks,

Vinod Kone



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-05 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/
---

(Updated Nov. 5, 2015, 6:04 p.m.)


Review request for mesos and Ben Mahler.


Changes
---

rebased to kick off another reviewbot run.


Bugs: MESOS-2864
https://issues.apache.org/jira/browse/MESOS-2864


Repository: mesos


Description
---

The master now doesn't change the latest state of a task if it has already 
terminated. But it still updates the status update state and uuid.


Diffs (updated)
-

  src/master/master.cpp cbae27e7a4059a72bc69e152ec8adaf4ef725965 

Diff: https://reviews.apache.org/r/39873/diff/


Testing
---

make check

Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of the 
time without this fix.


Thanks,

Vinod Kone



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-05 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105370
---

Ship it!


Ship It!

- Guangya Liu


On 十一月 5, 2015, 6:04 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated 十一月 5, 2015, 6:04 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
> https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp cbae27e7a4059a72bc69e152ec8adaf4ef725965 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-04 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105191
---



src/master/master.cpp (lines 6004 - 6008)


Does this comment belong here? It seems redundant with the comments in the 
protobuf files. Should we just point the reader there instead if you think this 
warrants a comment? Seems like the reader will already refer to the 
'latest_update' field definition to read more.

Also would love to avoid wrapping comments at 80 when they are paragraphs 
(if you compare the paragraph added here vs. the one above 'if 
(update.has_uuid())', the latter seems easier on the eyes when I'm reading the 
code outside of reviewboard).


- Ben Mahler


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
> https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-04 Thread Ben Mahler


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, line 6039
> > 
> >
> > Why not have the same logic here to be defensive? Or do you intend to 
> > guard against it? Just seems weird to allow it in one of the cases but not 
> > the other.
> 
> Vinod Kone wrote:
> i didn't do it in the master's case because it would be a bug in the code 
> if the master is trying to call updateTask() for an already terminated task! 
> we have to do it for updates from slave because we don't control the 
> executor's updates. i could make it a CHECK for the master case though.

Yeah I know it's a bug in the if it happens :)

Also, if you leave as it without a guard or the same logic, what happens when 
we introduce the bug in the master that transitions to another terminal state? 
I'm fine with either the guard (adding a CHECK) or using the same logic.


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 6004-6013
> > 
> >
> > Just looking at the patch that introduced this bug, why are we removing 
> > the out-of-order update prevention? Don't see any mention of this in the 
> > stuff around https://issues.apache.org/jira/browse/MESOS-2864.
> 
> Vinod Kone wrote:
> There are couple reasons for this
> 
> 1) it's hard to figure out if an update is an out of order update. for 
> example, if TASK_STARTING, TASK_RUNNING and TASK_FINISHED are all enqued on 
> the slave, the master might receive non-terminal (TASK_RUNNING) and terminal 
> (TASK_FINISHED) after it transitioned the task to terminal state 
> (TASK_STARTING update w/ latest state as TASK_FINISHED).
> 
> 2) when updateTask() is called, typically an update is being forwarded to 
> the scheduler. so while it might be ok to not transition the (latest) state 
> of the task, it is important to set the task.status_update_state and 
> task.status_update_uuid correctly. so we shouldn't short-circuit the function 
> without setting these.
> 
> makes sense?

Isn't the reason that the check was just wrong? Or did we break it when we 
started storing the latest state? It looks wrong to me here, since the 
task->state is the latest state: https://reviews.apache.org/r/38051/diff/6#0


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105027
---


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
> https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-04 Thread Vinod Kone


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 6004-6013
> > 
> >
> > Just looking at the patch that introduced this bug, why are we removing 
> > the out-of-order update prevention? Don't see any mention of this in the 
> > stuff around https://issues.apache.org/jira/browse/MESOS-2864.

There are couple reasons for this

1) it's hard to figure out if an update is an out of order update. for example, 
if TASK_STARTING, TASK_RUNNING and TASK_FINISHED are all enqued on the slave, 
the master might receive non-terminal (TASK_RUNNING) and terminal 
(TASK_FINISHED) after it transitioned the task to terminal state (TASK_STARTING 
update w/ latest state as TASK_FINISHED).

2) when updateTask() is called, typically an update is being forwarded to the 
scheduler. so while it might be ok to not transition the (latest) state of the 
task, it is important to set the task.status_update_state and 
task.status_update_uuid correctly. so we shouldn't short-circuit the function 
without setting these.

makes sense?


> On Nov. 4, 2015, 2:43 a.m., Ben Mahler wrote:
> > src/master/master.cpp, line 6039
> > 
> >
> > Why not have the same logic here to be defensive? Or do you intend to 
> > guard against it? Just seems weird to allow it in one of the cases but not 
> > the other.

i didn't do it in the master's case because it would be a bug in the code if 
the master is trying to call updateTask() for an already terminated task! we 
have to do it for updates from slave because we don't control the executor's 
updates. i could make it a CHECK for the master case though.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105027
---


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
> https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-03 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review105027
---



src/master/master.cpp (lines 6004 - 6008)


Just looking at the patch that introduced this bug, why are we removing the 
out-of-order update prevention? Don't see any mention of this in the stuff 
around https://issues.apache.org/jira/browse/MESOS-2864.



src/master/master.cpp (line 6034)


Why not have the same logic here to be defensive? Or do you intend to guard 
against it? Just seems weird to allow it in one of the cases but not the other.


- Ben Mahler


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
> https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-02 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review104779
---



src/master/master.cpp (lines 6024 - 6028)


The movement around this seems to be the key part of this patch, is it 
possible to isolate the fix in a patch, on top of a patch where the logging / 
0.21.0 bits are updated?


- Ben Mahler


On Nov. 2, 2015, 7:36 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 7:36 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 9f4586e668a2141f4937497d42853fbdea7751a5 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-02 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/
---

Review request for mesos and Ben Mahler.


Repository: mesos


Description
---

The master now doesn't change the latest state of a task if it has already 
terminated. But it still updates the status update state and uuid.


Diffs
-

  src/master/master.cpp 9f4586e668a2141f4937497d42853fbdea7751a5 

Diff: https://reviews.apache.org/r/39873/diff/


Testing
---

make check

Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of the 
time without this fix.


Thanks,

Vinod Kone



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-02 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/
---

(Updated Nov. 2, 2015, 9:50 p.m.)


Review request for mesos and Ben Mahler.


Changes
---

split 0.21.0 slave related changes into its own dependent patch.


Repository: mesos


Description
---

The master now doesn't change the latest state of a task if it has already 
terminated. But it still updates the status update state and uuid.


Diffs (updated)
-

  src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 

Diff: https://reviews.apache.org/r/39873/diff/


Testing
---

make check

Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of the 
time without this fix.


Thanks,

Vinod Kone



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-02 Thread Vinod Kone


> On Nov. 2, 2015, 8:13 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 6032-6036
> > 
> >
> > The movement around this seems to be the key part of this patch, is it 
> > possible to isolate the fix in a patch, on top of a patch where the logging 
> > / 0.21.0 bits are updated?

done.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review104779
---


On Nov. 2, 2015, 7:36 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 7:36 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 9f4586e668a2141f4937497d42853fbdea7751a5 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-02 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/
---

(Updated Nov. 2, 2015, 10:34 p.m.)


Review request for mesos and Ben Mahler.


Changes
---

attached the bug.


Bugs: MESOS-2864
https://issues.apache.org/jira/browse/MESOS-2864


Repository: mesos


Description
---

The master now doesn't change the latest state of a task if it has already 
terminated. But it still updates the status update state and uuid.


Diffs
-

  src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 

Diff: https://reviews.apache.org/r/39873/diff/


Testing
---

make check

Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of the 
time without this fix.


Thanks,

Vinod Kone



Re: Review Request 39873: Fixed master to properly handle status updates when multiple of them are enqueued on the slave.

2015-11-02 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39873/#review104836
---


Patch looks great!

Reviews applied: [39791, 39792, 39827, 39878, 39873]

All tests passed.

- Mesos ReviewBot


On Nov. 2, 2015, 10:34 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39873/
> ---
> 
> (Updated Nov. 2, 2015, 10:34 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2864
> https://issues.apache.org/jira/browse/MESOS-2864
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master now doesn't change the latest state of a task if it has already 
> terminated. But it still updates the status update state and uuid.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2bc5a97a5b50c8a8a9902c47b2e9e3b5216d97ea 
> 
> Diff: https://reviews.apache.org/r/39873/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the "RecoverCompletedExecutor" test 100 times as it was failing most of 
> the time without this fix.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>