Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-04 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144848
---


Ship it!




Ship It!

- Vinod Kone


On Aug. 4, 2016, 7:17 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 4, 2016, 7:17 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-04 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144835
---



Patch looks great!

Reviews applied: [50815, 50723]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 4, 2016, 7:17 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 4, 2016, 7:17 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-04 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/
---

(Updated Aug. 4, 2016, 7:17 p.m.)


Review request for mesos, Adam B, Neil Conway, and Vinod Kone.


Changes
---

Added logic for also verifying if the task state has been updated on master


Bugs: MESOS-5930
https://issues.apache.org/jira/browse/MESOS-5930


Repository: mesos


Description
---

The master's status handler function used to ignore the status updates
from the agents for frameworks not yet re-connected with the master
upon a failover. This change modifies that logic to still update
the local state and not bail out early.


Diffs (updated)
-

  src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
  src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 

Diff: https://reviews.apache.org/r/50723/diff/


Testing
---

make check (gtest_repeat=100)


Thanks,

Anand Mazumdar



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-04 Thread Anand Mazumdar


> On Aug. 3, 2016, 6:12 p.m., Neil Conway wrote:
> > src/tests/master_tests.cpp, line 4665
> > 
> >
> > Should we also check that the task correctly transitions to a terminal 
> > state in the master?
> 
> Anand Mazumdar wrote:
> I had punted on this for now and just ensuring if the resources have been 
> correctly recovered. I left a TODO to address this for later.
> 
> Neil Conway wrote:
> The observed behavior in the bug report is about task statuses, not 
> resource allocation; those are fairly different things. I'd personally prefer 
> to see a test for task statuses be included as part of this change.

hmm, They might be fairly different things from the view point of the user but 
are closely aligned from the perspective of our code.

1. This is how the general flow of the code is in `updateTask()`:

```cpp
task.set_state(state);

if (terminated(state)) {
  allocator->recoverResources(...);
}
```

Hence, what the test currently does is a _strict_ superset by checking for 
recovered resources. Of course, the end user who is not familiar with the 
internals would only look for what they see on the UI via the `/state` call!

2. The only way I could think of to ensure that the task was _updated_ was to 
make another call to `/state` endpoint. That seemed un-necessary due to 1. and 
too verbose and hence I left a TODO.

But then, I thought that I would just go ahead and address the `TODO` now and 
found another bug in the `GET_TASKS` implementation in the master. CR: 
https://reviews.apache.org/r/50815 :(


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144633
---


On Aug. 3, 2016, 11:14 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 11:14 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-04 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144748
---



Patch looks great!

Reviews applied: [50723]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 3, 2016, 11:14 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 11:14 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-04 Thread Neil Conway


> On Aug. 3, 2016, 6:12 p.m., Neil Conway wrote:
> > src/tests/master_tests.cpp, line 4665
> > 
> >
> > Should we also check that the task correctly transitions to a terminal 
> > state in the master?
> 
> Anand Mazumdar wrote:
> I had punted on this for now and just ensuring if the resources have been 
> correctly recovered. I left a TODO to address this for later.

The observed behavior in the bug report is about task statuses, not resource 
allocation; those are fairly different things. I'd personally prefer to see a 
test for task statuses be included as part of this change.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144633
---


On Aug. 3, 2016, 11:14 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 11:14 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/
---

(Updated Aug. 3, 2016, 11:14 p.m.)


Review request for mesos, Adam B, Neil Conway, and Vinod Kone.


Changes
---

Review comments


Bugs: MESOS-5930
https://issues.apache.org/jira/browse/MESOS-5930


Repository: mesos


Description
---

The master's status handler function used to ignore the status updates
from the agents for frameworks not yet re-connected with the master
upon a failover. This change modifies that logic to still update
the local state and not bail out early.


Diffs (updated)
-

  src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
  src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 

Diff: https://reviews.apache.org/r/50723/diff/


Testing
---

make check (gtest_repeat=100)


Thanks,

Anand Mazumdar



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Anand Mazumdar


> On Aug. 3, 2016, 7:58 p.m., Adam B wrote:
> > src/master/master.cpp, lines 5164-5166
> > 
> >
> > Maybe increment `metrics->invalid_status_updates` in the else case here?
> > And log here, as before?
> 
> Anand Mazumdar wrote:
> This was pointed out by Neil earlier too. Continuing the discussion here. 
> 
> To me, it looked like an oversight in the prior code to treat status 
> updates from agents for frameworks that have not yet re-registered upon a 
> master failover as invalid and increment the metric?
> 
> Adam B wrote:
> Ok, then maybe we need a new `metrics->dropped_status_updates`?

hmm, looks like in `acknowledge()` we increment the metric when the agent is 
not connected. I would go ahead and increment the metric here too for 
consistency.
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L4258


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144662
---


On Aug. 3, 2016, 8:44 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 8:44 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144677
---


Fix it, then Ship it!





src/master/master.cpp (lines 5201 - 5203)


How about

// The task might not exist in master's memory (e.g., failed task 
validation).



src/master/master.cpp (lines 5206 - 5209)


I think this part of the comment is a bit confusing. I would just kill it. 
Keep the last sentence "Note that master..."



src/tests/master_tests.cpp (line 4795)


i.e.,



src/tests/master_tests.cpp (line 4824)


s/send a retry/retry/


- Vinod Kone


On Aug. 3, 2016, 8:44 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 8:44 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Adam B


> On Aug. 3, 2016, 12:58 p.m., Adam B wrote:
> > src/master/master.cpp, lines 5164-5166
> > 
> >
> > Maybe increment `metrics->invalid_status_updates` in the else case here?
> > And log here, as before?
> 
> Anand Mazumdar wrote:
> This was pointed out by Neil earlier too. Continuing the discussion here. 
> 
> To me, it looked like an oversight in the prior code to treat status 
> updates from agents for frameworks that have not yet re-registered upon a 
> master failover as invalid and increment the metric?

Ok, then maybe we need a new `metrics->dropped_status_updates`?


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144662
---


On Aug. 3, 2016, 1:44 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 1:44 p.m.)
> 
> 
> Review request for mesos, Adam B, Neil Conway, and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Anand Mazumdar


> On Aug. 3, 2016, 7:58 p.m., Adam B wrote:
> > src/master/master.cpp, lines 5164-5166
> > 
> >
> > Maybe increment `metrics->invalid_status_updates` in the else case here?
> > And log here, as before?

This was pointed out by Neil earlier too. Continuing the discussion here. 

To me, it looked like an oversight in the prior code to treat status updates 
from agents for frameworks that have not yet re-registered upon a master 
failover as invalid and increment the metric?


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144662
---


On Aug. 3, 2016, 8:44 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 8:44 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/
---

(Updated Aug. 3, 2016, 8:44 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Review comments from Neil and Adam.

Also, deferred the update of task status update in `forward()` instead of 
`updateTask()`. See comments for more details on why this is needed.


Bugs: MESOS-5930
https://issues.apache.org/jira/browse/MESOS-5930


Repository: mesos


Description
---

The master's status handler function used to ignore the status updates
from the agents for frameworks not yet re-connected with the master
upon a failover. This change modifies that logic to still update
the local state and not bail out early.


Diffs (updated)
-

  src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
  src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 

Diff: https://reviews.apache.org/r/50723/diff/


Testing
---

make check (gtest_repeat=100)


Thanks,

Anand Mazumdar



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Anand Mazumdar


> On Aug. 3, 2016, 6:12 p.m., Neil Conway wrote:
> > src/master/master.cpp, line 5142
> > 
> >
> > Seems like we should still be logging when we get a status update from 
> > an unknown framework.

+1, my bad.


> On Aug. 3, 2016, 6:12 p.m., Neil Conway wrote:
> > src/tests/master_tests.cpp, line 4665
> > 
> >
> > Should we also check that the task correctly transitions to a terminal 
> > state in the master?

I had punted on this for now and just ensuring if the resources have been 
correctly recovered. I left a TODO to address this for later.


> On Aug. 3, 2016, 6:12 p.m., Neil Conway wrote:
> > src/master/master.cpp, line 5163
> > 
> >
> > `getFramework()` might return non-`nullptr` for a disconnected (but 
> > still registered) framework, so I think you should rephrase this.

I had copied this comment over from similar pre-existing comments in this file. 
Would file a follow up JIRA issue to revisit some of them.


> On Aug. 3, 2016, 6:12 p.m., Neil Conway wrote:
> > src/master/master.cpp, line 5185
> > 
> >
> > Should we call such a status update "valid"? We didn't before, now we 
> > will. Not sure new behavior is wrong but seems worth discussing.

+1, Continuing this discussion on a follow up review comment from Adam.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144633
---


On Aug. 3, 2016, 8:44 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 3, 2016, 8:44 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144662
---




src/master/master.cpp (lines 5154 - 5156)


Maybe increment `metrics->invalid_status_updates` in the else case here?
And log here, as before?


- Adam B


On Aug. 2, 2016, 2:45 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 2, 2016, 2:45 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144633
---




src/master/master.cpp 


Seems like we should still be logging when we get a status update from an 
unknown framework.



src/master/master.cpp (line 5153)


`getFramework()` might return non-`nullptr` for a disconnected (but still 
registered) framework, so I think you should rephrase this.



src/master/master.cpp (line 5175)


Should we call such a status update "valid"? We didn't before, now we will. 
Not sure new behavior is wrong but seems worth discussing.



src/tests/master_tests.cpp (line 4665)


Should we also check that the task correctly transitions to a terminal 
state in the master?


- Neil Conway


On Aug. 2, 2016, 9:45 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 2, 2016, 9:45 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-03 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/#review144591
---



Patch looks great!

Reviews applied: [50723]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 2, 2016, 9:45 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50723/
> ---
> 
> (Updated Aug. 2, 2016, 9:45 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-5930
> https://issues.apache.org/jira/browse/MESOS-5930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's status handler function used to ignore the status updates
> from the agents for frameworks not yet re-connected with the master
> upon a failover. This change modifies that logic to still update
> the local state and not bail out early.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 
> 
> Diff: https://reviews.apache.org/r/50723/diff/
> 
> 
> Testing
> ---
> 
> make check (gtest_repeat=100)
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Review Request 50723: Fixed the master to recover resources/update state for orphan tasks.

2016-08-02 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50723/
---

Review request for mesos and Vinod Kone.


Bugs: MESOS-5930
https://issues.apache.org/jira/browse/MESOS-5930


Repository: mesos


Description
---

The master's status handler function used to ignore the status updates
from the agents for frameworks not yet re-connected with the master
upon a failover. This change modifies that logic to still update
the local state and not bail out early.


Diffs
-

  src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
  src/tests/master_tests.cpp 6709818d599c068c289bcb714446018577082d8b 

Diff: https://reviews.apache.org/r/50723/diff/


Testing
---

make check (gtest_repeat=100)


Thanks,

Anand Mazumdar