Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2017-03-01 Thread Joris Van Remoortere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review167322
---



Closing this review due to inactivity. Please see our 
[guidelines](https://github.com/apache/mesos/blob/master/docs/reopening-reviews.md)
 for reopening reviews.

- Joris Van Remoortere


On June 9, 2016, 1:08 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated June 9, 2016, 1:08 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 790da3ce686401c378ad9c62d497d60893c4ce41 
>   src/master/master.cpp 21ec70491ce6b79be57ff8db51d4b2fa682b32ce 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> 
> Diff: https://reviews.apache.org/r/47082/diff/3/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-10 Thread Jiang Yan Xu


> On June 10, 2016, 5:47 a.m., Neil Conway wrote:
> > Something else to consider here: this change means that frameworks will 
> > miss a `slaveLost()` signal that they care about in some circumstances. For 
> > example, suppose an agent has a persistent volume on an agent; the master 
> > fails over, and the agent fails to reregister with the master. We'll remove 
> > the agent, but we _won't_ send `SlaveLostMessage` to the framework in this 
> > case, because the master doesn't know the framework has a persistent volume 
> > on the agent.
> > 
> > Since `slaveLost()` is unreliable to begin with, I don't think this is a 
> > show-stopper, but it's a bit unfortunate...

Yeah in this case we'll have to fall back to broadcasting, also a bit 
unfortunate but there's not a better way IMO.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review136992
---


On June 8, 2016, 6:08 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated June 8, 2016, 6:08 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 790da3ce686401c378ad9c62d497d60893c4ce41 
>   src/master/master.cpp 21ec70491ce6b79be57ff8db51d4b2fa682b32ce 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-10 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review136992
---



Something else to consider here: this change means that frameworks will miss a 
`slaveLost()` signal that they care about in some circumstances. For example, 
suppose an agent has a persistent volume on an agent; the master fails over, 
and the agent fails to reregister with the master. We'll remove the agent, but 
we _won't_ send `SlaveLostMessage` to the framework in this case, because the 
master doesn't know the framework has a persistent volume on the agent.

Since `slaveLost()` is unreliable to begin with, I don't think this is a 
show-stopper, but it's a bit unfortunate...

- Neil Conway


On June 9, 2016, 1:08 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated June 9, 2016, 1:08 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 790da3ce686401c378ad9c62d497d60893c4ce41 
>   src/master/master.cpp 21ec70491ce6b79be57ff8db51d4b2fa682b32ce 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-09 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review136774
---




src/master/master.cpp (line 6858)


`nullptr`



src/tests/master_tests.cpp (line 1807)


I wonder if `Times(0)` would be equivalent and more concise here?


Can we have a new test case (maybe more than one?) to cover some of the corner 
cases here -- e.g., that we expect to see `LostSlaveMessage` when a framework 
isn't running any tasks on an agent but has a pending offer or reserved 
resource for that agent.

We should also update the docs to describe this behavior.

- Neil Conway


On June 9, 2016, 1:08 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated June 9, 2016, 1:08 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 790da3ce686401c378ad9c62d497d60893c4ce41 
>   src/master/master.cpp 21ec70491ce6b79be57ff8db51d4b2fa682b32ce 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-08 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review136759
---



Patch looks great!

Reviews applied: [48453, 47082]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On June 9, 2016, 1:08 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated June 9, 2016, 1:08 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 790da3ce686401c378ad9c62d497d60893c4ce41 
>   src/master/master.cpp 21ec70491ce6b79be57ff8db51d4b2fa682b32ce 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-08 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/
---

(Updated June 9, 2016, 1:08 a.m.)


Review request for mesos and Jiang Yan Xu.


Bugs: MESOS-5143
https://issues.apache.org/jira/browse/MESOS-5143


Repository: mesos


Description (updated)
---

When a slave is removed, master sends a LostSlaveMessage to affected
frameworks only (instead of all registered frameworks). An affected
framework is a framework which satisfied one or more conditions of
the following:

1. There are tasks on this slave belonging to the framework.
2. There are pending tasks on this slave belonging to the framework.
3. Reserved resources on the slave have a matching role with the
   role of the framework.
4. There are pending offers or pending inverse offers from this slave
   for the framework.


Diffs (updated)
-

  src/master/master.hpp 790da3ce686401c378ad9c62d497d60893c4ce41 
  src/master/master.cpp 21ec70491ce6b79be57ff8db51d4b2fa682b32ce 
  src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 

Diff: https://reviews.apache.org/r/47082/diff/


Testing
---

All existing and modified tests passed.


Thanks,

Anindya Sinha



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-08 Thread Anindya Sinha


> On June 6, 2016, 5:43 a.m., Jiang Yan Xu wrote:
> > src/tests/master_tests.cpp, line 1760
> > 
> >
> > So the following two tests actually caught something we didn't 
> > anticipate, so instead of "fixing" the tests, we should fix our code:
> > 
> > The scheduler driver remembers the slave pids for framework messages 
> > and old entries are removed when it receives `LostSlaveMessage`s. The way 
> > we are doing it right now can cause them to be nevered removed because they 
> > don't receive the `LostSlaveMessage`s!
> > 
> > Note that there are two cases:
> > 
> > 1. If the master fails over, it doesn't know anything about the lost 
> > agent so it doesn't know whehter it should send `LostSlaveMessage`s but it 
> > perahps should.
> > 2. If the master doesn't fail over and a framework's tasks have all 
> > completed before the agent goes lost, it doesn't send `LostSlaveMessage` to 
> > it but then the save pid is not erased on the driver.
> > 
> > Let's chat about this further.

Fixed case 2. See https://reviews.apache.org/r/48453/.
UPIDs are tracked for each executor by adjusting that entry's life to be from 
the time the executor is launched till the time the executor terminates. As a 
result, not sending LostSlaveMessage to frameworks who have no tasks or 
reservations on that agent would not interfere with cleaning up of slave UPIDs.

Regarding case 1: When the master fails over, we send a LostSlaveMessage if it 
has a task or reservation or pending offer to the framework, and not otherwise. 
I think that should be fine. Let us chat about this case a bit more.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review135384
---


On May 26, 2016, 11:56 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated May 26, 2016, 11:56 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are running tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 
>   src/master/master.cpp 0005a29caabcc6a3776037cf86a2b12660e6377b 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-06-05 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review135384
---



See the comment I had on the tests: I think we should revisit the semantics for 
this message (when it should be sent).

A few more points:
1. Is there a test that covers "the basic flow": `slaveLost()` is being called 
because it does have reserevd resources or tasks on the agent (in the same test 
we can have a idle framework which doesn't get notified)?
2. Since this changes Mesos behavior observable by the frameworks, we should 
update documentation for scheduler API `slaveLost()` to reflect this.
3. For the reason in 2), we should send an email to the dev and user list about 
this change before we commit it.


src/master/master.hpp (lines 321 - 323)


1. Would it make sense to use `multihashmap` to avoid the manual cleanups 
of empty `hashset` etc.?
2. Just so it's clearer to the readers let's comment on why we need to 
track them here.

Perhaps

```
  // Tasks that have not yet been launched because they are currently
  // being authorized. Similar to 'pendingTasks' in Framwork but we 
  // track them separately here to determine which frameworks are
  // affected when an agent is removed.
```



src/master/master.cpp (line 3455)


The **what** in this comment feels unnecessary as the code is 
self-explanatory. If we add some comment here perhaps comment on **why**.

Maybe: 

```
// Add to the slave's list of pending tasks. This is
// necessary to track frameworks who own any pending
// task affected by a lost agent.
```



src/master/master.cpp (lines 3456 - 3459)


If we are using `multihashmap` this could be:

```
if (!slave->pendingTasks.contains(framework->id(), task.task_id()) {
  slave->pendingTasks.put(framework->id(), task.task_id());
}
```



src/master/master.cpp (lines 3838 - 3841)


We can keep the original comment which can cover both. No need to comment 
separately since the statements here are pretty short and easy to understand.



src/master/master.cpp (line 4119)


Let's use `slave != nullptr` for consistentcy with the rest of the file. 
(Note that MESOS-3243 was recently committed)



src/master/master.cpp (line 6577)


s/if/if there are/

So the bullet points are more consistently constructed.



src/master/master.cpp (line 6578)


```
1. Reserved resources on this slave that belong to the framework's role; or
```

Note that there are also statically reserved resources that are on this 
slave which aren't checkpointed.



src/master/master.cpp (line 6579)


```
2. Tasks on this slave launched by the framework; or
```

We should avoid using `running` to avoid confusing because it may not be 
`RUNNING` yet on the slave. We can just call them either tasks or pending tasks.



src/master/master.cpp (line 6580)


nit: s/on/for/

(as in destined for) because it's not added to/launched onto the slave yet.



src/master/master.cpp (line 6581)






src/master/master.cpp (line 6590)


We should look through `slave->totalResources.reserved()` which includes 
statically reserved resources.



src/master/master.cpp (lines 6591 - 6593)


Per the comment above, this is not needed now.



src/master/master.cpp (line 6627)


s/there are running/it has/



src/master/master.cpp (line 6636)


No need to copy, you are not modifying `slave->pendingTasks` while 
iterating.

Also align it like this

```
foreachkey (const FrameworkID& frameworkId,
slave->pendingTasks) {
}
```

if it exceeded 80 chars (which looks like it didn't so it should be put on 
the same line).



src/master/master.cpp (line 6639)


It's naturally taken care of when `Slave` is destructed right?

Plus always use an empty line between `}` and the next statement.



src/master/master.cpp (line 6651)


s/lost slave/the lost slave/.




Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-05-26 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review135132
---



Patch looks great!

Reviews applied: [47082]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On May 26, 2016, 11:56 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated May 26, 2016, 11:56 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are running tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 
>   src/master/master.cpp 0005a29caabcc6a3776037cf86a2b12660e6377b 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-05-26 Thread Anindya Sinha


> On May 25, 2016, 6:27 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 6551
> > 
> >
> > I don't feel that this boolean is necessary, we can easily check if a 
> > framework is added by checking if it's in the set (O(1) if hashset), right?

For a std::set, we needed this bool. I think we can avoid this if we use a 
hashset.


> On May 25, 2016, 6:27 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 6576-6579
> > 
> >
> > From `slave->checkpointedResource.reservations()` we can get the list 
> > of roles and from `activeRoles` we can get the list of frameworks for each 
> > role.

Moved this to use activeRoles.


> On May 25, 2016, 6:27 p.m., Jiang Yan Xu wrote:
> > src/tests/master_authorization_tests.cpp, lines 338-339
> > 
> >
> > If we don't expect it to be call, we should do
> > 
> > ```
> > EXPECT_CALL(sched, slaveLost(, _))
> >   .Times(0);
> > ```
> > 
> > Here and elsewhere.
> > 
> > But here I think we should fix this to have the framework receive the 
> > slave lost messsage. See the comment below.

Reinstated the original version, since we expect SlaveLostMessage to be sent 
for pending tasks.


> On May 25, 2016, 6:27 p.m., Jiang Yan Xu wrote:
> > src/tests/master_authorization_tests.cpp, lines 346-348
> > 
> >
> > So if the task is stuck in the `pendingTasks` a slave lost message is 
> > not sent but later a TASK_LOST is sent with reason 
> > `REASON_SLAVE_REMOVED`... We should handle this case the same way as the 
> > other cases where we do send slave lost IMO.
> > 
> > We probably need to add a `pendingTasks` hashmap in the slave as well 
> > and check against that map during `removeSlave()`, thoughts?

Good catch. Yes, added a slave->pendingTasks and use this to send out 
SlaveLostMessage to frameworks with pending tasks.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review134535
---


On May 26, 2016, 11:56 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated May 26, 2016, 11:56 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are running tasks on this slave belonging to the framework.
> 2. There are pending tasks on this slave belonging to the framework.
> 3. Reserved resources on the slave have a matching role with the
>role of the framework.
> 4. There are pending offers or pending inverse offers from this slave
>for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 
>   src/master/master.cpp 0005a29caabcc6a3776037cf86a2b12660e6377b 
>   src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-05-26 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/
---

(Updated May 26, 2016, 11:56 p.m.)


Review request for mesos and Jiang Yan Xu.


Changes
---

Updated based on review comments.


Bugs: MESOS-5143
https://issues.apache.org/jira/browse/MESOS-5143


Repository: mesos


Description (updated)
---

When a slave is removed, master sends a LostSlaveMessage to affected
frameworks only (instead of all registered frameworks). An affected
framework is a framework which satisfied one or more conditions of
the following:

1. There are running tasks on this slave belonging to the framework.
2. There are pending tasks on this slave belonging to the framework.
3. Reserved resources on the slave have a matching role with the
   role of the framework.
4. There are pending offers or pending inverse offers from this slave
   for the framework.


Diffs (updated)
-

  src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 
  src/master/master.cpp 0005a29caabcc6a3776037cf86a2b12660e6377b 
  src/tests/master_tests.cpp 34be015aa314a7574e9065efb7b1bb8e1570c5b7 

Diff: https://reviews.apache.org/r/47082/diff/


Testing
---

All existing and modified tests passed.


Thanks,

Anindya Sinha



Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-05-25 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review134535
---



I think for this patch we should at least have one test case where a slave lost 
message is not sent.
Perhaps two frameworks connected the master and the slave lost message is only 
sent to one because the other doesn't have any business with that slave?


src/master/master.hpp (line 24)


Use a hashset (already included)?



src/master/master.hpp (line 666)


Use a hashset since no ordering is required?



src/master/master.cpp (line 96)


Remove this if we use a hashset.



src/master/master.cpp (lines 6545 - 6546)


For 1: Also there are inverse offers.

And there is also 
3. They have running tasks on this slave.
4. They have pending tasks for this slave.

You did add framworks for running tasks below when you go through 
`slave->tasks` but I think here we should list all cases for which we send the 
message in one place.

I commented on the pending tasks in one of the tests.



src/master/master.cpp (line 6550)


This is about a single slave so instead of looping through all registered 
frameworks, we can look for things in this particular slave's data members.



src/master/master.cpp (line 6551)


I don't feel that this boolean is necessary, we can easily check if a 
framework is added by checking if it's in the set (O(1) if hashset), right?



src/master/master.cpp (lines 6555 - 6561)


We can loop through `slave->offers` for this and we should look at 
`slave->inverseOffers` as well.

Actually, we already go through offers and inverseOffers below so we can 
just add `affectedFrameworks.insert(framework->id());` there (each with a 
comment).



src/master/master.cpp (lines 6566 - 6570)


Looks like this is no longer applicable in the suggested approach.



src/master/master.cpp (lines 6576 - 6579)


From `slave->checkpointedResource.reservations()` we can get the list of 
roles and from `activeRoles` we can get the list of frameworks for each role.



src/master/master.cpp (line 6722)


Also put the slaveID in this log?

e.g., something like this

```
LOG(WARNING) << "Dropping LostSlaveMessage about agent " << slaveInfo.id() 
 << " for unknown framework " << frameworkId;
```



src/tests/master_authorization_tests.cpp (lines 338 - 339)


If we don't expect it to be call, we should do

```
EXPECT_CALL(sched, slaveLost(, _))
  .Times(0);
```

Here and elsewhere.

But here I think we should fix this to have the framework receive the slave 
lost messsage. See the comment below.



src/tests/master_authorization_tests.cpp (lines 346 - 348)


So if the task is stuck in the `pendingTasks` a slave lost message is not 
sent but later a TASK_LOST is sent with reason `REASON_SLAVE_REMOVED`... We 
should handle this case the same way as the other cases where we do send slave 
lost IMO.

We probably need to add a `pendingTasks` hashmap in the slave as well and 
check against that map during `removeSlave()`, thoughts?


- Jiang Yan Xu


On May 6, 2016, 7:27 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated May 6, 2016, 7:27 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are running tasks on this slave belonging to the framework.
> 2. Reserved resources on the slave have a matching role with the
>role of the framework.
> 3. There are pending offers from this slave for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 3e55114ee7866e06513071e86e15608099dae052 
>   src/master/master.cpp 

Re: Review Request 47082: LostSlaveMessage should be sent to affected frameworks only.

2016-05-06 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review132122
---



Patch looks great!

Reviews applied: [47082]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' 
COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On May 7, 2016, 2:27 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> ---
> 
> (Updated May 7, 2016, 2:27 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
> https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are running tasks on this slave belonging to the framework.
> 2. Reserved resources on the slave have a matching role with the
>role of the framework.
> 3. There are pending offers from this slave for the framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 3e55114ee7866e06513071e86e15608099dae052 
>   src/master/master.cpp 6d3e0f7c634690a35eec1ce827b705e04c3af87e 
>   src/tests/master_authorization_tests.cpp 
> 804b39a269c09df9f6c0bbdf6f8b53921ac09ce8 
>   src/tests/master_tests.cpp 8e00753fcbcd0cae1d08aad387b08aa17c7f63ad 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> ---
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>