Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-19 Thread Chun-Hung Hsiao


> On April 19, 2018, 6 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Lines 193-203 (original), 194-203 (patched)
> > 
> >
> > How about moving this into `GenericRegistrarProcess::initialize()` to 
> > start recovery early?
> > 
> > If we do this and keep `recovered` (See below) then this function 
> > should return
> > ```
> > recovered->then([=] { return registry.get(); })
> > ```
> 
> Benjamin Bannier wrote:
> We already require users to `recover` registrars before being able to 
> perform operations against them (like for other registrars), so I am not 
> really sure I understand how what you suggest would help. Could you explain?

Oh what I mean is doing the following:
```
virtual void initialize() override {
  registry = ... // start the recovery.
}

Future recover() {
  return registry;
}
```

The second part of my comment above is just illustrating the code if we keep 
`registry` as an `Option` and `recovered` as a `Future`.


> On April 19, 2018, 6 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 205 (original), 205 (patched)
> > 
> >
> > Why do we need the `undiscardable` here? Could you add some comments?
> > 
> > Also, should we prevent the recovery being discarded due to a caller 
> > discarding an `apply` call? If yes then we should do
> > ```
> > return undiscardable(registry.get()).then(... ::_apply ...);
> > ```
> > in the following `apply()` function.
> 
> Benjamin Bannier wrote:
> I added a comment and also fixed below `apply` to use `recover()` which 
> would return the cached result, already guarded by `undiscarable`.

I was condering that we could do `undiscardable` in `apply` so that the caller 
can actually discard the recovery if they want to. Not sure if this is a valid 
use case though. Please drop it if you don't think so.


> On April 19, 2018, 6 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 216 (original), 216 (patched)
> > 
> >
> > The overall logic is correct, but it takes a bit of inference to know 
> > that overwriting `registry` with a new `Future` in an earlier `_apply` does 
> > not affect a later `_apply` that is chained with the overwritten `Future`. 
> > So it seems more readible to me if we keep the original 
> > `Option recovered` (or make it just a `Promise`), 
> > and chain `_apply` with `recovered` here.
> 
> Benjamin Bannier wrote:
> I replaced the direct access to `registry` with `recover` here which once 
> recovered would just serve a cached result. Does that look more reasonable to 
> you?

The thing that seems unintuitive to me is that `recover()` would return 
different futures at different times. May I ask what the reason to make 
`registry` a `Future` instead of an `Option`?


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review201488
---


On April 19, 2018, 12:29 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 19, 2018, 12:29 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 92ef9aecb1e93d10f46e53984391558f9901a60b 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-19 Thread Benjamin Bannier


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Lines 237 (patched)
> > 
> >
> > `::recover`

This is not used anywhere else in this file, let's just keep this instance as 
is?

Dropping.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Lines 264 (patched)
> > 
> >
> > This means that if the registrar itself takes a while to recover, and 
> > during recovery a lot of RPs subscribe to the manager, then all requests 
> > (along with potentially huge `ResourceProviderInfo`s) will be stored in the 
> > memory until the recovery is finished.
> > 
> > I was wondering if we should consider just rejecting all requests 
> > before the reccovery is finished, and have a function returning the future 
> > for recovery so the caller can do synchronization if they want to. Thoughts?

That's what I wanted to do initially as well, but it really complicates 
resource providers as they would need to implement some retry logic.

As we currently do assume not too large numbers of resource providers elsewhere 
as well, I believe keeping pending subscribe requests in memory should be fine 
for now. I added a `TODO` regarding your suggested improvement.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Line 304 (original), 346 (patched)
> > 
> >
> > I might be missing something, but why is `this` necessary? Ditto below.

We are in a lambda body here and capture `this` which in general is not safe 
(unless one uses `dispatch` or `defer`). We often do make member access or 
method calls explicit in such contexts to call out the "dangerous parts".


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Lines 193-203 (original), 194-203 (patched)
> > 
> >
> > How about moving this into `GenericRegistrarProcess::initialize()` to 
> > start recovery early?
> > 
> > If we do this and keep `recovered` (See below) then this function 
> > should return
> > ```
> > recovered->then([=] { return registry.get(); })
> > ```

We already require users to `recover` registrars before being able to perform 
operations against them (like for other registrars), so I am not really sure I 
understand how what you suggest would help. Could you explain?


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 205 (original), 205 (patched)
> > 
> >
> > Why do we need the `undiscardable` here? Could you add some comments?
> > 
> > Also, should we prevent the recovery being discarded due to a caller 
> > discarding an `apply` call? If yes then we should do
> > ```
> > return undiscardable(registry.get()).then(... ::_apply ...);
> > ```
> > in the following `apply()` function.

I added a comment and also fixed below `apply` to use `recover()` which would 
return the cached result, already guarded by `undiscarable`.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 216 (original), 216 (patched)
> > 
> >
> > The overall logic is correct, but it takes a bit of inference to know 
> > that overwriting `registry` with a new `Future` in an earlier `_apply` does 
> > not affect a later `_apply` that is chained with the overwritten `Future`. 
> > So it seems more readible to me if we keep the original 
> > `Option recovered` (or make it just a `Promise`), 
> > and chain `_apply` with `recovered` here.

I replaced the direct access to `registry` with `recover` here which once 
recovered would just serve a cached result. Does that look more reasonable to 
you?


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 250 (original), 250 (patched)
> > 
> >
> > The overall logic is correct, but it takes a bit of inference to know 
> > that the `Future` obtained from `registry.get()` is guaranteed to be ready. 
> > So it seems more readible to me if we keep `registry` as an `Option` rather 
> > than a `Future`.

It is pretty explicit on the `apply` path above -- how about adding a 
`CHECK(registry->isReady())` for documentation here?

Updated the code like that.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 381 (original), 387 (patched)
> > 

Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-19 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/
---

(Updated April 19, 2018, 2:29 p.m.)


Review request for mesos, Chun-Hung Hsiao, Jie Yu, and Jan Schlicht.


Changes
---

Addressed some issues raised by Chun.


Bugs: MESOS-8735
https://issues.apache.org/jira/browse/MESOS-8735


Repository: mesos


Description
---

This patch adjusts the control flow of the resource provider manager
so that we can in the future make use of persisted resource provider
information.


Diffs (updated)
-

  src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
  src/resource_provider/registrar.hpp 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
  src/resource_provider/registrar.cpp 92ef9aecb1e93d10f46e53984391558f9901a60b 
  src/tests/resource_provider_manager_tests.cpp 
c52541bf130ccf4795b989b53331176a64a097ea 


Diff: https://reviews.apache.org/r/66311/diff/5/

Changes: https://reviews.apache.org/r/66311/diff/4-5/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-19 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review201488
---




src/resource_provider/manager.cpp
Lines 89 (patched)


This is not used.



src/resource_provider/manager.cpp
Lines 234-247 (patched)


Let's move this into `ResourceProviderManagerProcess::initialize()` to keep 
this running in the actor's context (although I think this is safe because we 
spawn the process after the constructor finishes).



src/resource_provider/manager.cpp
Lines 237 (patched)


`::recover`



src/resource_provider/manager.cpp
Lines 242 (patched)


I actually don't know we have `operator<<` defined for a future! Learned 
something :)



src/resource_provider/manager.cpp
Lines 264 (patched)


This means that if the registrar itself takes a while to recover, and 
during recovery a lot of RPs subscribe to the manager, then all requests (along 
with potentially huge `ResourceProviderInfo`s) will be stored in the memory 
until the recovery is finished.

I was wondering if we should consider just rejecting all requests before 
the reccovery is finished, and have a function returning the future for 
recovery so the caller can do synchronization if they want to. Thoughts?



src/resource_provider/manager.cpp
Line 304 (original), 346 (patched)


I might be missing something, but why is `this` necessary? Ditto below.



src/resource_provider/registrar.cpp
Lines 193-203 (original), 194-203 (patched)


How about moving this into `GenericRegistrarProcess::initialize()` to start 
recovery early?

If we do this and keep `recovered` (See below) then this function should 
return
```
recovered->then([=] { return registry.get(); })
```



src/resource_provider/registrar.cpp
Line 205 (original), 205 (patched)


Why do we need the `undiscardable` here? Could you add some comments?

Also, should we prevent the recovery being discarded due to a caller 
discarding an `apply` call? If yes then we should do
```
return undiscardable(registry.get()).then(... ::_apply ...);
```
in the following `apply()` function.



src/resource_provider/registrar.cpp
Line 216 (original), 216 (patched)


The overall logic is correct, but it takes a bit of inference to know that 
overwriting `registry` with a new `Future` in an earlier `_apply` does not 
affect a later `_apply` that is chained with the overwritten `Future`. So it 
seems more readible to me if we keep the original `Option 
recovered` (or make it just a `Promise`), and chain `_apply` with 
`recovered` here.



src/resource_provider/registrar.cpp
Line 250 (original), 250 (patched)


The overall logic is correct, but it takes a bit of inference to know that 
the `Future` obtained from `registry.get()` is guaranteed to be ready. So it 
seems more readible to me if we keep `registry` as an `Option` rather than a 
`Future`.



src/resource_provider/registrar.cpp
Line 381 (original), 387 (patched)


Not yours, but the use of pointer indicates that this 
`resource_provider::MasterRegistrarProcess` relies on the lifecycle of the 
backed `master::Registrar` without any guarantee. Let's add some comments about 
this.


- Chun-Hung Hsiao


On April 11, 2018, 7:30 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 11, 2018, 7:30 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 92ef9aecb1e93d10f46e53984391558f9901a60b 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: 

Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-11 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200898
---



PASS: Mesos patch 66311 was successfully built and tested.

Reviews applied: `['66308', '66309', '66526', '66310', '66311']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311

- Mesos Reviewbot Windows


On April 11, 2018, 7:30 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 11, 2018, 7:30 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 92ef9aecb1e93d10f46e53984391558f9901a60b 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-11 Thread Benjamin Bannier


> On April 10, 2018, 3:05 nachm., Jan Schlicht wrote:
> > src/resource_provider/registrar.cpp
> > Lines 97 (patched)
> > 
> >
> > `Registry` is in scope here and used without the namespace in other 
> > parts of this unit. Let's be consistent and drop the namespace here and 
> > below.
> 
> Benjamin Bannier wrote:
> See above. Dropping.
> 
> Benjamin Bannier wrote:
> Misread, reopening.

Fixed for a number `recover`s below as well.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200734
---


On April 11, 2018, 9:30 vorm., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 11, 2018, 9:30 vorm.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 92ef9aecb1e93d10f46e53984391558f9901a60b 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-11 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/
---

(Updated April 11, 2018, 9:30 a.m.)


Review request for mesos, Jie Yu and Jan Schlicht.


Changes
---

Addressed issues raised by Jan.


Bugs: MESOS-8735
https://issues.apache.org/jira/browse/MESOS-8735


Repository: mesos


Description
---

This patch adjusts the control flow of the resource provider manager
so that we can in the future make use of persisted resource provider
information.


Diffs (updated)
-

  src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
  src/resource_provider/registrar.hpp 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
  src/resource_provider/registrar.cpp 92ef9aecb1e93d10f46e53984391558f9901a60b 
  src/tests/resource_provider_manager_tests.cpp 
c52541bf130ccf4795b989b53331176a64a097ea 


Diff: https://reviews.apache.org/r/66311/diff/4/

Changes: https://reviews.apache.org/r/66311/diff/3-4/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-10 Thread Benjamin Bannier


> On April 10, 2018, 3:05 p.m., Jan Schlicht wrote:
> >

Thank you for your comments. I have some issues updating the patch in 
reviewboard at this time, will retry later. Please see my responses below.


> On April 10, 2018, 3:05 p.m., Jan Schlicht wrote:
> > src/resource_provider/manager.cpp
> > Lines 201 (patched)
> > 
> >
> > Unnecessary namespaces, as `Registry` is in scope here. Here and in the 
> > implementation below.

This _is_ needed due to ADL -- as `ResourceProviderManagerProcess` is defined 
in `mesos::internal`, just `Registry` would resolve to 
`mesos::internal::Registry` here. Dropping.


> On April 10, 2018, 3:05 p.m., Jan Schlicht wrote:
> > src/resource_provider/registrar.cpp
> > Lines 97 (patched)
> > 
> >
> > `Registry` is in scope here and used without the namespace in other 
> > parts of this unit. Let's be consistent and drop the namespace here and 
> > below.

See above. Dropping.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200734
---


On April 10, 2018, 2:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 10, 2018, 2:07 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200814
---



PASS: Mesos patch 66311 was successfully built and tested.

Reviews applied: `['66508', '66308', '66309', '66526', '66310', '66311']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311

- Mesos Reviewbot Windows


On April 10, 2018, 2:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 10, 2018, 2:07 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-10 Thread Jan Schlicht

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200734
---




src/resource_provider/manager.cpp
Lines 65 (patched)


Blank line prior to this one.



src/resource_provider/manager.cpp
Lines 201 (patched)


Unnecessary namespaces, as `Registry` is in scope here. Here and in the 
implementation below.



src/resource_provider/manager.cpp
Lines 237 (patched)


`registry` as name is a bit misleading, how about `recovered`?



src/resource_provider/manager.cpp
Lines 240 (patched)


s/recovery/recover



src/resource_provider/registrar.cpp
Lines 97 (patched)


`Registry` is in scope here and used without the namespace in other parts 
of this unit. Let's be consistent and drop the namespace here and below.


- Jan Schlicht


On April 10, 2018, 2:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 10, 2018, 2:07 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> c52541bf130ccf4795b989b53331176a64a097ea 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/
---

(Updated April 10, 2018, 2:07 p.m.)


Review request for mesos, Jie Yu and Jan Schlicht.


Changes
---

Rebased.


Bugs: MESOS-8735
https://issues.apache.org/jira/browse/MESOS-8735


Repository: mesos


Description
---

This patch adjusts the control flow of the resource provider manager
so that we can in the future make use of persisted resource provider
information.


Diffs (updated)
-

  src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
  src/resource_provider/registrar.hpp 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
  src/resource_provider/registrar.cpp 9eb49f1327a0b598c5464a29a09ee286d7018f09 
  src/tests/resource_provider_manager_tests.cpp 
c52541bf130ccf4795b989b53331176a64a097ea 


Diff: https://reviews.apache.org/r/66311/diff/3/

Changes: https://reviews.apache.org/r/66311/diff/2-3/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-09 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200735
---



Bad patch!

Reviews applied: [66311, 66310, 66309, 66308]

Failed command: python support/apply-reviews.py -n -r 66310

Error:
2018-04-09 14:29:49 URL:https://reviews.apache.org/r/66310/diff/raw/ 
[7210/7210] -> "66310.patch" [1]
error: patch failed: src/slave/slave.cpp:36
error: src/slave/slave.cpp: patch does not apply

Full log: https://builds.apache.org/job/Mesos-Reviewbot/22136/console

- Mesos Reviewbot


On April 9, 2018, 10:29 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 9, 2018, 10:29 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> 05ae5defb986103520cf630d5578e8a04e662313 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-09 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200729
---



FAIL: Failed to apply the dependent review: 66310.

Failed command: `python.exe .\support\apply-reviews.py -n -r 66310`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311

Relevant logs:

- 
[apply-review-66310-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311/logs/apply-review-66310-stdout.log):

```
error: patch failed: src/slave/slave.cpp:36
error: src/slave/slave.cpp: patch does not apply
```

- Mesos Reviewbot Windows


On April 9, 2018, 10:29 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 9, 2018, 10:29 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> 05ae5defb986103520cf630d5578e8a04e662313 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-09 Thread Benjamin Bannier


> On April 6, 2018, 4:22 p.m., Benjamin Bannier wrote:
> > src/resource_provider/manager.cpp
> > Lines 261-263 (patched)
> > 
> >
> > The used endpoint detector currently cannot deal with this which can 
> > e.g., lead to failed resource provider subscribes in production or test 
> > code.

Addressed by blocking any API call until the RP manager has finished recovery. 
This appears okay as recovery is started automatically when the manager is 
constructed.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200650
---


On April 9, 2018, 12:29 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated April 9, 2018, 12:29 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> 05ae5defb986103520cf630d5578e8a04e662313 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-09 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/
---

(Updated April 9, 2018, 12:29 p.m.)


Review request for mesos, Jie Yu and Jan Schlicht.


Changes
---

Blocked RP manager API until recovery is complete.


Bugs: MESOS-8735
https://issues.apache.org/jira/browse/MESOS-8735


Repository: mesos


Description
---

This patch adjusts the control flow of the resource provider manager
so that we can in the future make use of persisted resource provider
information.


Diffs (updated)
-

  src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
  src/resource_provider/registrar.hpp 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
  src/resource_provider/registrar.cpp 9eb49f1327a0b598c5464a29a09ee286d7018f09 
  src/tests/resource_provider_manager_tests.cpp 
05ae5defb986103520cf630d5578e8a04e662313 


Diff: https://reviews.apache.org/r/66311/diff/2/

Changes: https://reviews.apache.org/r/66311/diff/1-2/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-04-06 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200650
---




src/resource_provider/manager.cpp
Lines 261-263 (patched)


The used endpoint detector currently cannot deal with this which can e.g., 
lead to failed resource provider subscribes in production or test code.


- Benjamin Bannier


On March 27, 2018, 5:34 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> ---
> 
> (Updated March 27, 2018, 5:34 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
> https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/manager.cpp 2d3ffcdd01671f81729fe538348f2ca4e555eb4f 
>   src/resource_provider/registrar.hpp 
> 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 
> 9eb49f1327a0b598c5464a29a09ee286d7018f09 
>   src/tests/resource_provider_manager_tests.cpp 
> d947bd037190e6b7ea7b2277b5fbe47816878de4 
> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 66311: Implement recovery of resource provider manager.

2018-03-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review200118
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['66308', '66309', '66310', '66311']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (111 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (988 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (38 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (42 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (81 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (778 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (799 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (732 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (753 ms total)

[--] Global test environment tear-down
[==] 949 tests from 94 test cases ran. (423168 ms total)
[  PASSED  ] 948 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 214 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66311/logs/mesos-tests-stderr.log):

```
I0328 15:46:36.892980  5608 exec.cpp:236] Executor registered on agent 
48f40b10-7e34-4fe9-8156-dc7b1068fc89-S0
I0328 15:46:36.896950  5360 executor.cpp:176] Received SUBSCRIBED event
I0328 15:46:36.901968  5360 executor.cpp:180] Subscribed executor on 
winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0328 15:46:36.902966  5360 executor.cpp:176] Received LAUNCH event
I0328 15:46:36.907972  5360 executor.cpp:648] Starting task 
11371b21-8bcb-41fb-853e-415218194b75
I0328 15:46:36.996968  5360 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0328 15:46:37.024973  5360 executor.cpp:661] Forked command at 7704
I0328 15:46:37.056937  1004 exec.cpp:445] Executor asked to shutdown
I0328 15:46:37.057945  5448 executor.cpp:176] Received SHUTDOWN event
I0328 15:46:37.057945  5448 executor.cpp:758] Shutting down
I0328 15:46:37.057945  5448 executor.cpp:868] Sending SIGTERM to process tree 
at pid 7e34-4fe9-8156-dc7b1068fc89-
I0328 15:46:37.054944  8400 master.cpp:10446] Updating the state of task 
11371b21-8bcb-41fb-853e-415218194b75 of framework 
48f40b10-7e34-4fe9-8156-dc7b1068fc89- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0328 15:46:37.054944  8264 slave.cpp:3871] Shutting down framework 
48f40b10-7e34-4fe9-8156-dc7b1068fc89-
I0328 15:46:37.055943  8264 slave.cpp:6579] Shutting down executor 
'11371b21-8bcb-41fb-853e-415218194b75' of framework 
48f40b10-7e34-4fe9-8156-dc7b1068fc89- at executor(1)@10.3.1.5:54190
I0328 15:46:37.055943  8264 slave.cpp:915] Agent terminating
W0328 15:46:37.056937  8264 slave.cpp:3867] Ignoring shutdown framework 
48f40b10-7e34-4fe9-8156-dc7b1068fc89- because it is terminating
I0328 15:46:37.057945  8400 master.cpp:10545] Removing task 
11371b21-8bcb-41fb-853e-415218194b75 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework 48f40b10-7e34-4fe9-8156-dc7b1068fc89- on 
agent 48f40b10-7e34-4fe9-8156-dc7b1068fc89-S0 at slave(417)@10.3.1.5:54169 
(winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0328 15:46:37.060938  4904 containerizer.cpp:2338] Destroying container 
6cb221ac-b026-4983-b310-d1ecc7936091 in RUNNING state
I0328 15:46:37.060938  4904 containerizer.cpp:2952] Transitioning the state of 
container 6cb221ac-b026-4983-b310-d1ecc7936091 from RUNNING to DESTROYING
I0328 15:46:37.060938  8400 master.cpp:1295] Agent 
48f40b10-7e34-4fe9-8156-dc7b1068fc89-S0 at slave(417)@10.3.1.5:54169 
(winbldsrv-02.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0328 15:46:37.061939  8400 master.cpp:3283] Disconnecting agent 
48f40b10-7e34-4fe9-8156-dc7b1068fc89-S0 at slave(417)@10.3.1.5:54169