Re: Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.

2016-06-28 Thread Kevin Klues


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 442-446
> > 
> >
> > Unfortunately this won't accomplish your intent of providing context in 
> > the failure message because the .onFailed method does not provide 
> > composition, your returned Failure here will fall into the void.
> > 
> > To accomplish this, we need a .then which can accept a Future in 
> > addition to being able to take a T.

Yes, I see this now. The .then is just chained onto the .onFailed. In fact, the 
way I had it before, I think we never would have actually cleaned up anything 
(the .onFailed() was the only thing chained to the original Future).


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 375
> > 
> >
> > Hm.. seems we should do another lookup of the 'info' since we are 
> > returning to the isolator context and things may have occurred in the 
> > interim. Perhaps it's worth pulling out an `_update` continuation to force 
> > this (no captures possible).

I see you committed this with:
```
  .then(defer(PID(this),
  ::_update,
  containerId,
  lambda::_1));
```

Why `PID(this)` instead of `self()`?

Also, I see why it makes sense to re-lookup `info`, but I'm unclear on why the 
continuation had to be pulled out into a separate function.  The logic feels 
very disjoint now when adding GPUs vs. taking them away.  Maybe it's just the 
name `_update` since it only actually gets triggered in the "adding" GPUs case.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 447
> > 
> >
> > This needs to defer to self since the infos map is managed by the 
> > isolator actor. In general, capturing and using 'this' within a 
> > continuation without defering to self is dangerous.

Yes. This was an oversight. When I started thsi patch I was unclear on the need 
for `defer()` inside the `.then()`, and looks like I missed one when doing a 
pass over the code to add it.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 261
> > 
> >
> > Hm.. wonder why we didn't just use foreach here?

I believe the original code needed a normal `for` look for some reason and the 
new code just inherited this form.  I agree -- a `foreach` is more appropriate 
here now though.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 448-451
> > 
> >
> > To guard against races, how about:
> > 
> > ```
> >   CHECK(infos.contains(containerId));
> >   delete infos.at(containerId);
> >   infos.erase(containerId);
> > 
> >   return Nothing();
> > ```

Makes sense.  It's the same reasoning as for `update()` with needing to 
re-lookup `info` (only this time we don't need to store a temporary to it since 
we are just deleting it).


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/#review139714
---


On June 23, 2016, 4:34 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48373/
> ---
> 
> (Updated June 23, 2016, 4:34 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5559
> https://issues.apache.org/jira/browse/MESOS-5559
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> All logic to do GPU allocation is now conducted asynchronously through
> the `NvidiaGpuAllocator` component.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
> 28ad0b3d1d8e7642a93e4ebb0fcc399e182be393 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
> cfb23244d0e6f6e0d94685a35826b35f3297c6fe 
> 
> Diff: https://reviews.apache.org/r/48373/diff/
> 
> 
> Testing
> ---
> 
> GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.

2016-06-27 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/#review139714
---


Fix it, then Ship it!





src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (lines 113 - 119)


Don't forget to include error context:

```
  // Create the allocator.
  //
  // TODO(klueska): We need to share the allocator
  // with the docker containerizer.
  Try create = NvidiaGpuAllocator::create(flags);
  if (create.isError()) {
return Error("Failed to NvidiaGpuAllocator::create: " + 
_allocator.error());
  }
```



src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (line 218)


Hm.. wonder why we didn't just use foreach here?



src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (line 325)


Hm.. seems we should do another lookup of the 'info' since we are returning 
to the isolator context and things may have occurred in the interim. Perhaps 
it's worth pulling out an `_update` continuation to force this (no captures 
possible).



src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (lines 392 - 396)


Unfortunately this won't accomplish your intent of providing context in the 
failure message because the .onFailed method does not provide composition, your 
returned Failure here will fall into the void.

To accomplish this, we need a .then which can accept a Future in 
addition to being able to take a T.



src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (line 397)


This needs to defer to self since the infos map is managed by the isolator 
actor. In general, capturing and using 'this' within a continuation without 
defering to self is dangerous.



src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (lines 398 - 401)


To guard against races, how about:

```
  CHECK(infos.contains(containerId));
  delete infos.at(containerId);
  infos.erase(containerId);

  return Nothing();
```


- Benjamin Mahler


On June 23, 2016, 4:34 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48373/
> ---
> 
> (Updated June 23, 2016, 4:34 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5559
> https://issues.apache.org/jira/browse/MESOS-5559
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> All logic to do GPU allocation is now conducted asynchronously through
> the `NvidiaGpuAllocator` component.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
> 28ad0b3d1d8e7642a93e4ebb0fcc399e182be393 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
> cfb23244d0e6f6e0d94685a35826b35f3297c6fe 
> 
> Diff: https://reviews.apache.org/r/48373/diff/
> 
> 
> Testing
> ---
> 
> GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.

2016-06-23 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/
---

(Updated June 23, 2016, 4:34 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Rebased after some changes pushed to master already.


Bugs: MESOS-5559
https://issues.apache.org/jira/browse/MESOS-5559


Repository: mesos


Description
---

All logic to do GPU allocation is now conducted asynchronously through
the `NvidiaGpuAllocator` component.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
28ad0b3d1d8e7642a93e4ebb0fcc399e182be393 
  src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
cfb23244d0e6f6e0d94685a35826b35f3297c6fe 

Diff: https://reviews.apache.org/r/48373/diff/


Testing
---

GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests


Thanks,

Kevin Klues



Re: Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.

2016-06-11 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/
---

(Updated June 11, 2016, 5:37 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Updated to properly use `defer()` on lambdas in `.then()` statements from after 
operations through the `NvidiaGPUAllocator`


Bugs: MESOS-5559
https://issues.apache.org/jira/browse/MESOS-5559


Repository: mesos


Description
---

All logic to do GPU allocation is now conducted asynchronously through
the `NvidiaGpuAllocator` component.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
181a2aad97da9ee0f6ffa42cdba9c93dc0077ff7 
  src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
d7557a0c338e8c0e51461b2326600c03f89c2e8b 

Diff: https://reviews.apache.org/r/48373/diff/


Testing
---

GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests


Thanks,

Kevin Klues



Re: Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.

2016-06-10 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/
---

(Updated June 11, 2016, 3:05 a.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Rebased for https://reviews.apache.org/r/48578


Bugs: MESOS-5559
https://issues.apache.org/jira/browse/MESOS-5559


Repository: mesos


Description
---

All logic to do GPU allocation is now conducted asynchronously through
the `NvidiaGpuAllocator` component.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
181a2aad97da9ee0f6ffa42cdba9c93dc0077ff7 
  src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
d7557a0c338e8c0e51461b2326600c03f89c2e8b 

Diff: https://reviews.apache.org/r/48373/diff/


Testing
---

GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests


Thanks,

Kevin Klues



Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.

2016-06-07 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/
---

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-5559
https://issues.apache.org/jira/browse/MESOS-5559


Repository: mesos


Description
---

All logic to do GPU allocation is now conducted asynchronously through
the `NvidiaGpuAllocator` component.


Diffs
-

  src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
f290e3c1d1114698039a505e972793dc325c7100 
  src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
7849e518af448d9557ca4d6de4ccaba8bc572992 

Diff: https://reviews.apache.org/r/48373/diff/


Testing
---

GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests


Thanks,

Kevin Klues