> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 442-446
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line442>
> >
> >     Unfortunately this won't accomplish your intent of providing context in 
> > the failure message because the .onFailed method does not provide 
> > composition, your returned Failure here will fall into the void.
> >     
> >     To accomplish this, we need a .then which can accept a Future<T> in 
> > addition to being able to take a T.

Yes, I see this now. The .then is just chained onto the .onFailed. In fact, the 
way I had it before, I think we never would have actually cleaned up anything 
(the .onFailed() was the only thing chained to the original Future).


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 375
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line375>
> >
> >     Hm.. seems we should do another lookup of the 'info' since we are 
> > returning to the isolator context and things may have occurred in the 
> > interim. Perhaps it's worth pulling out an `_update` continuation to force 
> > this (no captures possible).

I see you committed this with:
```
      .then(defer(PID<NvidiaGpuIsolatorProcess>(this),
                  &NvidiaGpuIsolatorProcess::_update,
                  containerId,
                  lambda::_1));
```

Why `PID<NvidiaGpuIsolatorProcess>(this)` instead of `self()`?

Also, I see why it makes sense to re-lookup `info`, but I'm unclear on why the 
continuation had to be pulled out into a separate function.  The logic feels 
very disjoint now when adding GPUs vs. taking them away.  Maybe it's just the 
name `_update` since it only actually gets triggered in the "adding" GPUs case.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 447
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line447>
> >
> >     This needs to defer to self since the infos map is managed by the 
> > isolator actor. In general, capturing and using 'this' within a 
> > continuation without defering to self is dangerous.

Yes. This was an oversight. When I started thsi patch I was unclear on the need 
for `defer()` inside the `.then()`, and looks like I missed one when doing a 
pass over the code to add it.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 261
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line261>
> >
> >     Hm.. wonder why we didn't just use foreach here?

I believe the original code needed a normal `for` look for some reason and the 
new code just inherited this form.  I agree -- a `foreach` is more appropriate 
here now though.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 448-451
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line448>
> >
> >     To guard against races, how about:
> >     
> >     ```
> >           CHECK(infos.contains(containerId));
> >           delete infos.at(containerId);
> >           infos.erase(containerId);
> >     
> >           return Nothing();
> >     ```

Makes sense.  It's the same reasoning as for `update()` with needing to 
re-lookup `info` (only this time we don't need to store a temporary to it since 
we are just deleting it).


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/#review139714
-----------------------------------------------------------


On June 23, 2016, 4:34 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48373/
> -----------------------------------------------------------
> 
> (Updated June 23, 2016, 4:34 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5559
>     https://issues.apache.org/jira/browse/MESOS-5559
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> All logic to do GPU allocation is now conducted asynchronously through
> the `NvidiaGpuAllocator` component.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 
> 28ad0b3d1d8e7642a93e4ebb0fcc399e182be393 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp 
> cfb23244d0e6f6e0d94685a35826b35f3297c6fe 
> 
> Diff: https://reviews.apache.org/r/48373/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>

Reply via email to