> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 442-446 > > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line442> > > > > Unfortunately this won't accomplish your intent of providing context in > > the failure message because the .onFailed method does not provide > > composition, your returned Failure here will fall into the void. > > > > To accomplish this, we need a .then which can accept a Future<T> in > > addition to being able to take a T.
Yes, I see this now. The .then is just chained onto the .onFailed. In fact, the way I had it before, I think we never would have actually cleaned up anything (the .onFailed() was the only thing chained to the original Future). > On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 375 > > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line375> > > > > Hm.. seems we should do another lookup of the 'info' since we are > > returning to the isolator context and things may have occurred in the > > interim. Perhaps it's worth pulling out an `_update` continuation to force > > this (no captures possible). I see you committed this with: ``` .then(defer(PID<NvidiaGpuIsolatorProcess>(this), &NvidiaGpuIsolatorProcess::_update, containerId, lambda::_1)); ``` Why `PID<NvidiaGpuIsolatorProcess>(this)` instead of `self()`? Also, I see why it makes sense to re-lookup `info`, but I'm unclear on why the continuation had to be pulled out into a separate function. The logic feels very disjoint now when adding GPUs vs. taking them away. Maybe it's just the name `_update` since it only actually gets triggered in the "adding" GPUs case. > On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 447 > > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line447> > > > > This needs to defer to self since the infos map is managed by the > > isolator actor. In general, capturing and using 'this' within a > > continuation without defering to self is dangerous. Yes. This was an oversight. When I started thsi patch I was unclear on the need for `defer()` inside the `.then()`, and looks like I missed one when doing a pass over the code to add it. > On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 261 > > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line261> > > > > Hm.. wonder why we didn't just use foreach here? I believe the original code needed a normal `for` look for some reason and the new code just inherited this form. I agree -- a `foreach` is more appropriate here now though. > On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 448-451 > > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line448> > > > > To guard against races, how about: > > > > ``` > > CHECK(infos.contains(containerId)); > > delete infos.at(containerId); > > infos.erase(containerId); > > > > return Nothing(); > > ``` Makes sense. It's the same reasoning as for `update()` with needing to re-lookup `info` (only this time we don't need to store a temporary to it since we are just deleting it). - Kevin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48373/#review139714 ----------------------------------------------------------- On June 23, 2016, 4:34 p.m., Kevin Klues wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48373/ > ----------------------------------------------------------- > > (Updated June 23, 2016, 4:34 p.m.) > > > Review request for mesos and Benjamin Mahler. > > > Bugs: MESOS-5559 > https://issues.apache.org/jira/browse/MESOS-5559 > > > Repository: mesos > > > Description > ------- > > All logic to do GPU allocation is now conducted asynchronously through > the `NvidiaGpuAllocator` component. > > > Diffs > ----- > > src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp > 28ad0b3d1d8e7642a93e4ebb0fcc399e182be393 > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp > cfb23244d0e6f6e0d94685a35826b35f3297c6fe > > Diff: https://reviews.apache.org/r/48373/diff/ > > > Testing > ------- > > GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests > > > Thanks, > > Kevin Klues > >