> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.hpp, lines 505-507
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480594#file1480594line505>
> >
> > I would just call this variable `gpus`
> > Also the comment should read:
> > ```
> > // The number of GPUs allocated to the container.
> > ```

`gpus` has type `std::list<Gpu>`, so it holds the allocated GPUs themselves rather than a count. Do you still want the comment to read `The number of GPUs`?


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 653
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line653>
> >
> > I would probably call this variable `count`. When I saw the name
> > `requestedNvidiaGpu` I thought it was a specific GPU id being passed in,
> > not a count.
> >
> > I would also call the function `allocateNvidiaGpus()` since you can
> > allocate more than one with this function.

Makes sense.


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 666-668
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line666>
> >
> > I would do this as the first check in this function. If we don't have
> > an allocator set, then we really shouldn't even be calling this function
> > regardless of anything else that is going on.
> >
> > Also, the string should read:
> > ```
> > "The `allocateNvidiaGpu` function was called without an
> > `NvidiaGpuAllocator` set".
> > ```

If we put the `nvidiaGpuAllocator` check at the top of this function, we have to check `requested == 0` outside the function; otherwise the `nvidiaGpuAllocator` check will fail whenever the GPU feature is not enabled. I think moving the `requested == 0` check out to the call site is reasonable if we use a temporary `Future`. The logic would look something like this:

```
Future<Nothing> allocateGpus = Nothing();
......
if (gpus.isSome()) {
  // Make sure that the `gpus` resource is not fractional.
  // We rely on scalar resources only having 3 digits of precision.
  if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
    return Failure("The 'gpus' resource must be an unsigned integer");
  }

  const size_t requested = static_cast<size_t>(gpus.getOrElse(0.0));

  if (requested > 0) {
    allocateGpus = allocateNvidiaGpus(requested, containerId);
  }
}
```

Does that make sense?
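
For reference, here is a minimal standalone sketch of the integer check on its own, outside of Mesos. The helper name `wholeGpuCount` and the use of `std::optional` are hypothetical and only for illustration; the snippet above works on the `gpus` value via `isSome()`/`getOrElse()` and returns a `Failure` instead. One deliberate difference: the sketch rounds to thousandths with `std::llround` rather than truncating with `static_cast`, on the assumption that scalar resources carry at most 3 decimal digits.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>

// Hypothetical helper, not part of Mesos: returns the whole number of GPUs
// requested, or std::nullopt if the scalar is negative or fractional.
// Assumes scalar resources carry at most 3 decimal digits of precision,
// so the value is rounded to thousandths before the check.
std::optional<std::size_t> wholeGpuCount(double gpus)
{
  if (gpus < 0.0) {
    return std::nullopt;
  }

  // Round (rather than truncate) so that a value stored slightly below a
  // whole number is still treated as that whole number.
  const long long thousandths = std::llround(gpus * 1000.0);

  if (thousandths % 1000 != 0) {
    return std::nullopt;
  }

  return static_cast<std::size_t>(thousandths / 1000);
}
```

With a helper like this, the call site would only call `allocateNvidiaGpus()` when the result is present and greater than zero, and would keep the default `Nothing()` otherwise.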


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 694-696
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line694>
> >
> > Why do you need this level of indirection? Why not just pass
> > `containers_[containerId]->gpuAllocated` directly to
> > `nvidiaGpuAllocator->deallocate()`?

`containers_[containerId]->gpuAllocated` is a list, but `nvidiaGpuAllocator->deallocate()` accepts a set, so the list has to be converted first.


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 698-710
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line698>
> >
> > Why don't you just return from the `deallocate()` call with a
> > `.then()`? I.e.
> >
> > ```
> > return nvidiaGpuAllocator->deallocate(deallocated)
> >   .then(defer(self(), [=](const Nothing& nothing) {
> >     containers_[containerId]->gpuAllocated.clear();
> >     return Nothing();
> >   }));
> > ```
> >
> > If any failures happen in the deallocation, they should get propagated
> > through.
>
> Guangya Liu wrote:
>     With the current logic, we can have more log messages here with different
>     conditions, but seems your proposal is more simple.

I prefer Kevin's proposal because it is simpler.


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 1555
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line1555>
> >
> > I wouldn't just blindly call this function here. It should be wrapped
> > in some logic that makes sure it's OK to call it (i.e. checks to make sure
> > that we have the nvidia->allocator component passed in).
> >
> > Again, you could have some logic above which saves a temporary `Future`
> > that is set to `Nothing()` by default and is the result of calling
> > `deallocateNvidiaGpu()` otherwise.
>
> Guangya Liu wrote:
>     The `deallocateNvidiaGpu` already has some checking for
>     `nvidiaGpuAllocator`, is this enough?
>
> Kevin Klues wrote:
>     I don't think we want to call any `nvidia*` functions anywhere in the
>     code without checks around them at the call site. For someone reading the
>     code top to bottom it makes it look like we are *always* allocating /
>     deallocating / etc. GPUs, which is confusing. The checks inside the function
>     are to make sure that we don't accidentally call them somewhere without the
>     proper check at the call site.

Please see my revision. Do you think it is better than before?


- Yubo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
-----------------------------------------------------------


On Aug. 22, 2016, 10:11 a.m., Yubo Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
>
> (Updated Aug. 22, 2016, 10:11 a.m.)
>
>
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and
> Rajat Phull.
>
>
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
>
>
> Diffs
> -----
>
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723
>
> Diff: https://reviews.apache.org/r/50841/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Yubo Li
>
>