Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

Yubo Li Fri, 26 Aug 2016 03:49:47 -0700


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.hpp, lines 505-507
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480594#file1480594line505>
> >
> >     I would just call this variable `gpus`
> >     Also the comment should read:
> >     ```
> >     // The number of GPUs allocated to the container.
> >     ```


`gpus` has type `std::list<Gpu>`, so it is not actually the number of GPUs.
Do you still think the comment should read `The number of GPUs`?


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 653
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line653>
> >
> >     I would probably call this variable `count`. When I saw the name 
> > `requestedNvidiaGpu` I thought it was a specific GPU id being passed in, 
> > not a count.
> >     
> >     I would also call the function `allocateNvidiaGpus()` since you can 
> > allocate more than one with this function.

Makes sense.


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 666-668
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line666>
> >
> >     I would do this as the first check in this function.  If we don't have 
> > an allocator set, then we really shouldn't even be calling this function 
> > regardless of anything else that is going on.
> >     
> >     Also, the string should read:
> >     ```
> >     "The `allocateNvidiaGpu` function was called without an 
> > `NvidiaGpuAllocator` set".
> >     ```

If we put the `nvidiaGpuAllocator` check at the top of this function, we have
to check `requested == 0` outside the function; otherwise the
`nvidiaGpuAllocator` check will fail whenever the GPU feature is not enabled.
I think moving the `requested == 0` check out to the call site is reasonable
if we use a temporary `Future`. The logic would be something like:
```
Future<Nothing> allocateGpus = Nothing();
......
if (gpus.isSome()) {
  // Make sure that the `gpus` resource is not fractional.
  // We rely on scalar resources only having 3 digits of precision.
  if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
    return Failure("The 'gpus' resource must be an unsigned integer");
  }

  const size_t requested = static_cast<size_t>(gpus.getOrElse(0.0));

  if (requested > 0) {
    allocateGpus = allocateNvidiaGpus(requested, containerId);
  }
}
```
Does that make sense?
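
For reference, the fractional check can be tried in isolation. The following is only a minimal sketch outside of Mesos: `parseGpuCount` and `CountOrError` are hypothetical names, `std::optional` and `std::variant` stand in for `Option` and `Failure`, and it rounds with `std::llround` instead of truncating with `static_cast`, which is a deliberate swap to avoid floating-point error near whole values.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>

// Hypothetical result type: either the requested GPU count or an error string.
using CountOrError = std::variant<std::size_t, std::string>;

// Hypothetical helper mirroring the check in the snippet above.
// Assumption: scalar resources carry at most 3 digits of precision.
CountOrError parseGpuCount(const std::optional<double>& gpus)
{
  // No `gpus` resource at all means zero GPUs were requested.
  if (!gpus.has_value()) {
    return std::size_t{0};
  }

  // Scale to thousandths and round, so that 2.0 becomes exactly 2000.
  const long long milli = std::llround(*gpus * 1000.0);

  // Reject negative and fractional requests such as 1.5.
  if (milli < 0 || milli % 1000 != 0) {
    return std::string("The 'gpus' resource must be an unsigned integer");
  }

  return static_cast<std::size_t>(milli / 1000);
}
```

With this shape the caller only needs to call the allocation function when the returned count is greater than zero.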


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 694-696
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line694>
> >
> >     Why do you need this level of indirection? Why not just pass 
> > `containers_[containerId]->gpuAllocated` directly to 
> > `nvidiaGpuAllocator->deallocate()`?

`containers_[containerId]->gpuAllocated` is a list, but
`nvidiaGpuAllocator->deallocate()` accepts a set.
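
The extra step is just the list-to-set conversion, roughly as sketched below. The `Gpu` struct, its `id` field, and the `toSet` helper here are hypothetical stand-ins for illustration, not the real Mesos types.

```cpp
#include <list>
#include <set>

// Hypothetical stand-in for the real `Gpu` type; only an id is modelled.
struct Gpu
{
  unsigned int id;
};

// `std::set` needs a strict weak ordering on its elements.
bool operator<(const Gpu& left, const Gpu& right)
{
  return left.id < right.id;
}

// Copies the list into a set; duplicate entries collapse into one.
std::set<Gpu> toSet(const std::list<Gpu>& gpus)
{
  return std::set<Gpu>(gpus.begin(), gpus.end());
}
```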


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 698-710
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line698>
> >
> >     Why don't you just return from the `deallocate()` call with a 
> > `.then()`? I.e.
> >     
> >     ```
> >       return nvidiaGpuAllocator->deallocate(deallocated)
> >         .then(defer(self(), [=](const Nothing& nothing) {
> >           containers_[containerId]->gpuAllocated.clear();
> >           return Nothing();
> >         }));
> >     ```
> >     
> >     If any failures happen in the deallocation, they should get propagated 
> > through.
> 
> Guangya Liu wrote:
>     With the current logic, we can have more log messages here with different 
> conditions, but your proposal seems simpler.

I prefer Kevin's proposal because it is simpler.


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 1555
> > <https://reviews.apache.org/r/50841/diff/6/?file=1480595#file1480595line1555>
> >
> >     I wouldn't just blindly call this function here. It should be wrapped 
> > in some logic that makes sure it's OK to call it (i.e. checks to make sure 
> > that we have the nvidia->allocator component passed in).
> >     
> >     Again, you could have some logic above which saves a temporary `Future` 
> > that is set to `Nothing()` by default and is the result of calling 
> > `deallocateNvidiaGpu()` otherwise.
> 
> Guangya Liu wrote:
>     The `deallocateNvidiaGpu` function already has some checks for 
> `nvidiaGpuAllocator`. Is this enough?
> 
> Kevin Klues wrote:
>     I don't think we want to call any `nvidia*` functions anywhere in the 
> code without checks around them at the call site. For someone reading the 
> code top to bottom it makes it look like we are *always* allocating / 
> deallocating / etc. GPUs, which is confusing. The checks inside the function 
> are to make sure that we don't accidentally call them somewhere without the 
> proper check at the call site.

Please see my revision. Do you think it is better than before?
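
To make sure I understood the pattern, here is a minimal standalone sketch of it, not the actual patch: a guard at the call site plus a defensive check inside the function. `FakeAllocator`, `deallocateGpus`, and `cleanup` are hypothetical names, and `std::optional<std::string>` (an error message, or empty on success) stands in for `Future<Nothing>`.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical allocator that just records which GPU ids were returned.
struct FakeAllocator
{
  std::set<int> available;

  void deallocate(const std::set<int>& gpus)
  {
    available.insert(gpus.begin(), gpus.end());
  }
};

// Returns an error message on failure, or nothing on success.
std::optional<std::string> deallocateGpus(
    std::optional<FakeAllocator>& allocator,
    std::set<int>& allocated)
{
  // Defensive check: catches a call site that forgot its own guard.
  if (!allocator.has_value()) {
    return std::string(
        "The `deallocateGpus` function was called without an allocator set");
  }

  allocator->deallocate(allocated);
  allocated.clear();
  return std::nullopt;
}

std::optional<std::string> cleanup(
    std::optional<FakeAllocator>& allocator,
    std::set<int>& allocated)
{
  // Temporary result that defaults to success.
  std::optional<std::string> result = std::nullopt;

  // Call-site guard: only touch GPU logic when it actually applies.
  if (allocator.has_value() && !allocated.empty()) {
    result = deallocateGpus(allocator, allocated);
  }

  return result;
}
```

Reading `cleanup` top to bottom, it is clear that GPU deallocation only happens when an allocator is present and something was allocated.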


- Yubo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
-----------------------------------------------------------


On Aug. 22, 2016, 10:11 a.m., Yubo Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2016, 10:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added control logic to allocate/deallocate GPUs for GPU-related tasks
> when a task is started/terminated.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>
