> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.hpp, line 506
> > <https://reviews.apache.org/r/50841/diff/7/?file=1486671#file1486671line506>
> >
> >     Shoud we use `set` here like we do in the mesos containerizer?

Yes, changed it to `set`.


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 1341-1346
> > <https://reviews.apache.org/r/50841/diff/7/?file=1486672#file1486672line1341>
> >
> >     You can just use `if (gpus.get() > 0)` here.

the temp variable `requested` is removed.


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 1541-1552
> > <https://reviews.apache.org/r/50841/diff/7/?file=1486672#file1486672line1541>
> >
> >     You can't rely on `containers_[containerId]` to be valid at this point 
> > (hence the check for it below).
> >     
> >     Regardless, I just at the head of master to see where the most 
> > appropriate place to do this deallocation would be, and this function looks 
> > *completely* different from what I see here.
> >     
> >     When was the last time you rebased?

The code is rebased at Aug. 15, let's keep this issue. I'll come back to this 
after I fix all other bugs and rebase by code.
Code rebased, where is the most appropriate place to do so?
BTW, `deallocateNvidiaGpus()` will check if `containers_[containerId]` is valid 
now, so we don't need to worry that `containers_[containerId]` is invalid.


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 2037-2041
> > <https://reviews.apache.org/r/50841/diff/7/?file=1486672#file1486672line2037>
> >
> >     It feels too early to be doing the `deallocate()` here. I think we 
> > should only do it once the container is actually killed (i.e. in 
> > `___destroy()`).
> >     
> >     If we do it too early, the GPUs could still be being used by the 
> > container when they get allocated out to someone else. This would not go 
> > very well.
> >     
> >     There is the case, however, in `__destroy()` where we aren't able to 
> > kill the docker container at all! It's better to leak the GPUs in this case 
> > than to deallocate them and run the risk of giving them to a new container. 
> > We should mention this in the error string below if any GPUs are actually 
> > leaked:
> >     
> >     ```
> >         container->termination.fail(
> >             "Failed to kill the Docker container: " +
> >             (kill.isFailed() ? kill.failure() : "discarded future"));
> >     ```

I see. Also will come back to deal with deallocate issues after code rebased.
I moved this to `___destroy()`.
Also, added the error string for leaked GPUs:
```
    container->termination.fail(
        "Failed to kill the Docker container: " +
        (kill.isFailed() ? kill.failure() : "discarded future") +
        (container->gpus.empty() ?
            "" : ", " + stringify(container->gpus.size()) + "GPUs leaked"));
```


- Yubo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review147089
-----------------------------------------------------------


On 九月 20, 2016, 9:22 a.m., Yubo Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
> 
> (Updated 九月 20, 2016, 9:22 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>

Reply via email to