Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-11-02 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review154668
---


Fix it, then Ship it!




It would be great to describe the limitations of this patch (no recovery logic, 
no update support, it doesn't make the devices available to the container, etc) 
in the commit message so that others are aware of them.

I'm ok with shipping this patch even though this is a partial implementation 
since it makes the behavior no worse than what we have today (the docker 
containerizer just ignores GPUs entirely, whereas now it looks at them 
partially but still doesn't make them available to the container).


src/slave/containerizer/docker.hpp (lines 261 - 274)


Indentation here is off, should be 4 spaces like the other wrapping above 
and below.



src/slave/containerizer/docker.cpp (lines 695 - 698)


This return won't do anything since onFailed ignores the return value of 
the callback.



src/slave/containerizer/docker.cpp (line 701)


Shouldn't this append rather than overwrite?



src/slave/containerizer/docker.cpp (line 727)


Can we just erase rather than clear the entire set?



src/slave/containerizer/docker.cpp (lines 1374 - 1377)


Should we just use the container level resources here, rather than just the 
task level resources?



src/slave/containerizer/docker.cpp (line 1387)


Maybe move this condition up alongside the isSome check?



src/slave/containerizer/docker.cpp (lines 1998 - 2017)


It seems brittle that in all of these code paths, we assume that there are 
no gpus allocated (because we don't try to deallocate). This needs a lot of 
"non-local reasoning" to figure out that this holds because in these conditions 
we have not called launchExecutorProcess.



src/slave/containerizer/docker.cpp (lines 2163 - 2167)


We should add a TODO here that we've leaked the GPUs and they will never be 
de-allocated. Ideally, we could figure out how to de-allocate them after the 
'docker remove'?


- Benjamin Mahler


On Nov. 2, 2016, 7:26 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Nov. 2, 2016, 7:26 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-11-02 Thread Kevin Klues


> On Oct. 26, 2016, 3:22 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 2161
> > 
> >
> > Maybe consider using a ":" here instead of a comma?
> 
> Guangya Liu wrote:
> After a second thought, I'm not sure if using a ":" is better than a 
> comma here.
> 
> With ":", one example message would be
> ```
> Failed to kill the Docker container: discarded future: 1 GPUs leaked
> ```
> 
> With comma, one example message would be:
> ```
> Failed to kill the Docker container: discarded future, 1 GPUs leaked
> ```
> 
> Seems comma is better?
> 
> Yubo Li wrote:
> Yes, I think comma is better :)

I agree, a comma looks better here.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153837
---


On Nov. 2, 2016, 7:26 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Nov. 2, 2016, 7:26 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-11-02 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated Nov. 2, 2016, 7:26 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-11-01 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review154404
---


Fix it, then Ship it!




Ship It!


src/slave/containerizer/docker.hpp (lines 20 - 22)


```
#include 
#include 
#include 
#include 
```


- Guangya Liu


On 十一月 1, 2016, 6:10 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十一月 1, 2016, 6:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-31 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated Nov. 1, 2016, 6:10 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-31 Thread Yubo Li


> On 十月 26, 2016, 3:22 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 2161
> > 
> >
> > Maybe consider using a ":" here instead of a comma?
> 
> Guangya Liu wrote:
> After a second thought, I'm not sure if using a ":" is better than a 
> comma here.
> 
> With ":", one example message would be
> ```
> Failed to kill the Docker container: discarded future: 1 GPUs leaked
> ```
> 
> With comma, one example message would be:
> ```
> Failed to kill the Docker container: discarded future, 1 GPUs leaked
> ```
> 
> Seems comma is better?

Yes, I think comma is better :)


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153837
---


On 十月 31, 2016, 7:09 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 31, 2016, 7:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-31 Thread Guangya Liu


> On 十月 26, 2016, 3:22 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 2161
> > 
> >
> > Maybe consider using a ":" here instead of a comma?

After a second thought, I'm not sure if using a ":" is better than a comma here.

With ":", one example message would be
```
Failed to kill the Docker container: discarded future: 1 GPUs leaked
```

With comma, one example message would be:
```
Failed to kill the Docker container: discarded future, 1 GPUs leaked
```

Seems comma is better?


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153837
---


On 十月 31, 2016, 7:09 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 31, 2016, 7:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-31 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated Oct. 31, 2016, 7:09 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-25 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153837
---




src/slave/containerizer/docker.cpp (line 2155)


Maybe consider using a ":" here instead of a comma?


- Kevin Klues


On Oct. 24, 2016, 4:59 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Oct. 24, 2016, 4:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-23 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated Oct. 24, 2016, 4:59 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-21 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153632
---




src/slave/containerizer/docker.hpp (line 259)


My previous comments does not contain `on Linux`, this is not needed as 
this helper is already protected by `#ifdef __linux__`

```
Allocate GPU resources for a specified container.
```



src/slave/containerizer/docker.hpp (line 268)


```
// Deallocate GPU resources for a specified container.
```


- Guangya Liu


On 十月 21, 2016, 9:10 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 21, 2016, 9:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-21 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 十月 21, 2016, 9:10 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-21 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153506
---


Fix it, then Ship it!




Ship It!


src/slave/containerizer/docker.hpp (lines 259 - 260)


I think that you can simplify the comments directly as

```
// Allocate GPU resources for a specified container.
```

As you aready have those helper functions protected by `#ifdef __linux__`.



src/slave/containerizer/docker.hpp (lines 269 - 270)


```
// Deallocate GPU resources for a specified container.
```



src/slave/containerizer/docker.cpp (line 702)


s/NvidiaGpuAllocator/NvidiaComponents



src/slave/containerizer/docker.cpp (line 1370)


kill this


- Guangya Liu


On 十月 21, 2016, 6:28 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 21, 2016, 6:28 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-20 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 十月 21, 2016, 6:28 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-20 Thread Yubo Li


> On 十月 20, 2016, 1:32 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 2154-2158
> > 
> >
> > Do you have a patch for removing this comment?

Yes, removed it in `https://reviews.apache.org/r/52735/`, sorry for foggetting 
publish it...


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153384
---


On 十月 20, 2016, 10 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 20, 2016, 10 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-20 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153384
---




src/slave/containerizer/docker.hpp (lines 259 - 260)


This helper will not be called on non linux platform.

```
// Allocate GPU resources for a specified container on Linux.
```



src/slave/containerizer/docker.hpp (lines 269 - 270)


This helper will not be called on non linux platform.

```
// Deallocate GPU resources for a specified container on Linux.
```



src/slave/containerizer/docker.hpp (line 506)


s/GPU/GPU resources



src/slave/containerizer/docker.cpp (lines 661 - 662)


kill this, we already have comments for this helper in the header file. 
Also this will not be called on non-linux platform.



src/slave/containerizer/docker.cpp (line 665)


s/NvidiaGpuAllocator/NvidiaComponents



src/slave/containerizer/docker.cpp (lines 685 - 686)


kill this, we already have comments for this helper in the header file. 
Also this will not be called on non-linux platform.



src/slave/containerizer/docker.cpp (lines 704 - 705)


kill this, we already have comments for this helper in the header file. 
Also this will not be called on non-linux platform.



src/slave/containerizer/docker.cpp (lines 2151 - 2155)


Do you have a patch for removing this comment?



src/slave/containerizer/docker.cpp (lines 2156 - 2166)


What about the following:

```
string failure = "Failed to kill the Docker container: " +
 (kill.isFailed() ? kill.failure() : "discarded 
future");

#ifdef __linux__
if (!container->gpus.empty()) {
  failure += ", " + stringify(container->gpus.size()) + " GPUs leaked";
}
#endif // __linux__

container->termination.fail(failure);
```



src/slave/containerizer/docker.cpp (line 2210)


s/GPUs/GPU resources

It is better to make the concept consistent by using `GPU resources` for 
all of the comments.


- Guangya Liu


On 十月 20, 2016, 10 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 20, 2016, 10 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-20 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 十月 20, 2016, 10 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-17 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review153021
---




src/slave/containerizer/docker.hpp (lines 258 - 274)


Move this to `#ifdef __linux__`

```
#ifdef __linux__
// Allocate GPU resources for a specified container.
process::Future allocateNvidiaGpus(
  const size_t count,
  const ContainerID& containerId);

process::Future _allocateNvidiaGpus(
  const std::set& allocated,
  const ContainerID& containerId);

// Deallocate GPU resources for a specified container.
process::Future deallocateNvidiaGpus(
  const ContainerID& containerId);

process::Future _deallocateNvidiaGpus(
  const ContainerID& containerId);
#endif
```



src/slave/containerizer/docker.hpp (lines 503 - 504)


```
#ifdef __linux__
// GPU resources allocated to the container.
std::set gpus;
#endif
```



src/slave/containerizer/docker.cpp (lines 670 - 688)


You will not need `#ifdef __linux__` for those helper functions as we 
already put those helper functions surrounded by `#ifdef __liux__`.

Ditto for others.


- Guangya Liu


On 十月 14, 2016, 9:56 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 14, 2016, 9:56 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-14 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 十月 14, 2016, 9:56 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 8ec4c0a25335fb1b36cb2ab82577f6d3e2f7f008 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-13 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 十月 13, 2016, 10:14 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-13 Thread Yubo Li


> On 十月 11, 2016, 3:40 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 2150-2151
> > 
> >
> > Here will cause failure if gpu is not enabled, the ideal logic could be:
> > 
> > ```
> > if (container has gpu) {
> >   deallocateNvidiaGpus(containerId)
> > .onAny(defer(self(), &Self::destroy, containerId, killed, 
> > status));
> > } else {
> >   destroy(...);
> > }
> > ```
> > 
> > Or else you can follow the `allocateGpus` logic.
> > 
> > ```
> > Future deallocateGpus = Nothing();
> > 
> > if (container has gpu) {
> >   deallocateGpus = deallocateNvidiaGpus(...);
> > }
> > 
> > deallocateGpus(containerId)
> >   .onAny(defer(self(), &Self::destroy, containerId, killed, 
> > status));
> > ```

Fixed by following `allocateGpus` logic.


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review152140
---


On 十月 11, 2016, 8:16 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 11, 2016, 8:16 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-13 Thread Yubo Li


> On 十月 12, 2016, 7:31 a.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 643-696
> > 
> >
> > I was thinking for how to make less code change but need to make sure 
> > build passed on mac.
> > 
> > How about the following?
> > 
> > ```
> > Future DockerContainerizerProcess::allocateNvidiaGpus(
> > const size_t count,
> > const ContainerID& containerId)
> > {
> > #ifdef __linux__
> >   if (!nvidiaGpuAllocator.isSome()) {
> > return Failure("The 'allocateNvidiaGpus' function was called"
> >" without an 'NvidiaGpuAllocator' set");
> >   }
> > 
> >   if (!containers_.contains(containerId)) {
> > return Failure("Container is already destroyed");
> >   }
> > 
> >   return nvidiaGpuAllocator->allocate(count)
> > .then(defer(
> > self(),
> > &Self::_allocateNvidiaGpus,
> > lambda::_1,
> > containerId));
> > #else
> >   return Nothing();
> > #endif
> > }
> > 
> > 
> > Future DockerContainerizerProcess::_allocateNvidiaGpus(
> > const set& allocated,
> > const ContainerID& containerId)
> > {
> > #ifdef __linux__
> >   if (!containers_.contains(containerId)) {
> > return nvidiaGpuAllocator->deallocate(allocated)
> >   .onFailed([](const string& message) {
> > return Failure("Failed to deallocate GPUs allocated"
> >" to destroyed container: " + message);
> >   });
> >   }
> > 
> >   containers_.at(containerId)->gpus = allocated;
> > #endif
> > 
> >   return Nothing();
> > }
> > 
> > 
> > Future DockerContainerizerProcess::deallocateNvidiaGpus(
> > const ContainerID& containerId)
> > {
> > #ifdef __linux__
> >   if (!nvidiaGpuAllocator.isSome()) {
> > return Failure("The 'deallocateNvidiaGpus' function was called"
> >" without an 'NvidiaGpuAllocator' set");
> >   }
> > 
> >   return 
> > nvidiaGpuAllocator->deallocate(containers_.at(containerId)->gpus)
> > .then(defer(
> > self(),
> > &Self::_deallocateNvidiaGpus,
> > containerId));
> > #else
> >   return Nothing();
> > #endif
> > }
> > ```
> > 
> > Then we need to update the comments of `allocateNvidiaGpus` and 
> > `deallocateNvidiaGpus` by identifying that those two APIs return Nothing if 
> > GPU is not enabled.

Fixed. Also added comments `// GPU is only supported on Linux, and it will 
return 'Nothing()' if the OS is not Linux.`


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review152277
---


On 十月 11, 2016, 8:16 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 11, 2016, 8:16 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-12 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review152277
---




src/slave/containerizer/docker.cpp (lines 643 - 696)


I was thinking for how to make less code change but need to make sure build 
passed on mac.

How about the following?

```
Future DockerContainerizerProcess::allocateNvidiaGpus(
const size_t count,
const ContainerID& containerId)
{
#ifdef __linux__
  if (!nvidiaGpuAllocator.isSome()) {
return Failure("The 'allocateNvidiaGpus' function was called"
   " without an 'NvidiaGpuAllocator' set");
  }

  if (!containers_.contains(containerId)) {
return Failure("Container is already destroyed");
  }

  return nvidiaGpuAllocator->allocate(count)
.then(defer(
self(),
&Self::_allocateNvidiaGpus,
lambda::_1,
containerId));
#else
  return Nothing();
#endif
}

Future DockerContainerizerProcess::_allocateNvidiaGpus(
const set& allocated,
const ContainerID& containerId)
{
#ifdef __linux__
  if (!containers_.contains(containerId)) {
return nvidiaGpuAllocator->deallocate(allocated)
  .onFailed([](const string& message) {
return Failure("Failed to deallocate GPUs allocated"
   " to destroyed container: " + message);
  });
  }

  containers_.at(containerId)->gpus = allocated;
#endif

  return Nothing();
}

Future DockerContainerizerProcess::deallocateNvidiaGpus(
const ContainerID& containerId)
{
#ifdef __linux__
  if (!nvidiaGpuAllocator.isSome()) {
return Failure("The 'deallocateNvidiaGpus' function was called"
   " without an 'NvidiaGpuAllocator' set");
  }

  return nvidiaGpuAllocator->deallocate(containers_.at(containerId)->gpus)
.then(defer(
self(),
&Self::_deallocateNvidiaGpus,
containerId));
#else
  return Nothing();
#endif
}
```

Then we need to update the comments of `allocateNvidiaGpus` and 
`deallocateNvidiaGpus` by identifying that those two APIs return Nothing if GPU 
is not enabled.


- Guangya Liu


On 十月 11, 2016, 8:16 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 11, 2016, 8:16 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-11 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review152140
---




src/slave/containerizer/docker.cpp (lines 2147 - 2148)


Here will cause failure if gpu is not enabled, the ideal logic could be:

```
if (container has gpu) {
  deallocateNvidiaGpus(containerId)
.onAny(defer(self(), &Self::destroy, containerId, killed, status));
} else {
  destroy(...);
}
```

Or else you can follow the `allocateGpus` logic.

```
Future deallocateGpus = Nothing();

if (container has gpu) {
  deallocateGpus = deallocateNvidiaGpus(...);
}

deallocateGpus(containerId)
  .onAny(defer(self(), &Self::destroy, containerId, killed, status));
```


- Guangya Liu


On 十月 11, 2016, 8:16 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 十月 11, 2016, 8:16 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-11 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 十月 11, 2016, 8:16 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-10 Thread Yubo Li


> On 十月 9, 2016, 5:16 a.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 2105-2109
> > 
> >
> > Not yours, but I think that the comments can be removed as we have the 
> > logic of killing those containers forcely in #2118 and #2123. You can fix 
> > this in a separate patch.

Will move to another patch


> On 十月 9, 2016, 5:16 a.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 2162-2182
> > 
> >
> > What about moving this to a new function named as `destroy` and use 
> > continuation for `deallocateNvidiaGpus` in #2159?

Fixed, thanks!


> On 十月 9, 2016, 5:16 a.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1554-1564
> > 
> >
> > I think we do not need the update here as `destroy` will always be 
> > called even if the container exit normally, relasing the GPU in 
> > `___destroy` is good enough.

removed


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review151477
---


On 九月 22, 2016, 6:16 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 九月 22, 2016, 6:16 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-10-08 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review151477
---




src/slave/containerizer/docker.cpp (lines 648 - 649)


Put the blank space at the begining for the second sentence.
```
return Failure("The 'allocateNvidiaGpus' function was called"
   " without an 'NvidiaGpuAllocator' set");
```



src/slave/containerizer/docker.cpp (line 677)


s/containers_[containerId]/containers_.at(containerId)



src/slave/containerizer/docker.cpp (lines 687 - 688)


```
return Failure("The 'deallocateNvidiaGpus' function was called"
   " without an 'NvidiaGpuAllocator' set");
```



src/slave/containerizer/docker.cpp (line 691)


s/containers_[containerId]/containers_.at(containerId)



src/slave/containerizer/docker.cpp (line 703)


s/containers_[containerId]/containers_.at(containerId)



src/slave/containerizer/docker.cpp (lines 1339 - 1358)


What about moving this right before #1378?



src/slave/containerizer/docker.cpp (line 1350)


s/scala/scalar



src/slave/containerizer/docker.cpp (lines 1554 - 1564)


I think we do not need the update here as `destroy` will always be called 
even if the container exit normally, relasing the GPU in `___destroy` is good 
enough.



src/slave/containerizer/docker.cpp (lines 2105 - 2109)


Not yours, but I think that the comments can be removed as we have the 
logic of killing those containers forcely in #2118 and #2123. You can fix this 
in a separate patch.



src/slave/containerizer/docker.cpp (lines 2162 - 2182)


What about moving this to a new function named as `destroy` and use 
continuation for `deallocateNvidiaGpus` in #2159?


- Guangya Liu


On 九月 22, 2016, 6:16 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 九月 22, 2016, 6:16 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-09-21 Thread Yubo Li


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.hpp, line 506
> > 
> >
> > Shoud we use `set` here like we do in the mesos containerizer?

Yes, changed it to `set`.


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 1341-1346
> > 
> >
> > You can just use `if (gpus.get() > 0)` here.

the temp variable `requested` is removed.


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 1541-1552
> > 
> >
> > You can't rely on `containers_[containerId]` to be valid at this point 
> > (hence the check for it below).
> > 
> > Regardless, I just at the head of master to see where the most 
> > appropriate place to do this deallocation would be, and this function looks 
> > *completely* different from what I see here.
> > 
> > When was the last time you rebased?

The code is rebased at Aug. 15, let's keep this issue. I'll come back to this 
after I fix all other bugs and rebase by code.
Code rebased, where is the most appropriate place to do so?
BTW, `deallocateNvidiaGpus()` will check if `containers_[containerId]` is valid 
now, so we don't need to worry that `containers_[containerId]` is invalid.


> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 2037-2041
> > 
> >
> > It feels too early to be doing the `deallocate()` here. I think we 
> > should only do it once the container is actually killed (i.e. in 
> > `___destroy()`).
> > 
> > If we do it too early, the GPUs could still be being used by the 
> > container when they get allocated out to someone else. This would not go 
> > very well.
> > 
> > There is the case, however, in `__destroy()` where we aren't able to 
> > kill the docker container at all! It's better to leak the GPUs in this case 
> > than to deallocate them and run the risk of giving them to a new container. 
> > We should mention this in the error string below if any GPUs are actually 
> > leaked:
> > 
> > ```
> > container->termination.fail(
> > "Failed to kill the Docker container: " +
> > (kill.isFailed() ? kill.failure() : "discarded future"));
> > ```

I see. Also will come back to deal with deallocate issues after code rebased.
I moved this to `___destroy()`.
Also, added the error string for leaked GPUs:
```
container->termination.fail(
"Failed to kill the Docker container: " +
(kill.isFailed() ? kill.failure() : "discarded future") +
(container->gpus.empty() ?
"" : ", " + stringify(container->gpus.size()) + "GPUs leaked"));
```


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review147089
---


On 九月 20, 2016, 9:22 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 九月 20, 2016, 9:22 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-09-21 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 九月 22, 2016, 6:16 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
  src/slave/containerizer/docker.cpp 1d27761fcb3f310cf954d45ed41f4c89ecbd5982 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-09-20 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 九月 20, 2016, 9:22 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-09-19 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 九月 19, 2016, 7:17 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-09-18 Thread Yubo Li


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 666-668
> > 
> >
> > I would do this as the first check in this function.  If we don't have 
> > an allocator set, then we really shouldn't even be calling this function 
> > regardless of anything else that is going on.
> > 
> > Also, the string should read:
> > ```
> > "The `allocateNvidiaGpu` function was called without an 
> > `NvidiaGpuAllocator` set".
> > ```
> 
> Yubo Li wrote:
> If we put `nvidiaGpuAllocator` check in top of this function, we have to 
> check `requested==0` outside the function otherwise `nvidiaGpuAllocator` 
> check will be failed if GPU feature is not enabled. But I think move 
> `requested==0` outside `nvidiaGpuAllocator` is reasonable if we use temp 
> `Future()`. That is something logic like
> ```
> Future allocateGpus = Nothing();
> ..
> if (gpus.isSome()) {
>   // Make sure that the `gpus` resource is not fractional.
>   // We rely on scala resources only have 3 digits of precision.
>   if (static_cast(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
> return Failure("The 'gpus' resource must be an unsigned integer");
>   }
> 
>   const size_t requested = static_cast(gpus.getOrElse(0.0));
> 
>   if (requested > 0) {
> allocateGpus = allocateNvidiaGpus(requested, containerId);
>   }
> }
> ```
> Make sense?
> 
> Kevin Klues wrote:
> Yeah, that looks good to me. Except now you don't need `getOrElse()`, 
> just `get()`.

Sure, I changed to `get()`.


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 694-696
> > 
> >
> > Why do you need this level of indirection? Why not just pass 
> > `containers_[containerId]->gpuAllocated` directly to 
> > `nvidiaGpuAllocator->deallocate()`?
> 
> Yubo Li wrote:
> `containers_[containerId]->gpuAllocated` is a list but 
> `nvidiaGpuAllocator->deallocate()` accepts set.
> 
> Kevin Klues wrote:
> Can we  make `containers_[containerId]->gpuAllocated` a `set`?

Yes, changed it to `set`.


- Yubo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
---


On 八月 26, 2016, 11:02 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 八月 26, 2016, 11:02 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-27 Thread Kevin Klues


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.hpp, lines 505-507
> > 
> >
> > I would just call this variable `gpus`
> > Also the comment should read:
> > ```
> > // The number of GPUs allocated to the container.
> > ```
> 
> Yubo Li wrote:
> `gpus` is with type `std::list` so that it actually not the GPU 
> numbers. Do you think we will use comments `The number of GPUs`?

You're right. We should say "GPUs allocated to the container" though, not just 
"GPU".


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
---


On Aug. 26, 2016, 11:02 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Aug. 26, 2016, 11:02 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-27 Thread Kevin Klues


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 666-668
> > 
> >
> > I would do this as the first check in this function.  If we don't have 
> > an allocator set, then we really shouldn't even be calling this function 
> > regardless of anything else that is going on.
> > 
> > Also, the string should read:
> > ```
> > "The `allocateNvidiaGpu` function was called without an 
> > `NvidiaGpuAllocator` set".
> > ```
> 
> Yubo Li wrote:
> If we put `nvidiaGpuAllocator` check in top of this function, we have to 
> check `requested==0` outside the function otherwise `nvidiaGpuAllocator` 
> check will be failed if GPU feature is not enabled. But I think move 
> `requested==0` outside `nvidiaGpuAllocator` is reasonable if we use temp 
> `Future()`. That is something logic like
> ```
> Future allocateGpus = Nothing();
> ..
> if (gpus.isSome()) {
>   // Make sure that the `gpus` resource is not fractional.
>   // We rely on scala resources only have 3 digits of precision.
>   if (static_cast(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
> return Failure("The 'gpus' resource must be an unsigned integer");
>   }
> 
>   const size_t requested = static_cast(gpus.getOrElse(0.0));
> 
>   if (requested > 0) {
> allocateGpus = allocateNvidiaGpus(requested, containerId);
>   }
> }
> ```
> Make sense?

Yeah, that looks good to me. Except now you don't need `getOrElse()`, just 
`get()`.


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 694-696
> > 
> >
> > Why do you need this level of indirection? Why not just pass 
> > `containers_[containerId]->gpuAllocated` directly to 
> > `nvidiaGpuAllocator->deallocate()`?
> 
> Yubo Li wrote:
> `containers_[containerId]->gpuAllocated` is a list but 
> `nvidiaGpuAllocator->deallocate()` accepts set.

Can we  make `containers_[containerId]->gpuAllocated` a `set`?


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
---


On Aug. 26, 2016, 11:02 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Aug. 26, 2016, 11:02 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-27 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review147089
---




src/slave/containerizer/docker.hpp (line 506)


Shoud we use `set` here like we do in the mesos containerizer?



src/slave/containerizer/docker.cpp (line 663)


We don't need this line if you follow the suggestion below.



src/slave/containerizer/docker.cpp (lines 665 - 672)


We can't assume that `container` is still valid at the time we call this 
lambda.

Because of this, we need to do the same check as above inside the lambda:

```
if (!containers_.contains(containerId)) {
  return Failure("Container is already destroyed");
}
```

However, if this fails, we need to make sure and deallocate the GPUs 
because the container we were trying to attach them to is now gone.

I've run into this situation in the past, and @bmahler has usually had me 
break things like this out into separate function to avoid confusion with doing 
so many checks all over the place.

Something like the following should do it:

```
Future DockerContainerizerProcess::allocateNvidiaGpus(
const size_t count,
const ContainerID& containerId)
{
  if (!nvidiaGpuAllocator.isSome()) {
return Failure("The 'allocateNvidiaGpus' function was called"
   " without an 'NvidiaGpuAllocator' set");
  }

  if (!containers_.contains(containerId)) {
return Failure("Container is already destroyed");
  }
  
  return nvidiaGpuAllocator->allocate(count)
.then(defer(
self(),
DockerContainerizerProcess::_allocateNvidiaGpus,
lambda::_1,
containerId);
}

Future DockerContainerizerProcess::_allocateNvidiaGpus(
const set& allocated,
const ContainerID& containerId)
{
  if (!containers_.contains(containerId)) {
return nvidiaGpuAllocator->deallocate(allocated)
  .onFailed([](const string& message) {
return Failure("Failed to deallocate GPUs allocated"
   " to destroyed container: " + message);
  });
  }
  
  containers_[containerId]->gpus = allocated;

  return Nothing();
}
```



src/slave/containerizer/docker.cpp (lines 676 - 695)


We need something similar here to what I suggested above for verifying that 
the container is still valid across the continuation.



src/slave/containerizer/docker.cpp (line 1337)


You can just use `gpus.get()` here.



src/slave/containerizer/docker.cpp (lines 1341 - 1346)


You can just use `if (gpus.get() > 0)` here.



src/slave/containerizer/docker.cpp (lines 1541 - 1552)


You can't rely on `containers_[containerId]` to be valid at this point 
(hence the check for it below).

Regardless, I just at the head of master to see where the most appropriate 
place to do this deallocation would be, and this function looks *completely* 
different from what I see here.

When was the last time you rebased?



src/slave/containerizer/docker.cpp (lines 2037 - 2041)


It feels too early to be doing the `deallocate()` here. I think we should 
only do it once the container is actually killed (i.e. in `___destroy()`).

If we do it too early, the GPUs could still be being used by the container 
when they get allocated out to someone else. This would not go very well.

There is the case, however, in `__destroy()` where we aren't able to kill 
the docker container at all! It's better to leak the GPUs in this case than to 
deallocate them and run the risk of giving them to a new container. We should 
mention this in the error string below if any GPUs are actually leaked:

```
container->termination.fail(
"Failed to kill the Docker container: " +
(kill.isFailed() ? kill.failure() : "discarded future"));
```


- Kevin Klues


On Aug. 26, 2016, 11:02 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Aug. 26, 2016, 11:02 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira

Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-26 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 八月 26, 2016, 11:02 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Changes
---

Updated to address Kevin's comments


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-26 Thread Yubo Li


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.hpp, lines 505-507
> > 
> >
> > I would just call this variable `gpus`
> > Also the comment should read:
> > ```
> > // The number of GPUs allocated to the container.
> > ```

`gpus` is with type `std::list` so that it actually not the GPU numbers. 
Do you think we will use comments `The number of GPUs`?


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 653
> > 
> >
> > I would probably call this variable `count`. When I saw the name 
> > `requestedNvidiaGpu` I thought it was a specific GPU id being passed in, 
> > not a count.
> > 
> > I would also call the function `allocateNvidiaGpus()` since you can 
> > allocate more than one with this function.

make sense


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 666-668
> > 
> >
> > I would do this as the first check in this function.  If we don't have 
> > an allocator set, then we really shouldn't even be calling this function 
> > regardless of anything else that is going on.
> > 
> > Also, the string should read:
> > ```
> > "The `allocateNvidiaGpu` function was called without an 
> > `NvidiaGpuAllocator` set".
> > ```

If we put `nvidiaGpuAllocator` check in top of this function, we have to check 
`requested==0` outside the function otherwise `nvidiaGpuAllocator` check will 
be failed if GPU feature is not enabled. But I think move `requested==0` 
outside `nvidiaGpuAllocator` is reasonable if we use temp `Future()`. That is 
something logic like
```
Future allocateGpus = Nothing();
..
if (gpus.isSome()) {
  // Make sure that the `gpus` resource is not fractional.
  // We rely on scala resources only have 3 digits of precision.
  if (static_cast(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
return Failure("The 'gpus' resource must be an unsigned integer");
  }

  const size_t requested = static_cast(gpus.getOrElse(0.0));

  if (requested > 0) {
allocateGpus = allocateNvidiaGpus(requested, containerId);
  }
}
```
Make sense?


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 694-696
> > 
> >
> > Why do you need this level of indirection? Why not just pass 
> > `containers_[containerId]->gpuAllocated` directly to 
> > `nvidiaGpuAllocator->deallocate()`?

`containers_[containerId]->gpuAllocated` is a list but 
`nvidiaGpuAllocator->deallocate()` accepts set.


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 698-710
> > 
> >
> > Why don't you just return from the `deallocate()` call with a 
> > `.then()`? I.e.
> > 
> > ```
> >   return nvidiaGpuAllocator->deallocate(deallocated)
> > .then(defer(self(), [=](const Nothing& nothing) {
> >   containers_[containerId]->gpuAllocated.clear();
> >   return Nothing();
> > }));
> > ```
> > 
> > If any failures happen in the deallocation, they should get propagated 
> > through.
> 
> Guangya Liu wrote:
> With the current logic, we can have more log messages here with different 
> conditions, but seems your proposal is more simple.

I prefer Kevin's proposal because it is more simple


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 1555
> > 
> >
> > I wouldn't just blindly call this function here. It should be wrapped 
> > in some logic that makes sure it's OK to call it (i.e. checks to make sure 
> > that we have the nvidia->allocator component passed in).
> > 
> > Again, you could have some logic above which saves a temporary `Future` 
> > that is set to `Nothing()` by default and is the result of calling 
> > `deallocateNvidiaGpu()` otherwise.
> 
> Guangya Liu wrote:
> The `deallocateNvidiaGpu` already have some checking for 
> `nvidiaGpuAllocator`, is this enough?
> 
> Kevin Klues wrote:
> I don't think we want to call any `nvidia*` functions anywhere in the 
> code without checks around them at the call site. For someone reading the 
> code top to bottom it makes it look like we are *always* allocating / 
> deallocating / etc. GPUs, which is cnfusing. The checks inside the function 
> are to make sure that we don't accidentally call them somewhere without the 
> proper check at the call site.

See what I revised, do you think it is better than before?


- Yubo


-

Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-25 Thread Kevin Klues


> On Aug. 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 1555
> > 
> >
> > I wouldn't just blindly call this function here. It should be wrapped 
> > in some logic that makes sure it's OK to call it (i.e. checks to make sure 
> > that we have the nvidia->allocator component passed in).
> > 
> > Again, you could have some logic above which saves a temporary `Future` 
> > that is set to `Nothing()` by default and is the result of calling 
> > `deallocateNvidiaGpu()` otherwise.
> 
> Guangya Liu wrote:
> The `deallocateNvidiaGpu` already have some checking for 
> `nvidiaGpuAllocator`, is this enough?

I don't think we want to call any `nvidia*` functions anywhere in the code 
without checks around them at the call site. For someone reading the code top 
to bottom it makes it look like we are *always* allocating / deallocating / 
etc. GPUs, which is cnfusing. The checks inside the function are to make sure 
that we don't accidentally call them somewhere without the proper check at the 
call site.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
---


On Aug. 22, 2016, 10:11 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Aug. 22, 2016, 10:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-24 Thread Guangya Liu


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, line 1555
> > 
> >
> > I wouldn't just blindly call this function here. It should be wrapped 
> > in some logic that makes sure it's OK to call it (i.e. checks to make sure 
> > that we have the nvidia->allocator component passed in).
> > 
> > Again, you could have some logic above which saves a temporary `Future` 
> > that is set to `Nothing()` by default and is the result of calling 
> > `deallocateNvidiaGpu()` otherwise.

The `deallocateNvidiaGpu` already have some checking for `nvidiaGpuAllocator`, 
is this enough?


> On 八月 25, 2016, 12:32 a.m., Kevin Klues wrote:
> > src/slave/containerizer/docker.cpp, lines 698-710
> > 
> >
> > Why don't you just return from the `deallocate()` call with a 
> > `.then()`? I.e.
> > 
> > ```
> >   return nvidiaGpuAllocator->deallocate(deallocated)
> > .then(defer(self(), [=](const Nothing& nothing) {
> >   containers_[containerId]->gpuAllocated.clear();
> >   return Nothing();
> > }));
> > ```
> > 
> > If any failures happen in the deallocation, they should get propagated 
> > through.

With the current logic, we can have more log messages here with different 
conditions, but seems your proposal is more simple.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
---


On 八月 22, 2016, 10:11 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 八月 22, 2016, 10:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-24 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
---




src/slave/containerizer/docker.hpp (lines 505 - 507)


I would just call this variable `gpus`
Also the comment should read:
```
// The number of GPUs allocated to the container.
```



src/slave/containerizer/docker.cpp (line 653)


I would probably call this variable `count`. When I saw the name 
`requestedNvidiaGpu` I thought it was a specific GPU id being passed in, not a 
count.

I would also call the function `allocateNvidiaGpus()` since you can 
allocate more than one with this function.



src/slave/containerizer/docker.cpp (lines 662 - 664)


With type `size_t` you can never have a negative value., so just checking 
`== 0` should be enough here.



src/slave/containerizer/docker.cpp (lines 666 - 668)


I would do this as the first check in this function.  If we don't have an 
allocator set, then we really shouldn't even be calling this function 
regardless of anything else that is going on.

Also, the string should read:
```
"The `allocateNvidiaGpu` function was called without an 
`NvidiaGpuAllocator` set".
```



src/slave/containerizer/docker.cpp (line 681)


I would rename  this function `deallocateNvidiaGpus()` (i.e. with an `s` on 
the end).



src/slave/containerizer/docker.cpp (line 684)


I would move this down just before its first use.



src/slave/containerizer/docker.cpp (lines 690 - 692)


Again, I would move this up to the top of the function, with a similar 
Failure string as before.



src/slave/containerizer/docker.cpp (lines 694 - 696)


Why do you need this level of indirection? Why not just pass 
`containers_[containerId]->gpuAllocated` directly to 
`nvidiaGpuAllocator->deallocate()`?



src/slave/containerizer/docker.cpp (lines 698 - 710)


Why don't you just return from the `deallocate()` call with a `.then()`? 
I.e.

```
  return nvidiaGpuAllocator->deallocate(deallocated)
.then(defer(self(), [=](const Nothing& nothing) {
  containers_[containerId]->gpuAllocated.clear();
  return Nothing();
}));
```

If any failures happen in the deallocation, they should get propagated 
through.



src/slave/containerizer/docker.cpp (lines 1376 - 1379)


I would probably remove the intermediate variable above called `requested` 
and instead save a temporary `Future` to the `allocateNvidiaGpu()` call. I 
would also move all of the GPU logic together instead of separating it out.

Something like:

```

Future allocateGpus = Nothing();

const Option& taskInfo = container->task;
if (taskInfo.isNone()) {
  return Failure("No task information found");
}

if (taskInfo->resources.gpus().isSome()) {
  // Make sure that the `gpus` resource is not fractional.
  // We rely on scalar resources to only have 3 digits of precision.
  if (static_cast(gpus.get() * 1000.0) % 1000 != 0) {
return Failure("The 'gpus' resource must be an unsigned integer");
  }

  allocateGpus = allocateNvidiaGpu(gpus.get(), containerId);
}

...
...
...

return allocateGpus
  .then(...)
  .then(...);
```



src/slave/containerizer/docker.cpp (line 1555)


I wouldn't just blindly call this function here. It should be wrapped in 
some logic that makes sure it's OK to call it (i.e. checks to make sure that we 
have the nvidia->allocator component passed in).

Again, you could have some logic above which saves a temporary `Future` 
that is set to `Nothing()` by default and is the result of calling 
`deallocateNvidiaGpu()` otherwise.



src/slave/containerizer/docker.cpp (line 2035)


Same here, use the temporary mentioned above to do the deallocation or not.


- Kevin Klues


On Aug. 22, 2016, 10:11 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated Aug. 22, 2016, 10:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahle

Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-22 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 八月 22, 2016, 10:11 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-22 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146313
---




src/slave/containerizer/docker.cpp (lines 698 - 716)


This can be simplified as 

```
nvidiaGpuAllocator->deallocate(deallocated)
  .onReady(defer(self(), [=](const Nothing& nothing) {
containers_[containerId]->gpuAllocated.clear();
  }))
  .onFailed(defer(self(), [=](const string& message) {
LOG(ERROR) << "Deallocate GPUs failed for container '" << containerId
   << "': " << message;
  }))
  .onDiscarded(defer(self(), [=]() {
LOG(ERROR) << "Deallocate GPUs discarded for container '" << 
containerId;
  }));

  return Nothing();
```


- Guangya Liu


On 八月 16, 2016, 10:19 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 八月 16, 2016, 10:19 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-21 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146301
---




src/slave/containerizer/docker.hpp (line 505)


period to the end

```
// GPU allocated to the container.
```



src/slave/containerizer/docker.cpp (lines 2037 - 2040)


The comments here needs some update, we should mention that if GPU enabled, 
we will first deallocate GPU then destroy the container.



src/slave/containerizer/docker.cpp (line 2041)


kill this


- Guangya Liu


On 八月 16, 2016, 10:19 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 八月 16, 2016, 10:19 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-16 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 八月 16, 2016, 10:19 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Changes
---

Also updated to deal with onDiscarded() case. Please have a look.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-15 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145747
---




src/slave/containerizer/docker.hpp (line 255)


What about
```
Allocate GPU resources for a specified container.
```



src/slave/containerizer/docker.hpp (line 257)


s/size_t/const size_t



src/slave/containerizer/docker.hpp (line 260)


What about comments as this:

```
Deallocate GPU resources for a specified container.
```



src/slave/containerizer/docker.cpp (line 653)


s/size_t/const size_t



src/slave/containerizer/docker.cpp (lines 694 - 696)


What about move this right after the check so as to fail fast?

```
if (containers_[containerId]->gpuAllocated.empty()) {
  return Nothing();
}
```



src/slave/containerizer/docker.cpp (line 706)


I think this should be put into onReady section as 
https://github.com/apache/mesos/blob/master/src/log/log.cpp#L116-L119



src/slave/containerizer/docker.cpp (line 1355)


s/size_t/const size_t



src/slave/containerizer/docker.cpp (line 1378)


Keep this align with `.then(` in 1376


- Guangya Liu


On 八月 15, 2016, 7:26 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 八月 15, 2016, 7:26 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-15 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 八月 15, 2016, 7:26 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Changes
---

The implementation is a little different from yours, please have a look.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-12 Thread Guangya Liu


> On 八月 11, 2016, 3:07 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1325-1355
> > 
> >
> > What about adding a new function named as 
> > `DockerContainerizerProcess::allocateNvidiaGpu`.
> > 
> > ```
> > Future DockerContainerizerProcess::allocateNvidiaGpu(
> > size_t requestedNvidiaGpu,
> > const ContainerID& containerId)
> > {
> >   if (!containers_.contains(containerId)) {
> > return Failure("Container is already destroyed");
> >   }
> > 
> >   Container* container = containers_[containerId];
> > 
> >   if (requestedNvidiaGpu <= 0) {
> > return Nothing();
> >   }
> > 
> >   return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
> > .then(defer(self(), [=](set allocated) -> Future {
> >   foreach (const Gpu& gpu, allocated) {
> > container->gpuAllocated.push_back(gpu);
> >   }
> > 
> >   return Nothing();
> > }));
> > }
> > ```
> > 
> > Then return followingn in the end of 
> > `DockerContainerizerProcess::launchExecutorProcess`:
> > 
> > ```
> > const Resources& resources = taskInfo->resources();
> > 
> > Option gpus = resources.gpus();
> > 
> > // Make sure that the `gpus` resource is not fractional.
> > // We rely on scala resources only have 3 digits of precision.
> > if (static_cast(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
> >   return Failure("The 'gpus' resource must be an unsigned integer");
> > }
> > 
> > size_t requested = static_cast(gpus.getOrElse(0.0));
> > 
> > return allocateNvidiaGpu(requested, containerId)
> >   .then(defer(self(), [=]() {
> > return logger->prepare(container->executor, container->directory);
> >   }))
> >   .then(defer(
> >   self(),
> >   [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
> > -> Future {
> >   // NOTE: The child process will be blocked until all hooks have been
> >   // executed.
> >   vector parentHooks;
> > 
> >   // NOTE: Currently we don't care about the order of the hooks, as
> >   // both hooks are independent.
> > 
> >   // A hook that is executed in the parent process. It attempts to 
> > checkpoint
> >   // the process pid.
> >   //
> >   // NOTE:
> >   // - The child process is blocked by the hook infrastructure while
> >   //   these hooks are executed.
> >   // - It is safe to bind `this`, as hooks are executed immediately
> >   //   in a `subprocess` call.
> >   // - If `checkpoiont` returns an Error, the child process will be 
> > killed.
> >   parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
> >   &DockerContainerizerProcess::checkpoint,
> >   this,
> >   containerId,
> >   lambda::_1)));
> > 
> > #ifdef __linux__
> > // If we are on systemd, then extend the life of the executor. Any
> > // grandchildren's lives will also be extended.
> > if (systemd::enabled()) {
> >   parentHooks.emplace_back(Subprocess::Hook(
> >   &systemd::mesos::extendLifetime));
> > }
> > #endif // __linux__
> > 
> > // Prepare the flags to pass to the mesos docker executor process.
> > docker::Flags launchFlags = dockerFlags(
> > flags,
> > container->name(),
> > container->directory,
> > container->taskEnvironment);
> > 
> > VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
> > << launchFlags << "'";
> > 
> > // Construct the mesos-docker-executor using the "name" we gave the
> > // container (to distinguish it from Docker containers not created
> > // by Mesos).
> > Try s = subprocess(
> > path::join(flags.launcher_dir, "mesos-docker-executor"),
> > argv,
> > Subprocess::PIPE(),
> > subprocessInfo.out,
> > subprocessInfo.err,
> > SETSID,
> > launchFlags,
> > environment,
> > None(),
> > parentHooks,
> > container->directory);
> > 
> > if (s.isError()) {
> >   return Failure("Failed to fork executor: " + s.error());
> > }
> > 
> > return s.get().pid();
> >   }));
> > }
> > 
> > ```
> 
> Yubo Li wrote:
> I saw you removed the check:
> if (!nvidiaGpuAllocator.isSome()) {
>   return Failure("Can not deallocate GPU without enabling GPU support.");
> }
> 
> If someone compiled mesos without GPU support, `nvidiaGpuAllocator` 
> should be `None()`. In `allocateNvidiaGpu()`, 
> `nvidiaGpuAllocator->allocate(requestedNvidiaGpu)` will crash. Your opinion?

Yes, we should have su

Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-12 Thread Yubo Li


> On 八月 11, 2016, 3:07 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1325-1355
> > 
> >
> > What about adding a new function named as 
> > `DockerContainerizerProcess::allocateNvidiaGpu`.
> > 
> > ```
> > Future DockerContainerizerProcess::allocateNvidiaGpu(
> > size_t requestedNvidiaGpu,
> > const ContainerID& containerId)
> > {
> >   if (!containers_.contains(containerId)) {
> > return Failure("Container is already destroyed");
> >   }
> > 
> >   Container* container = containers_[containerId];
> > 
> >   if (requestedNvidiaGpu <= 0) {
> > return Nothing();
> >   }
> > 
> >   return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
> > .then(defer(self(), [=](set allocated) -> Future {
> >   foreach (const Gpu& gpu, allocated) {
> > container->gpuAllocated.push_back(gpu);
> >   }
> > 
> >   return Nothing();
> > }));
> > }
> > ```
> > 
> > Then return followingn in the end of 
> > `DockerContainerizerProcess::launchExecutorProcess`:
> > 
> > ```
> > const Resources& resources = taskInfo->resources();
> > 
> > Option gpus = resources.gpus();
> > 
> > // Make sure that the `gpus` resource is not fractional.
> > // We rely on scala resources only have 3 digits of precision.
> > if (static_cast(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
> >   return Failure("The 'gpus' resource must be an unsigned integer");
> > }
> > 
> > size_t requested = static_cast(gpus.getOrElse(0.0));
> > 
> > return allocateNvidiaGpu(requested, containerId)
> >   .then(defer(self(), [=]() {
> > return logger->prepare(container->executor, container->directory);
> >   }))
> >   .then(defer(
> >   self(),
> >   [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
> > -> Future {
> >   // NOTE: The child process will be blocked until all hooks have been
> >   // executed.
> >   vector parentHooks;
> > 
> >   // NOTE: Currently we don't care about the order of the hooks, as
> >   // both hooks are independent.
> > 
> >   // A hook that is executed in the parent process. It attempts to 
> > checkpoint
> >   // the process pid.
> >   //
> >   // NOTE:
> >   // - The child process is blocked by the hook infrastructure while
> >   //   these hooks are executed.
> >   // - It is safe to bind `this`, as hooks are executed immediately
> >   //   in a `subprocess` call.
> >   // - If `checkpoiont` returns an Error, the child process will be 
> > killed.
> >   parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
> >   &DockerContainerizerProcess::checkpoint,
> >   this,
> >   containerId,
> >   lambda::_1)));
> > 
> > #ifdef __linux__
> > // If we are on systemd, then extend the life of the executor. Any
> > // grandchildren's lives will also be extended.
> > if (systemd::enabled()) {
> >   parentHooks.emplace_back(Subprocess::Hook(
> >   &systemd::mesos::extendLifetime));
> > }
> > #endif // __linux__
> > 
> > // Prepare the flags to pass to the mesos docker executor process.
> > docker::Flags launchFlags = dockerFlags(
> > flags,
> > container->name(),
> > container->directory,
> > container->taskEnvironment);
> > 
> > VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
> > << launchFlags << "'";
> > 
> > // Construct the mesos-docker-executor using the "name" we gave the
> > // container (to distinguish it from Docker containers not created
> > // by Mesos).
> > Try s = subprocess(
> > path::join(flags.launcher_dir, "mesos-docker-executor"),
> > argv,
> > Subprocess::PIPE(),
> > subprocessInfo.out,
> > subprocessInfo.err,
> > SETSID,
> > launchFlags,
> > environment,
> > None(),
> > parentHooks,
> > container->directory);
> > 
> > if (s.isError()) {
> >   return Failure("Failed to fork executor: " + s.error());
> > }
> > 
> > return s.get().pid();
> >   }));
> > }
> > 
> > ```

I saw you removed the check:
if (!nvidiaGpuAllocator.isSome()) {
  return Failure("Can not deallocate GPU without enabling GPU support.");
}

If someone compiled mesos without GPU support, `nvidiaGpuAllocator` should be 
`None()`. In `allocateNvidiaGpu()`, 
`nvidiaGpuAllocator->allocate(requestedNvidiaGpu)` will crash. Your opinion?


- Yubo


---
This is an 

Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-11 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145507
---




src/slave/containerizer/docker.hpp (lines 215 - 218)


How about kill this and add two new functions as following:

```
process::Future deallocateNvidiaGpu(
const ContainerID& containerId);

process::Future allocateNvidiaGpu(
size_t requestedNvidiaGpu,
const ContainerID& containerId);
```



src/slave/containerizer/docker.cpp (lines 1325 - 1355)


What about adding a new function named as 
`DockerContainerizerProcess::allocateNvidiaGpu`.

```
Future DockerContainerizerProcess::allocateNvidiaGpu(
size_t requestedNvidiaGpu,
const ContainerID& containerId)
{
  if (!containers_.contains(containerId)) {
return Failure("Container is already destroyed");
  }

  Container* container = containers_[containerId];

  if (requestedNvidiaGpu <= 0) {
return Nothing();
  }

  return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
.then(defer(self(), [=](set allocated) -> Future {
  foreach (const Gpu& gpu, allocated) {
container->gpuAllocated.push_back(gpu);
  }

  return Nothing();
}));
}
```

Then return followingn in the end of 
`DockerContainerizerProcess::launchExecutorProcess`:

```
const Resources& resources = taskInfo->resources();

Option gpus = resources.gpus();

// Make sure that the `gpus` resource is not fractional.
// We rely on scala resources only have 3 digits of precision.
if (static_cast(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
  return Failure("The 'gpus' resource must be an unsigned integer");
}

size_t requested = static_cast(gpus.getOrElse(0.0));

return allocateNvidiaGpu(requested, containerId)
  .then(defer(self(), [=]() {
return logger->prepare(container->executor, container->directory);
  }))
  .then(defer(
  self(),
  [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
-> Future {
  // NOTE: The child process will be blocked until all hooks have been
  // executed.
  vector parentHooks;

  // NOTE: Currently we don't care about the order of the hooks, as
  // both hooks are independent.

  // A hook that is executed in the parent process. It attempts to 
checkpoint
  // the process pid.
  //
  // NOTE:
  // - The child process is blocked by the hook infrastructure while
  //   these hooks are executed.
  // - It is safe to bind `this`, as hooks are executed immediately
  //   in a `subprocess` call.
  // - If `checkpoiont` returns an Error, the child process will be killed.
  parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
  &DockerContainerizerProcess::checkpoint,
  this,
  containerId,
  lambda::_1)));

#ifdef __linux__
// If we are on systemd, then extend the life of the executor. Any
// grandchildren's lives will also be extended.
if (systemd::enabled()) {
  parentHooks.emplace_back(Subprocess::Hook(
  &systemd::mesos::extendLifetime));
}
#endif // __linux__

// Prepare the flags to pass to the mesos docker executor process.
docker::Flags launchFlags = dockerFlags(
flags,
container->name(),
container->directory,
container->taskEnvironment);

VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
<< launchFlags << "'";

// Construct the mesos-docker-executor using the "name" we gave the
// container (to distinguish it from Docker containers not created
// by Mesos).
Try s = subprocess(
path::join(flags.launcher_dir, "mesos-docker-executor"),
argv,
Subprocess::PIPE(),
subprocessInfo.out,
subprocessInfo.err,
SETSID,
launchFlags,
environment,
None(),
parentHooks,
container->directory);

if (s.isError()) {
  return Failure("Failed to fork executor: " + s.error());
}

return s.get().pid();
  }));
}

```



src/slave/containerizer/docker.cpp (lines 1544 - 1559)


What about abstract this to a new function so that this can also be used by 
`DockerContainerizerProcess::destroy`.

```
Future DockerContainerizerProcess::deallocateNvidiaGpu(
  const ContainerID& containerId)
{
 

Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-10 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 八月 10, 2016, 10:34 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885 
  src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-09 Thread Yubo Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/
---

(Updated 八月 9, 2016, 9:25 a.m.)


Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat 
Phull.


Changes
---

Changed to use sub-thread to work with GPU allocation and not block main thread


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Added control logic to allocate/deallocate GPUs to GPU-related task
when the task is started/terminated.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885 
  src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62 

Diff: https://reviews.apache.org/r/50841/diff/


Testing
---

make check


Thanks,

Yubo Li



Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.

2016-08-07 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145056
---




src/slave/containerizer/docker.cpp (lines 1360 - 1428)


What about putting those logic to a new function such as 
`DockerContainerizerProcess::_launchExecutorProcess`, and here you can update 
the logic as:

```
return allocator->allocate(requested)
  .then(defer(...))
```

The `defer` will call the new added function 
`DockerContainerizerProcess::_launchExecutorProcess`, using continuation here 
can make the main thread not blocked.



src/slave/containerizer/docker.cpp (line 1548)


```
return allocator->deallocate(deallocated);
```



src/slave/containerizer/docker.cpp (line 1913)


What about log a warning message here if failed to deallocate?

```
allocator->deallocate(deallocated);
  .onFailed(defer(...) {
LOG(WARNING) << "...";
  }));
```


- Guangya Liu


On 八月 5, 2016, 9:49 a.m., Yubo Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> ---
> 
> (Updated 八月 5, 2016, 9:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and 
> Rajat Phull.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885 
>   src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>