The intent is to use the standard kernel mechanism, cgroups, to ensure
device-level isolation.

--
Vikram

From: Charles Allen [mailto:[email protected]]
Sent: Tuesday, January 19, 2016 10:00 AM
To: [email protected]
Subject: Re: Share GPU resources via attributes or as custom resources 
(INTERNAL)

Out of curiosity, is isolation something that will be guaranteed by "standard"
kernel mechanisms (like cgroups), or is it functionality that requires using
the driver/library directly?

Internally I have some POCs where a standard set of services always runs in
one cgroup, and another cgroup is set up for Mesos tasks (that was easier than
writing a custom Mesos resource methodology). In such a scenario I'm curious
how well Mesos and non-Mesos tasks on the same system would interact with the
resource isolation.
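
As a rough sketch of that kind of split, assuming a cgroup-v1 hierarchy
mounted at /sys/fs/cgroup; the cgroup names, share values, and the PID are
illustrative, not taken from any actual POC:

import os

CPU_ROOT = "/sys/fs/cgroup/cpu"  # cgroup-v1 cpu controller mount

def make_cgroup(name, cpu_shares):
    path = os.path.join(CPU_ROOT, name)
    os.makedirs(path, exist_ok=True)
    # Relative CPU weight for everything placed in this cgroup.
    with open(os.path.join(path, "cpu.shares"), "w") as f:
        f.write(str(cpu_shares))
    return path

def move_pid(cgroup_path, pid):
    # Writing a PID to cgroup.procs moves that process into the cgroup.
    with open(os.path.join(cgroup_path, "cgroup.procs"), "w") as f:
        f.write(str(pid))

services = make_cgroup("system-services", 512)  # the always-on services
mesos = make_cgroup("mesos", 1024)              # everything Mesos launches
move_pid(mesos, 12345)                          # hypothetical task PID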

On Mon, Jan 18, 2016 at 8:23 AM Vikrama Ditya 
<[email protected]<mailto:[email protected]>> wrote:
A clarification on isolation: Ben and we (NVIDIA) are working to introduce the
GPU as a first-class resource in Mesos.

By default there is no isolation, but there will be an isolation module for
NVIDIA GPU devices which can be linked in at build time and isolates GPU tasks
at the device level. Initially only device-level isolation will be provided,
on the assumption that all tasks use the same device libraries (hence no
filesystem isolation). Our initial proposal does not expose details of the
GPU, but subsequently more detail about GPU resources (topology, memory,
cores, bandwidth, etc.) will be exposed to enable better job scheduling.

As Ben indicated, we will very soon send out a design proposal to the
community for comments.

Regards
--
Vikram

From: Benjamin Mahler [mailto:[email protected]]
Sent: Saturday, January 16, 2016 4:31 PM
To: [email protected]<mailto:[email protected]>

Subject: Re: Share GPU resources via attributes or as custom resources 
(INTERNAL)

There is a design proposal coming that will include guidance around using GPUs
and better GPU support in Mesos, so stay tuned.

Mesos supports adding arbitrary resources, e.g.

--resources=cpus(*):4;gpus(*):4

Mesos will then manage a scalar "gpus" resource with a value of 4. This means
"gpus" scalars will be offered to the framework, and the framework may launch
tasks/executors that are allocated some of that "gpus" scalar. Of course,
you'll need support from Marathon for custom resources when you define your
job; I'm not sure whether that exists currently.
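
As a rough sketch of the framework side, assuming the Python bindings that
shipped with Mesos at the time (mesos.interface / mesos.native); the task ID,
command, and the one-GPU claim are illustrative:

from mesos.interface import Scheduler, mesos_pb2

class GpuScheduler(Scheduler):
    def resourceOffers(self, driver, offers):
        for offer in offers:
            # The custom "gpus" scalar arrives alongside cpus/mem.
            gpus = sum(r.scalar.value for r in offer.resources
                       if r.name == "gpus")
            if gpus < 1:
                driver.declineOffer(offer.id)
                continue
            task = mesos_pb2.TaskInfo()
            task.task_id.value = "gpu-task-1"        # illustrative ID
            task.slave_id.value = offer.slave_id.value
            task.name = "gpu task"
            task.command.value = "./run_gpu_job.sh"  # hypothetical command
            r = task.resources.add()
            r.name = "gpus"
            r.type = mesos_pb2.Value.SCALAR
            r.scalar.value = 1                       # claim one of the four
            driver.launchTasks(offer.id, [task])

Note that this is pure accounting: nothing here stops the task from touching
more devices than it claimed, which is where isolation comes in.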

Now, by default no isolation is going to take place. That may be OK for you if
you can ensure that tasks/executors only try to consume the number of GPUs
that have been allocated to them. If not, you may run an isolator module for
GPUs (e.g. using the device whitelist cgroup controller). At the current time
you would have to write one, as I'm not sure whether one has been
written/published.
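
As a sketch of what such an isolator would do with the cgroup-v1 devices
whitelist controller, assuming NVIDIA's conventional character-device major
number 195; the cgroup path is hypothetical:

import os

CG = "/sys/fs/cgroup/devices/mesos/gpu-task-1"  # hypothetical task cgroup
os.makedirs(CG, exist_ok=True)

def deny_all():
    # "a" revokes access to all devices for members of the cgroup.
    with open(os.path.join(CG, "devices.deny"), "w") as f:
        f.write("a")

def allow(entry):
    # Whitelist entries have the form "type major:minor access".
    with open(os.path.join(CG, "devices.allow"), "w") as f:
        f.write(entry)

deny_all()
allow("c 1:3 rwm")      # /dev/null
allow("c 1:5 rwm")      # /dev/zero
allow("c 1:8 rwm")      # /dev/random
allow("c 1:9 rwm")      # /dev/urandom
allow("c 195:255 rwm")  # /dev/nvidiactl (control device)
allow("c 195:0 rwm")    # /dev/nvidia0 only; the other GPUs stay blocked

A task placed in this cgroup can then open /dev/nvidia0 but should get EPERM
on the other GPU device nodes.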

You'll need to make sure your containers have access to the necessary GPU
libraries. If you are running without filesystem isolation, tasks can simply
reach out of the sandbox to use the necessary libraries.

Hope that helps,
Ben

On Thu, Jan 14, 2016 at 9:02 AM, 
<[email protected]<mailto:[email protected]>> wrote:
I have a machine with 4 GPUs and want to use Mesos+Marathon to schedule the
jobs to be run on the machine. Each job will use at most 1 GPU, and sharing
one GPU between small jobs would be OK.
I know Mesos does not directly support GPUs, but it seems I could use custom
resources or attributes to do what I want. How exactly should this be done?

If I use --attributes="hasGpu:true", would a job be sent to the machine when
another job is already running on it (and using only 1 GPU)? I would say all
jobs requesting a machine with the hasGpu attribute would be sent to the
machine (as long as it has free CPU and memory resources). Then, if a job is
sent to the machine when all 4 GPUs are already busy, the job will fail to
start, right? Could Marathon then be used to re-send the job after some time,
until it is accepted by the machine?

If I specify --resources="gpu(*):4", it is my understanding that once a job is
sent to the machine, all 4 GPUs will become busy in the eyes of Mesos (even if
that is not really true). If that is right, would this workaround work:
specify 4 different resources (gpu:A, gpu:B, gpu:C and gpu:D) and use a
constraint in Marathon like "constraints": [["gpu", "LIKE", "[A-D]"]]?

Cheers
