Out of curiosity, is isolation something that will be guaranteed by
"standard" kernel mechanisms (like cgroups) or is it a functionality that
requires using the driver/library directly?

Internally I have some POCs where a standard set of services always runs in
one cgroup, and another cgroup is set up for Mesos tasks (that was easier
than writing a custom Mesos resource methodology). In such a scenario I'm
curious how well Mesos and non-Mesos tasks on the same system would
interact with the resource isolation.
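
To make the cgroup option concrete, here is a rough Python sketch (not Mesos
code) of the rules a devices-cgroup-based isolator might write for a task
that was given GPU 1 out of 4. The helper name gpu_device_rules is
hypothetical, and the NVIDIA major number 195 and the nvidiactl minor 255
are assumptions for illustration; check them with `ls -l /dev/nvidia*` on a
real host.

```python
# Hypothetical helper: build cgroup v1 "devices" controller rules for a task
# that has been allocated a subset of the NVIDIA GPUs on a host.

NVIDIA_MAJOR = 195      # assumed major number of the /dev/nvidiaN char devices
NVIDIACTL_MINOR = 255   # assumed minor number of /dev/nvidiactl


def gpu_device_rules(allocated, total):
    """Return (deny, allow) rule lists for devices.deny / devices.allow.

    allocated: iterable of GPU indices granted to the task
    total:     number of GPUs present on the host
    """
    allocated = set(allocated)
    unknown = allocated - set(range(total))
    if unknown:
        raise ValueError(f"GPU indices out of range: {sorted(unknown)}")
    # Deny every GPU the task was not given ...
    deny = [f"c {NVIDIA_MAJOR}:{i} rwm" for i in range(total) if i not in allocated]
    # ... and allow the ones it was given, plus the shared control device.
    allow = [f"c {NVIDIA_MAJOR}:{i} rwm" for i in sorted(allocated)]
    allow.append(f"c {NVIDIA_MAJOR}:{NVIDIACTL_MINOR} rwm")
    return deny, allow


if __name__ == "__main__":
    deny, allow = gpu_device_rules(allocated=[1], total=4)
    for rule in deny:
        print("devices.deny  <-", rule)
    for rule in allow:
        print("devices.allow <-", rule)
```

The sketch only computes the rule strings. Something would still have to
write them into each task's cgroup, and that is the part where I don't know
whether a non-Mesos cgroup on the same host would be affected.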

On Mon, Jan 18, 2016 at 8:23 AM Vikrama Ditya <[email protected]> wrote:

> A clarification on isolation: Ben and we (Nvidia) are working to introduce
> GPUs as a first-class resource in Mesos.
>
>
>
> By default there is no isolation. However, there will be an isolation module
> for Nvidia GPU devices that can be linked at build time and provides
> isolation between GPU tasks across GPU devices. Initially only device-level
> isolation will be provided, on the assumption that all tasks use the same
> device libraries (hence no filesystem isolation). Our initial proposal does
> not expose details of the GPU, but subsequently more detail about GPU
> resources (topology, memory, cores, bandwidth, etc.) will be exposed to
> enable better job scheduling.
>
>
>
> As Ben indicated, we will send out a design proposal to the community for
> comments very soon.
>
>
>
> Regards
>
> --
> Vikram
>
>
>
> *From:* Benjamin Mahler [mailto:[email protected]]
> *Sent:* Saturday, January 16, 2016 4:31 PM
> *To:* [email protected]
>
>
> *Subject:* Re: Share GPU resources via attributes or as custom resources
> (INTERNAL)
>
>
>
> There is a design proposal coming that will include guidance around using
> GPUs and better GPU support in mesos, so stay tuned.
>
>
>
> Mesos supports adding arbitrary resources, e.g.
>
>
>
> --resources="cpus(*):4;gpus(*):4"
>
>
>
> Mesos will then manage a scalar "gpus" resource with a value of 4. This
> means "gpus" scalars will be offered to the framework, and the framework may
> launch tasks / executors that are allocated a "gpus" scalar. Of course,
> you'll need support from Marathon for custom resources when you define your
> job; I'm not sure whether that exists currently.
>
>
>
> Now, by default no isolation is going to take place. That may be OK for
> you if you can ensure that tasks/executors only try to consume the number
> of GPUs that have been allocated to them. If not, you may run an isolator
> module for GPUs (e.g. one using the cgroup devices controller, i.e. the
> device whitelist). At the current time you would have to write one, as I'm
> not sure whether one has been written / published.
>
>
>
> You'll need to make sure your containers have access to the necessary gpu
> libraries. If you are running without filesystem isolation then tasks can
> just reach out of the sandbox to use the necessary libraries.
>
>
>
> Hope that helps,
>
> Ben
>
>
>
> On Thu, Jan 14, 2016 at 9:02 AM, <[email protected]> wrote:
>
> I have a machine with 4 GPUs and want to use Mesos+Marathon to schedule
> the jobs to be run on the machine. Each job will use at most 1 GPU, and
> sharing 1 GPU between small jobs would be OK.
> I know Mesos does not directly support GPUs, but it seems I might use
> custom resources or attributes to do what I want. But how exactly should
> this be done?
>
> If I use --attributes="hasGpu:true", would a job be sent to the machine
> when another job is already running on the machine (and only using 1 GPU)?
> I would say all jobs requesting a machine with a hasGpu attribute would be
> sent to the machine (as long as it has free CPU and memory resources).
> Then, if a job is sent to the machine when the 4 GPUs are already busy, the
> job will fail to start, right? Could Marathon then be used to re-send the
> job after some time, until it is accepted by the machine?
>
> If I specify --resources="gpu(*):4", it is my understanding that once a
> job is sent to the machine, all 4 GPUs will become busy in the eyes of
> Mesos (even if this is not really true). If that is right, would this
> workaround work: specify 4 different resources (gpu:A, gpu:B, gpu:C and
> gpu:D) and use constraints in Marathon like this: "constraints": [["gpu",
> "LIKE", " [A-D]"]]?
>
> Cheers
>
>
