[jira] [Assigned] (MESOS-5795) Add Nvidia GPU support for in the docker containerizer

2020-05-26 Thread Meng Zhu (Jira)


 [ https://issues.apache.org/jira/browse/MESOS-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Meng Zhu reassigned MESOS-5795:
---

Assignee: Meng Zhu

> Add Nvidia GPU support for in the docker containerizer
> --
>
>         Key: MESOS-5795
>         URL: https://issues.apache.org/jira/browse/MESOS-5795
>     Project: Mesos
>  Issue Type: Epic
>  Components: containerization, docker
>    Reporter: Kevin Klues
>    Assignee: Meng Zhu
>    Priority: Major
>      Labels: gpu, mesosphere
>
> To support Nvidia GPUs with Docker containers in Mesos, we need to
> consolidate all Nvidia libraries into a common volume and inject that
> volume into the container. This epic tracks that support in the Docker
> containerizer; support in the Mesos containerizer was already completed in
> MESOS-5401.
> More information on why this is necessary is available at:
> https://github.com/NVIDIA/nvidia-docker/
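The two steps named in the description (consolidating the Nvidia libraries into one volume, then injecting that volume into the container) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the Mesos implementation: the helper names `consolidate_libraries` and `docker_gpu_args`, the library-name prefixes, the `lib64` subdirectory and the `/usr/local/nvidia` mount point are all hypothetical choices. The `--volume` and `--device` options are standard `docker run` flags, and `/dev/nvidiactl`, `/dev/nvidia-uvm` and `/dev/nvidiaN` are the usual Nvidia driver device nodes.

```python
from pathlib import Path
import shutil

# Hypothetical set of file-name prefixes treated as "Nvidia libraries" in this sketch.
NVIDIA_LIB_PREFIXES = ("libcuda", "libnvidia-", "libnvcuvid")


def consolidate_libraries(lib_dirs, volume_dir):
    """Copy Nvidia libraries from lib_dirs into volume_dir/lib64 (hypothetical layout).

    Returns the sorted names of the copied files. shutil.copy2 follows
    symlinks, so a link is copied as a regular file with the target's contents.
    """
    target = Path(volume_dir) / "lib64"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for lib_dir in lib_dirs:
        for entry in sorted(Path(lib_dir).iterdir()):
            # str.startswith accepts a tuple of candidate prefixes.
            if entry.is_file() and entry.name.startswith(NVIDIA_LIB_PREFIXES):
                shutil.copy2(entry, target / entry.name)
                copied.append(entry.name)
    return sorted(copied)


def docker_gpu_args(volume_dir, gpu_indices, container_path="/usr/local/nvidia"):
    """Build `docker run` arguments that inject the volume and GPU device nodes.

    container_path is an assumed mount point, not one mandated by Mesos.
    """
    # Mount the consolidated volume read-only inside the container.
    args = ["--volume", f"{volume_dir}:{container_path}:ro"]
    # Control devices shared by all GPUs.
    args += ["--device", "/dev/nvidiactl", "--device", "/dev/nvidia-uvm"]
    # One device node per allocated GPU.
    for index in gpu_indices:
        args += ["--device", f"/dev/nvidia{index}"]
    return args
```

Copying the libraries into one directory keeps the container to a single read-only mount; a real implementation would also have to preserve the `.so` symlink chains, which this sketch flattens.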



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (MESOS-10133) Task launched with 0 scalar value for persistent volume.

2020-05-26 Thread Benjamin Mahler (Jira)
Benjamin Mahler created MESOS-10133:
---

     Summary: Task launched with 0 scalar value for persistent volume.
         Key: MESOS-10133
         URL: https://issues.apache.org/jira/browse/MESOS-10133
     Project: Mesos
  Issue Type: Bug
  Components: master
    Reporter: Benjamin Mahler


We saw the following task launch message:

{noformat}
I0520 17:58:11.559875  2913 master.cpp:4985] Launching task T of framework 
288ebd4e-8bbf-4f2a-ac3a-2e5eb2885266 (name) with resources 
[{"allocation_info":{"role":"role"},"disk":{"persistence":{"id":"ID","principal":"p"},"volume":{"container_path":"path","mode":"RW"}},"name":"disk","reservations":[{"labels":{"labels":[{"key":"marathon_framework_id","value":"16fa03ca-0048-4124-bdac-aff56e679c95-"},{"key":"marathon_task_id","value":"T"}]},"principal":"p","role":"role","type":"DYNAMIC"}],"scalar":{"value":0.0},"type":"SCALAR"},{"allocation_info":{"role":"role"},"name":"mem","scalar":{"value":544.0},"type":"SCALAR"},{"allocation_info":{"role":"role"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"}]
 on agent 16fa03ca-0048-4124-bdac-aff56e679c95-S49 at slave(1)@IP:5051 (IP) on  
new executor
{noformat}

In this message, the persistent volume is used with a scalar value of 0. This
should have been rejected as invalid, since we require the entire persistent
volume to be used. Perhaps it is instead treated as not being used at all,
because its value is 0 (e.g. cpus:1;foobars:0 == cpus:1).
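The suspected failure mode can be illustrated with a small Python model. It is a hypothetical sketch, not the Mesos `Resources` class: `Resource`, `normalize`, `equal` and `validate_persistent_volumes` are invented names, and the volume size of 1024 used in the example is an assumption, because the log does not show the volume's actual size. If zero-valued scalars are dropped during normalization, a persistent volume with value 0 vanishes, so a check performed on the normalized set cannot see it; checking the raw task resources against the full volume size does catch it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    value: float
    persistence_id: Optional[str] = None  # set only for persistent volumes


def normalize(resources):
    """Sum scalars per (name, persistence_id) and drop zero-valued entries.

    Models the behavior hypothesized in the issue: a scalar of 0 is treated
    as if the resource were absent.
    """
    totals = {}
    for r in resources:
        key = (r.name, r.persistence_id)
        totals[key] = totals.get(key, 0.0) + r.value
    return {key: value for key, value in totals.items() if value != 0.0}


def equal(a, b):
    """Compare two resource lists after normalization (order-insensitive)."""
    return normalize(a) == normalize(b)


def validate_persistent_volumes(task_resources, volume_sizes):
    """Hypothetical check on the RAW resources: each persistent volume must be used in full.

    volume_sizes maps persistence id -> size of the volume on the agent.
    Returns a list of error strings (empty if valid).
    """
    errors = []
    for r in task_resources:
        if r.persistence_id is None:
            continue
        expected = volume_sizes.get(r.persistence_id)
        if expected is None:
            errors.append(f"unknown persistent volume '{r.persistence_id}'")
        elif r.value != expected:
            errors.append(
                f"persistent volume '{r.persistence_id}' must be used in full: "
                f"requested {r.value}, volume size {expected}"
            )
    return errors
```

With the task from the log modeled as `[Resource("disk", 0.0, "ID"), Resource("mem", 544.0), Resource("cpus", 0.1)]`, `equal` reports it as identical to the same task without the volume, while `validate_persistent_volumes` returns one error for volume `ID`.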


