[jira] [Commented] (MESOS-10192) Recent Nvidia CUDA changes break Mesos GPU support

2020-10-12 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212793#comment-17212793
 ] 

Qian Zhang commented on MESOS-10192:


commit 301902be4f1332799cf3b3242cd29b4907c21c09
Author: Qian Zhang 
Date: Sat Oct 10 15:04:57 2020 +0800

Ignored the directory `/dev/nvidia-caps` when globbing Nvidia GPU devices.
 
 The directory `/dev/nvidia-caps` was introduced in CUDA 11.0. It is
 ignored because we only care about the Nvidia GPU device files.
 
 Review: https://reviews.apache.org/r/72945

> Recent Nvidia CUDA changes break Mesos GPU support
> --
>
> Key: MESOS-10192
> URL: https://issues.apache.org/jira/browse/MESOS-10192
> Project: Mesos
> Issue Type: Bug
> Components: agent, containerization, gpu
> Reporter: Greg Mann
> Assignee: Qian Zhang
> Priority: Major
> Labels: GPU, containerization, containerizer, gpu
>
> Recently it seems that the layout of the Nvidia device files has changed:  
> https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
> This prevents GPU tasks from launching:
> {noformat}
> W0929 17:27:21.002178 65691 http.cpp:3436] Failed to launch container 
> c08e1fc7-53c4-427e-a1a1-85b770e77d69.738440a3-f4cc-42ce-8978-418ba0011160: 
> Failed to copy device '/dev/nvidia-caps': Failed to get source dev: Not a 
> special file: /dev/nvidia-caps
> {noformat}
> This is caused by the following code, which detects the Nvidia device files:
> https://github.com/apache/mesos/blob/8700dd8d5ece658804d7b7a40863800dcc5c72bc/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp#L438-L454
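
The fix described in the commit message can be illustrated with a minimal standalone sketch. This is not the code from the Mesos isolator: it uses plain POSIX glob() and stat(), and the names `isCharDevice` and `globCharDevices` are hypothetical. It also swaps the technique slightly: instead of skipping `/dev/nvidia-caps` by name, it keeps only character device files, which should have the same effect for that directory because a directory is not a special file.

```cpp
#include <glob.h>
#include <sys/stat.h>

#include <string>
#include <vector>

// Hypothetical helper (not part of Mesos): true if `path` exists and is a
// character device file, which is the file type of GPU device nodes.
bool isCharDevice(const std::string& path)
{
  struct stat s;
  return ::stat(path.c_str(), &s) == 0 && S_ISCHR(s.st_mode);
}

// Hypothetical helper (not part of Mesos): expands `pattern` and keeps only
// character devices, so a directory such as `/dev/nvidia-caps` is dropped.
std::vector<std::string> globCharDevices(const std::string& pattern)
{
  std::vector<std::string> devices;

  glob_t matches = {};
  if (::glob(pattern.c_str(), 0, nullptr, &matches) == 0) {
    for (size_t i = 0; i < matches.gl_pathc; ++i) {
      const std::string path = matches.gl_pathv[i];
      if (isCharDevice(path)) {
        devices.push_back(path);
      }
    }
  }

  // Release whatever glob() allocated, including on the no-match path.
  globfree(&matches);
  return devices;
}
```

Under these assumptions, calling `globCharDevices("/dev/nvidia*")` on a host with the new device layout would be expected to return the GPU device files while leaving out the `/dev/nvidia-caps` directory that triggers the "Not a special file" error in the log above.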



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10157) Add documentation for the `volume/csi` isolator

2020-10-12 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212784#comment-17212784
 ] 

Qian Zhang commented on MESOS-10157:


commit 3e1e0b37d6a30a2c98d1227b4ac754b1d26686f3
Author: Qian Zhang 
Date: Wed Sep 9 10:26:52 2020 +0800

Added doc for the `volume/csi` isolator.
 
 Review: https://reviews.apache.org/r/72845

> Add documentation for the `volume/csi` isolator
> ---
>
> Key: MESOS-10157
> URL: https://issues.apache.org/jira/browse/MESOS-10157
> Project: Mesos
> Issue Type: Task
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Labels: docs, documentation
>






[jira] [Commented] (MESOS-10151) Introduce a new agent flag `--csi_plugin_config_dir`

2020-10-12 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212783#comment-17212783
 ] 

Qian Zhang commented on MESOS-10151:


commit 90e5434544da9886cd6f2d87b73e3246292af107
Author: Qian Zhang 
Date: Tue Oct 13 09:58:44 2020 +0800

Corrected the example of the managed CSI plugin.
 
 Review: https://reviews.apache.org/r/72846

> Introduce a new agent flag `--csi_plugin_config_dir`
> 
>
> Key: MESOS-10151
> URL: https://issues.apache.org/jira/browse/MESOS-10151
> Project: Mesos
> Issue Type: Task
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Fix For: 1.11.0
>
>



