Dear Kota,

I appreciate the feedback. I will read up on the latest documentation when the
time comes to configure. Thank you for your detailed email; I will indeed read
your blog.

Regards

Tim


________________________________
From: slurm-users <[email protected]> on behalf of Kota 
Tsuyuzaki <[email protected]>
Sent: Wednesday, 21 April 2021 11:09
To: 'Slurm User Community List' <[email protected]>; 
[email protected] <[email protected]>
Subject: Re: [slurm-users] SLURM A100


Hello Tim,

Last year, I worked out how the A100 MIG feature behaves with the Slurm
Workload Manager. At that time, it required a non-default DEVFS mode in the
kernel config to constrain the MIG devices via the Slurm cgroup. With that
setting, A100 MIG works well for me, so I suppose it should NOT be a blocking
issue, except that you need to have the configuration in place.

I tested with NVIDIA driver version 450.51.06. The DEVFS mode was not the
default at that time, but the NVIDIA documents said it would become the default
in the future, so you should check the latest docs if the kernel setting is a
concern.
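
As a rough way to see whether a node is already exposing MIG capabilities as
device nodes, a minimal sketch along the following lines may help. It is not
taken from the blog post or from NVIDIA's tooling. The directory
/dev/nvidia-caps and the nvidia-cap<N> naming are assumptions based on how
NVIDIA's MIG documentation describes the DEVFS mode, and the function names
are hypothetical; please verify both against the docs for your driver version.

```python
from pathlib import Path

# Assumption: in DEVFS mode the driver exposes MIG capability nodes as
# /dev/nvidia-caps/nvidia-cap<N>. This path is not stated in this thread.
DEFAULT_CAPS_DIR = Path("/dev/nvidia-caps")


def list_cap_devices(caps_dir=DEFAULT_CAPS_DIR):
    """Return sorted names of nvidia-cap* entries, or [] if the directory is absent."""
    caps_dir = Path(caps_dir)
    if not caps_dir.is_dir():
        return []
    return sorted(p.name for p in caps_dir.glob("nvidia-cap*"))


def devfs_mode_seems_enabled(caps_dir=DEFAULT_CAPS_DIR):
    """Heuristic only: True when at least one capability node is present."""
    return bool(list_cap_devices(caps_dir))


if __name__ == "__main__":
    nodes = list_cap_devices()
    if nodes:
        print("Capability device nodes found:", ", ".join(nodes))
    else:
        print("No nvidia-cap* nodes under", DEFAULT_CAPS_DIR)
```

An empty result only means that no such nodes were found at that path. It does
not prove which mode the driver is in, so treat it as a first hint before
reading the driver documentation.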

The procedure for configuring the DEVFS mode for the A100 is written up in my
blog post (*1). I am sorry that it is in Japanese, but hopefully the setup
scripts and the links to the official NVIDIA documents will be helpful for you.
Google Translate may help too.

1: https://medium.com/nttlabs/nvidia-a100-mig-as-linux-device-66220ca16698

Best,


--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
[email protected]
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------


> -----Original Message-----
> From: slurm-users <[email protected]> On Behalf Of 
> Timothy Carr
> Sent: Wednesday, April 21, 2021 4:14 PM
> To: [email protected]
> Subject: [slurm-users] SLURM A100
>
> Dear Community,
>
> Trust everyone is well and keeping safe?
>
> We are considering the purchase of nodes with the Nvidia A100 GPUs and 
> enabling the MIG feature which allows for the
> creation of instance resource profiles. The creation of these profiles seems 
> to be straightforward as per the
> documentation. Have any of you had the opportunity to implement the A100 MIG 
> with SLURM and have you found any
> caveats you are willing to share?
>
> Kind Regards
>
> --
> Tim
>
>
>
>
> Disclaimer - University of Cape Town This email is subject to UCT policies
> and email disclaimer published on our website at
> http://www.uct.ac.za/main/email-disclaimer or obtainable from +27 21 650 9111.
> If this email is not related to the business of UCT, it is sent by the sender
> in an individual capacity. Please report security incidents or abuse via
> https://csirt.uct.ac.za/page/report-an-incident.php.


