Dear all,

First, let me say that we do not use ConstrainDevices in our setup, so we have to rely on CUDA_VISIBLE_DEVICES to ensure that user applications use the GPUs they have been allocated on our multi-GPU nodes. This seemed to work well for quite some time on our homogeneous nodes, but now that we have a heterogeneous node with three different GPU architectures, I have noticed that the way SLURM sets CUDA_VISIBLE_DEVICES does not match how CUDA actually interprets this variable.

It is my understanding that when ConstrainDevices is not set to "yes", SLURM takes the so-called "Minor Number" (nvidia-smi -q | grep Minor), i.e. the number in the device file name (/dev/nvidia0 -> ID 0 and so on), and puts it in the environment variable. This number, however, does not necessarily match the device index in either NVML or the CUDA API, so it does not necessarily correspond to the device IDs that CUDA expects in CUDA_VISIBLE_DEVICES.
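To see which physical GPU hides behind each minor number, one can pair the "Minor Number" and "GPU UUID" fields of the nvidia-smi -q output. The following is only a minimal sketch, not part of SLURM: the function names are hypothetical, and it assumes that both fields appear once per GPU as "key : value" lines in the report.

```python
import subprocess


def parse_minor_to_uuid(smi_text):
    """Map each GPU's minor number to its UUID from `nvidia-smi -q` text.

    Assumption: every GPU section contains one "GPU UUID" line and one
    "Minor Number" line, each formatted as "key : value".
    """
    mapping = {}
    uuid = minor = None
    for line in smi_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "GPU UUID":
            uuid = value
        elif key == "Minor Number":
            minor = value
        # Once both fields of one GPU have been seen, record the pair.
        if uuid is not None and minor is not None:
            if minor.isdigit():  # skip values such as "N/A"
                mapping[int(minor)] = uuid
            uuid = minor = None
    return mapping


def query_minor_to_uuid():
    """Run nvidia-smi on the local node and parse its full report."""
    result = subprocess.run(
        ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
    )
    return parse_minor_to_uuid(result.stdout)
```

Calling query_minor_to_uuid() on a GPU node should return a dictionary keyed by the N in /dev/nvidiaN, which can then be compared with what deviceQuery reports.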

By default, CUDA uses a heuristic called FASTEST_FIRST to determine the device order that the indices in CUDA_VISIBLE_DEVICES refer to, making the fastest GPU device 0 but leaving the order of the remaining devices unspecified (see [1]). This behaviour can be overridden by also setting CUDA_DEVICE_ORDER=PCI_BUS_ID, but even then, it is not guaranteed that the order of the devices under /dev matches the order of the PCI bus IDs.
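The possible mismatch between /dev numbering and PCI order can be illustrated with a small sketch. It assumes that, with CUDA_DEVICE_ORDER=PCI_BUS_ID, devices are numbered by ascending PCI bus ID, and that bus IDs have the form domain:bus:device.function in hexadecimal. The function names and the bus IDs in the usage note are made up for illustration.

```python
def pci_key(bus_id):
    """Turn 'domain:bus:device.function' (hex) into a sortable tuple."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def index_by_pci_order(minor_to_bus_id):
    """Return {minor number: index the device would get in PCI bus order}.

    Assumption: the index under CUDA_DEVICE_ORDER=PCI_BUS_ID is simply the
    rank of the device when all bus IDs are sorted in ascending order.
    """
    ordered = sorted(minor_to_bus_id.items(), key=lambda item: pci_key(item[1]))
    return {minor: index for index, (minor, _) in enumerate(ordered)}
```

For example, with the hypothetical input {0: "00000000:AF:00.0", 1: "00000000:3B:00.0"}, /dev/nvidia1 sorts first and would become device 0, so putting the minor number 1 into CUDA_VISIBLE_DEVICES would select the wrong GPU.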

Long story short, with the IDs that SLURM puts in CUDA_VISIBLE_DEVICES, CUDA applications do not select the right devices, which can easily be verified with e.g. deviceQuery from the CUDA samples.

I currently do not see a way to fix this properly without interfacing with the CUDA runtime, or at least using NVML/nvidia-smi to get the GPU UUIDs, which can also be used in CUDA_VISIBLE_DEVICES and would make this entire mess a lot more intuitive. It seems, though, that we would have to patch the gres plugin in order to achieve this.
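What the UUID-based approach could look like is sketched below. This is not how the gres plugin works today; the function names are hypothetical, and the mapping from minor numbers to UUIDs is assumed to have been obtained beforehand, for instance from NVML or nvidia-smi.

```python
import os


def visible_devices_from_minors(allocated_minors, minor_to_uuid):
    """Build a CUDA_VISIBLE_DEVICES value made of UUIDs instead of indices.

    allocated_minors: minor numbers of the GPUs allocated to the job.
    minor_to_uuid:    {minor number: "GPU-..." UUID} for the node (assumed
                      to be collected beforehand).
    """
    return ",".join(minor_to_uuid[minor] for minor in allocated_minors)


def job_environment(allocated_minors, minor_to_uuid, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = visible_devices_from_minors(
        allocated_minors, minor_to_uuid
    )
    return env
```

Because UUIDs identify a device independently of any enumeration order, the resulting value would not depend on FASTEST_FIRST or PCI_BUS_ID ordering.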

Any thoughts?

[1] http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars

--
Maik Schmidt
HPC Services

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Willers-Bau A116
D-01062 Dresden
Phone: +49 351 463-32836

