What do you mean by "slurm will submit jobs to /dev/nvidia1"?

If you have task/cgroup configured, then only those specific device files should appear in the job's cgroup and other device files should not, although I have not personally verified cgroup support for gres files.
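
For reference, by task/cgroup I mean something like the following (an illustrative excerpt only; I believe ConstrainDevices is what controls the device cgroup, but as noted I have not verified its behavior with gres files myself):

# slurm.conf
TaskPlugin=task/cgroup

# cgroup.conf
ConstrainDevices=yes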

If you mean that SLURM sets the CUDA_VISIBLE_DEVICES environment variable to 1 to represent the second GPU in SLURM's table (counting from zero) and the CUDA software treats that as representing /dev/nvidia1, I could see that happening. If that is the problem, the environment variable should probably be set based upon the device file name rather than its index number within the SLURM gres.conf file. Not a trivial change. The relevant logic is the call to gres_plugin_job_set_env() from slurmd/slurmstepd/slurmstepd.c.
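
Something along these lines is what I have in mind -- just a rough standalone sketch of the idea, not a patch; the gres.conf contents and the allocation bitmap are hard-coded here purely to illustrate it:

/* Sketch: build CUDA_VISIBLE_DEVICES from the device file names in
 * gres.conf rather than from each entry's index. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* File= entries from gres.conf, in order */
    const char *gres_files[] = { "/dev/nvidia0", "/dev/nvidia2" };
    /* allocation flags: entry i allocated if alloc[i] is non-zero;
     * here the second entry (index 1) was allocated to the job */
    const int alloc[] = { 0, 1 };
    const int count = 2;
    char value[256] = "";

    for (int i = 0; i < count; i++) {
        if (!alloc[i])
            continue;
        /* use the trailing digits of the file name ("2" for
         * /dev/nvidia2) instead of the entry index (1) */
        const char *p = gres_files[i] + strlen(gres_files[i]);
        while (p > gres_files[i] && isdigit((unsigned char)p[-1]))
            p--;
        if (value[0] != '\0')
            strcat(value, ",");
        strcat(value, p);
    }
    printf("CUDA_VISIBLE_DEVICES=%s\n", value);  /* prints 2, not 1 */
    return 0;
}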

Quoting Nicolas Bigaouette <nbigaoue...@gmail.com>:

Hi all,

I'm using slurm to submit jobs on workstations which have 3 GPUs. Two are
used for GPGPU and one is used for the monitor. Clearly, I don't want jobs
submitted to the monitor card.

Thus, my /etc/slurm/gres.conf contains the following:

Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia2

Note that /dev/nvidia1 is not present there: this dev entry is the
underpowered card that drives the user's monitor.

Unfortunately, it seems slurm will submit jobs to /dev/nvidia1 when
/dev/nvidia0 is busy (or when asking for a job with 2 GPUs).
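
For example (hypothetical command and output, just to show what I mean):

$ srun --gres=gpu:1 env | grep CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=1

and the CUDA runtime then treats that 1 as /dev/nvidia1, i.e. the monitor card.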

I've checked the source to see where the decision is made. In
src/plugins/gres/gpu/gres_gpu.c, the function job_set_env() sets
"CUDA_VISIBLE_DEVICES", which is what CUDA uses to identify the device
slurm chose for the job to run on.

I have trouble understanding where the choice of GPU device is made.
I think this information is encoded in gres_job_ptr->gres_bit_alloc[0] and
that job_set_env() is fine, but I can't find the logic where this bitmap is set.
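
For what it's worth, this is my simplified, standalone reading of what job_set_env() does with that bitmap (not the actual slurm code, just the mapping as I understand it):

/* each set bit in gres_bit_alloc[0] becomes an index in the
 * comma-separated CUDA_VISIBLE_DEVICES value */
#include <stdio.h>

int main(void)
{
    unsigned int gres_bit_alloc = 0x2;  /* bit 1 set: second gres.conf entry */
    char buf[64] = "";
    int len = 0;

    for (int i = 0; i < 32; i++) {
        if (!(gres_bit_alloc & (1u << i)))
            continue;
        len += snprintf(buf + len, sizeof(buf) - len, "%s%d",
                        len ? "," : "", i);
    }
    printf("CUDA_VISIBLE_DEVICES=%s\n", buf);  /* prints "1" */
    return 0;
}

If that reading is right, index 1 here ends up pointing CUDA at /dev/nvidia1 even though gres.conf never mentions that device.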

Could anyone provide a clue?

Thanks a lot.

Nicolas



