You apparently do not have Slurm configured properly. See
http://www.schedmd.com/slurmdocs/gres.html
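
As a rough sketch of what to expect once GRES is set up as that page describes: Slurm exports CUDA_VISIBLE_DEVICES for each job or step, and the CUDA runtime numbers the visible devices from 0, so the index passed to cudaSetDevice() is a position in that list rather than the ID listed in it. The helper below only illustrates that renumbering. `map_visible_devices` is a hypothetical name, not part of Slurm or CUDA, and the sketch assumes the variable holds a comma-separated list of integer IDs.

```shell
# Hypothetical helper: print, for each ID in a comma-separated
# CUDA_VISIBLE_DEVICES-style list, the index a CUDA program would
# use for it (CUDA numbers the visible devices from 0).
# Uses $1 when given, otherwise the CUDA_VISIBLE_DEVICES variable.
map_visible_devices() {
    local visible="${1-${CUDA_VISIBLE_DEVICES-}}"
    local index=0 id
    local IFS=','            # split the list on commas inside this function only
    for id in $visible; do
        echo "cuda_index=$index listed_id=$id"
        index=$((index + 1))
    done
}
```

For example, `map_visible_devices 4,5,6,7` prints `cuda_index=0 listed_id=4` through `cuda_index=3 listed_id=7`. Under the same assumption, a step started with `srun --gres=gpu:1` sees a single GPU, so the program would select device 0 within each step, whichever GPU Slurm assigned.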

Quoting Sa Li <[email protected]>:

> Hi, Slurm-fans
>
> In my cluster, each node is equipped with 8 GPUs. My program needs to know
> the exact GPU device ID assigned to it, because each job needs exactly one
> GPU, for example:
>            ./myCode deviceID .....
>
> However, it seems that CUDA always assigns GPU device 0 to my program when
> the Slurm job requests only 1 GPU:
>
>              srun --gres=gpu:1 ./myCode device0 &
>
>
> I wish Slurm could automatically detect which GPU has been taken and then
> move to the next device in CUDA_VISIBLE_DEVICES, but this does not seem to
> work at this point.
>
> To get around this problem, I am using a dirty workaround: requesting all 8
> devices and extracting each device ID from the CUDA_VISIBLE_DEVICES
> variable for my app:
>
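>               # CUDA_VISIBLE_DEVICES is comma-separated, so split on commas
>               for id in ${CUDA_VISIBLE_DEVICES//,/ }
>               do
>                      srun --gres=gpu:8 ./myCode "$id" &
>               done
>               wait   # keep the script alive until all background steps finish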
>
> This is definitely not a good solution, because the IDs in
> CUDA_VISIBLE_DEVICES do not actually match the device numbers we want to
> pass to cudaSetDevice(). Here is the experiment we did.
> We wanted to map CUDA_VISIBLE_DEVICES to cudaSetDevice(), so we set
> CUDA_VISIBLE_DEVICES to 4,5,6,7 while running "srun --gres=gpu:4". It
> appears that these CUDA_VISIBLE_DEVICES values map to 0,1,2,3 regardless.
> We verified this by monitoring the temperature of the GPUs.
>
> Ideally, I wish Slurm could detect whether a GPU is in use, by observing
> its temperature, memory usage, or some other indicator, before assigning a
> job to that GPU. If the GPU is unavailable, Slurm would move on to the next
> GPU, as it does for node assignment. I wonder whether anyone has come
> across this problem before and figured out a solution.
>
> Thanks
>
> --
> Sa Li
> Senior Research Developer