You apparently do not have Slurm's generic resource (GRES) support configured properly. See http://www.schedmd.com/slurmdocs/gres.html
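
With GRES configured, Slurm sets CUDA_VISIBLE_DEVICES for each job step to the GPU(s) it allocated, and CUDA then renumbers those devices starting from 0 inside the process. So a program that asked for one GPU would normally just use device 0. As a rough illustration of that mapping (a sketch only; the helper names below are hypothetical and not part of Slurm or CUDA):

```python
import os


def visible_gpu_ids(env=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES, in order.

    `env` is a hypothetical override used for illustration; by default
    the real process environment is read.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # The variable is a comma-separated list; skip empty entries.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def local_to_listed(local_index, env=None):
    """Map a CUDA-local index (the argument one would pass to
    cudaSetDevice) to the corresponding entry in CUDA_VISIBLE_DEVICES."""
    return visible_gpu_ids(env)[local_index]


if __name__ == "__main__":
    for local, listed in enumerate(visible_gpu_ids()):
        print(f"cudaSetDevice({local}) -> CUDA_VISIBLE_DEVICES entry {listed}")
```

Run inside a job step, this should list only the GPUs Slurm handed to that step. Note that the numbering other tools report for the same hardware can differ from CUDA's default enumeration order.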

Quoting Sa Li <[email protected]>:

> Hi, Slurm-fans
>
> In my cluster, each node is equipped with 8 GPUs. My program needs to know
> exactly the GPU device ID assigned to it because each job needs and only
> needs one GPU, like
>     ./myCode deviceID .....
>
> However, it seems that cuda always assigns gpu device0 to my program if
> slurm indicates job requesting only 1 gpu,
>
>     srun --gres=gpu:1 ./myCode device0 &
>
> I wish slurm can automatically detect which gpu has been taken, then move
> to next device in CUDA_VISIBLE_DEVICES, this seems not functional at this
> point.
>
> To get around this problem, I am using a dirty way by assigning 8 devices,
> and extract each deviceID from CUDA_VISIBLE_DEVICES variable for my app,
> like
>
>     for id in $CUDA_VISIBLE_DEVICES
>     do
>         srun --gres=gpu:8 ./myCode $id &
>     done
>
> This is definitely not a good solution, because CUDA_VISIBLE_DEVICES not
> actually match the cudaSetDevice() function we want to use. Here is the
> experiment we did.
> We want to map the CUDA_VISIBLE_DEVICES to cudaSetDevice(), we are
> assigning CUDA_VISIBLE_DEVICES 4,5,6,7 while running "srun --gres=gpu:4",
> it appears the above CUDA_VISIBLE_DEVICES map to 0,1,2,3 regardless. This
> is verified by monitoring the temperature of GPUs.
>
> Ideally, I wish slurm can detect whether a GPU is in use by observing the
> temperature or memory usage or others, before assign the job to this GPU.
> If this GPU is unavailable, it will incrementally pick the next GPU as
> slurm does for node assignment.
> I wonder anyone come across this problem before and figure out a solution.
>
> Thanks
>
> --
> Sa Li
