[slurm-dev] Re: Single GPU server with error "Requested node configuration is not available"

Miguel Ángel Martínez del Amor Mon, 12 Aug 2013 12:30:40 -0700

Hi again,

I don't know how, but I have solved it. I just renamed the file gres.conf and renamed it back, changed some parameters in slurm.conf, went back to my desired configuration and rebooted... and now it works!

For anyone interested in how I managed to set different device numbers in CUDA_VISIBLE_DEVICES: I have configured a task_prolog file, as follows:


---- tprolog.sh:
EXP_D=`echo $CUDA_VISIBLE_DEVICES | tr "," " "`

for i in $EXP_D
do
        d=`expr $i + 2`
        if [ -z "$__first_time__" ]
        then
                NEW_DEVICES=$d
        else
                NEW_DEVICES=$NEW_DEVICES,$d
        fi

        __first_time__=0
done

echo "export CUDA_VISIBLE_DEVICES=$NEW_DEVICES"

-----
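
The same remapping can also be written as a small function, which makes it easier to check by hand before wiring it into SLURM. This is only a sketch: the function name `remap_devices` and the fixed offset of 2 are assumptions that mirror the setup in this thread (SLURM manages devices 2 and 3), not anything SLURM provides.

```shell
#!/bin/sh
# Sketch: shift every index in a comma-separated device list by a fixed offset.
# remap_devices is a hypothetical helper name, not part of SLURM.
remap_devices() {
    offset=$1
    list=$2
    result=""
    old_ifs=$IFS
    IFS=','                      # split the list on commas
    for i in $list; do
        d=$((i + offset))
        # append with a comma only when result is already non-empty
        result="${result:+$result,}$d"
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# A task prolog communicates through its standard output: lines of the form
# "export NAME=value" set variables in the task's environment.
if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "export CUDA_VISIBLE_DEVICES=$(remap_devices 2 "$CUDA_VISIBLE_DEVICES")"
fi
```

To apply it to every user's tasks, slurm.conf would point its TaskProlog parameter at the script, for example `TaskProlog=/etc/slurm/tprolog.sh` (the path is an assumption).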

Now I think it should be enough for my single server.

Thanks and best,
Miguel


On 03/08/13 12:46, Miguel Ángel Martínez del Amor wrote:

Hi all,
I'm pretty new to SLURM. I'm moving from Grid Engine, looking for better GPU management.
We have one server (Ubuntu Server 12.04 64-bit, SLURM 2.3.2) with 4 GPUs, but they are allocated in a particular way: device 0 is for testing, device 1 is a Fermi GPU (for testing as well), and devices 2 and 3 (same GPU model as device 0) are going to be managed by SLURM.
I have configured slurm.conf as seen in the attachment, and gres.conf as follows:

Name=gpu File=/dev/nvidia2 CPUs=[0-3]
Name=gpu File=/dev/nvidia3 CPUs=[4-7]

My problem arises when I launch sbatch or srun. I get the following error (only when using --gres=gpu; if I remove --gres, it works fine):

$ sbatch --gres=gpu:1 show_device.sh
sbatch: error: Batch job submission failed: Requested node configuration is not available

$ sbatch -n 2 --gres=gpu:2 show_device.sh
sbatch: error: Batch job submission failed: Requested node configuration is not available

$ srun -n 2 --gres=gpu:2 show_device.sh
srun: error: Unable to allocate resources: Requested node configuration is not available

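
For context, SLURM typically reports this message when a job requests a generic resource that no node definition in slurm.conf advertises. A minimal sketch of the slurm.conf lines involved, assuming a node named `gpuserver` (a hypothetical name; the attached slurm.conf is not reproduced in this message, so it may already contain equivalent lines):

```
# slurm.conf (fragment; the node name is an assumption)
GresTypes=gpu
NodeName=gpuserver Gres=gpu:2
```

The count in `Gres=gpu:2` is meant to match the two `Name=gpu` entries in gres.conf.
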
I guess something is wrong with my configuration. I think my problem is closely related to https://groups.google.com/forum/#!topic/slurm-devel/duLt-jPBGp4, but there is still no solution there.
Moreover, do you think SLURM is going to assign only devices 2 and 3 to CUDA_VISIBLE_DEVICES, or will it number them from 0 (i.e. devices 0 and 1)? If the latter, what do you suggest? Do I have to configure a pre-script that adds 2 to each value in CUDA_VISIBLE_DEVICES? How can I do that automatically, by default, for any user?
Thank you very much in advance.

Best,
Miguel

P.S.: show_device.sh is just a script for testing and understanding SLURM:

#!/bin/bash

echo Hostname=`hostname`
echo PWD=`pwd`
echo USER=`whoami`
echo PATH=$PATH
echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH
echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES

--
Miguel Ángel Martínez del Amor, Ph.D.
Research Group on Natural Computing (RGNC).
Department of Computer Science and Artificial Intelligence.
E.T.S. Ingeniería Informática, 41012 Avda. Reina Mercedes.
University of Seville, Sevilla (Spain).
Webpage: http://www.gcn.us.es/mdelamor



