Hi Carles,

We have just finalized the code, and you are welcome to test it and send us 
your comments. 
You can either pull the latest development version from 
https://github.com/SchedMD/slurm/ (some final patches were added on Monday, 
so make sure you pull the latest version) 
or wait for the 2.3.0 release, which should be out shortly, 
as Moe announced a few days ago.

By the way, the devices part of cgroups is still considered experimental 
on the kernel side, but I hope that it will become stable by the end of 
the year.

I don't know whether you will be attending the User Group meeting in 
September, but there will be a tutorial session dedicated to cgroups 
support in SLURM, and there will be discussions about future developments 
on the subject.

Concerning your observation, I think that this is the expected behaviour, 
but perhaps Moe could give us a better answer on this one.

Regards,
Yiannis Georgiou




From:    Carles Fenoy <[email protected]>
To:      [email protected]
Date:    07/22/2011 03:35 PM
Subject: [slurm-dev] Status of cgroups implementation
Sent by: [email protected]



Hi all,

We are considering using cgroups in a new GPU cluster, and I would like to 
know the current status of the devices part of the cgroups plugin.

We have also observed that, in a job that requests gres, the tasks that 
don't request generic resources explicitly are not assigned any resources. 
Example:

A job requests a GPU (--gres=gpu:1) and runs 2 tasks with

sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 --wrap="env; srun env | grep CUDA"

The first env shows:
CUDA_VISIBLE_DEVICES=0

whereas "srun env" shows:
CUDA_VISIBLE_DEVICES=NoDevFiles
CUDA_VISIBLE_DEVICES=NoDevFiles

Is this the expected behavior?

Maybe if a job requests gres and its steps don't, slurmstepd should not 
overwrite the job environment in:
        
gres_gpu.c(211):

        } else {
                /* The gres.conf file must identify specific device files
                 * in order to set the CUDA_VISIBLE_DEVICES env var */
                env_array_overwrite(job_env_ptr,"CUDA_VISIBLE_DEVICES",
                                    "NoDevFiles");
        }
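
A minimal sketch of that idea, to make the suggestion concrete. It is not 
the actual SLURM code: struct step_env, env_get(), env_overwrite() and 
set_step_gpu_env() are hypothetical stand-ins written for this example 
(env_overwrite() only plays the role of env_array_overwrite()), and the 
step_requested_gres flag is an assumption about what the plugin could check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENV_MAX 16
#define ENV_LEN 128

/* Hypothetical stand-in for a step environment; not SLURM's env array. */
struct step_env {
        char entries[ENV_MAX][ENV_LEN];  /* "NAME=value" strings */
        int count;
};

/* Return the value of `name`, or NULL if it is not set. */
const char *env_get(const struct step_env *env, const char *name)
{
        size_t n = strlen(name);
        for (int i = 0; i < env->count; i++) {
                if (strncmp(env->entries[i], name, n) == 0 &&
                    env->entries[i][n] == '=')
                        return env->entries[i] + n + 1;
        }
        return NULL;
}

/* Set or replace `name`; plays the role of env_array_overwrite() here. */
void env_overwrite(struct step_env *env, const char *name, const char *value)
{
        size_t n = strlen(name);
        int slot = env->count;  /* append unless the name already exists */
        for (int i = 0; i < env->count; i++) {
                if (strncmp(env->entries[i], name, n) == 0 &&
                    env->entries[i][n] == '=') {
                        slot = i;
                        break;
                }
        }
        assert(slot < ENV_MAX);
        snprintf(env->entries[slot], ENV_LEN, "%s=%s", name, value);
        if (slot == env->count)
                env->count++;
}

/* Assumed policy: only touch CUDA_VISIBLE_DEVICES when the step itself
 * requested gres; otherwise keep the value inherited from the job. */
void set_step_gpu_env(struct step_env *env, int step_requested_gres,
                      const char *dev_list)
{
        if (dev_list != NULL) {
                /* Device files are known: expose them to the step. */
                env_overwrite(env, "CUDA_VISIBLE_DEVICES", dev_list);
        } else if (step_requested_gres) {
                /* Step asked for gres but no device files are identified. */
                env_overwrite(env, "CUDA_VISIBLE_DEVICES", "NoDevFiles");
        }
        /* Step requested no gres: leave the job environment untouched. */
}
```

With this policy, a step started by a plain "srun env" inside the job above 
would still see the job's CUDA_VISIBLE_DEVICES=0 instead of NoDevFiles. 
Whether steps should inherit the job's devices this way is exactly the 
question being asked.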


-- 
--
Carles Fenoy
