This is great news, both the cgroups code being finalized and a tutorial being offered at the User's Group.

I will be pulling the latest version and running some tests on our side, and am now going to sign up for the User's Group Meeting.

--Jerry

[email protected] wrote:
Hi Carles,

We have just finalized the code and you are welcome to test it and send us your comments! You can either pull the latest development version from https://github.com/SchedMD/slurm/ (some final patches were added on Monday, so make sure you pull the latest version) or wait for the next release of version 2.3.0, which should be out shortly, as Moe announced a few days ago.

By the way, the devices part of cgroups is still considered experimental on the kernel side, but I hope it will become stable by the end of the year.

I don't know if you will be attending the User Group in September, but there will be a tutorial session dedicated to the cgroups support in SLURM, as well as discussions about future developments on the subject.

Concerning your observation, I think this is the expected behaviour, but perhaps Moe can answer this one better.

Regards,
Yiannis Georgiou




From:        Carles Fenoy <[email protected]>
To:        [email protected]
Date:        07/22/2011 03:35 PM
Subject:        [slurm-dev] Status of cgroups implementation
Sent by:        [email protected]
------------------------------------------------------------------------



Hi all,

We are considering using cgroups on a new GPU cluster, and I would like to know the current status of the devices part of the cgroups plugin.

We have also observed that tasks of a job requesting gres are not assigned any resources if they do not request generic resources explicitly themselves. Example:

A job requests GPUs with:

sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 --wrap="env; srun env | grep CUDA"

The first env shows:
CUDA_VISIBLE_DEVICES=0

while "srun env" shows:
CUDA_VISIBLE_DEVICES=NoDevFiles
CUDA_VISIBLE_DEVICES=NoDevFiles

Is this the expected behavior?

Maybe if a job requests gres and its steps don't, slurmstepd should not overwrite the job environment in gres_gpu.c(211):

        } else {
                /* The gres.conf file must identify specific device files
                 * in order to set the CUDA_VISIBLE_DEVICES env var */
                env_array_overwrite(job_env_ptr,"CUDA_VISIBLE_DEVICES",
                                    "NoDevFiles");
        }


--
Carles Fenoy
