On Mon, 1 Aug 2011 14:14:55 -0700, "[email protected]" <[email protected]> wrote:

> The current logic requires job steps to explicitly request the generic
> resources (GRES, e.g. GPUs) to be allocated. This decision was based
> upon users commonly running many job steps within a job allocation and
> using different resources for each job step. If a job step inherits
> all of the job's GRES by default, that would require job steps to
> explicitly request no GRES if desired (e.g. "srun --gres=gpu:0 ...").
> This may not be the best design for all users, but it is what exists
> today.
The only problem with this approach is that it makes the common case
more difficult (most of the time users run a single job step per
allocation) in order to satisfy the uncommon case. Could this behavior
be made configurable?

mark

> Moe
>
>
> Quoting Carles Fenoy <[email protected]>:
>
> > Hi all,
> >
> > We are considering using cgroups in a new GPU cluster, and I would
> > like to know the current status of the devices part of the cgroups
> > plugin.
> >
> > We have also observed that the tasks of a job requesting GRES, when
> > they don't request generic resources explicitly, are not assigned
> > any resources. Example:
> >
> > A job requests GPUs with:
> >
> > sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 --wrap="env; srun env |
> > grep CUDA"
> >
> > The first env shows:
> > CUDA_VISIBLE_DEVICES=0
> >
> > although "srun env" shows:
> > CUDA_VISIBLE_DEVICES=NoDevFiles
> > CUDA_VISIBLE_DEVICES=NoDevFiles
> >
> > Is this the expected behavior?
> >
> > Maybe if a job requests GRES and its steps don't, slurmstepd should
> > not overwrite the job environment in:
> >
> > gres_gpu.c(211):
> >
> > } else {
> >         /* The gres.conf file must identify specific device files
> >          * in order to set the CUDA_VISIBLE_DEVICES env var */
> >         env_array_overwrite(job_env_ptr, "CUDA_VISIBLE_DEVICES",
> >                             "NoDevFiles");
> > }
> >
> >
> > --
> > Carles Fenoy
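[Editor's note: a minimal sketch of the workaround implied by the thread. Since job steps do not inherit the job's GRES by default, each step must request GRES explicitly; the resource counts below are illustrative, not from the original messages.]

```shell
# Request one GPU for the allocation, and repeat the --gres request on
# the job step so slurmstepd assigns the device to the step's tasks.
sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 \
       --wrap='env | grep CUDA; srun --gres=gpu:1 env | grep CUDA'
# With the explicit per-step request, the step's tasks should see
# CUDA_VISIBLE_DEVICES set to a device index instead of "NoDevFiles".

# Conversely, a step that needs no GPUs opts out explicitly, as the
# first message notes:
#   srun --gres=gpu:0 ...
```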
