On Mon, 1 Aug 2011 14:14:55 -0700, "[email protected]" <[email protected]> wrote:

> The current logic requires job steps to explicitly request the generic
> resources (GRES, e.g. GPUs) to be allocated. This decision was based
> upon users commonly running many job steps within a job allocation and
> using different resources for each job step. If a job step inherits
> all of the job's GRES by default, that would require job steps to
> explicitly request no GRES if desired (e.g. "srun --gres=gpu:0 ...").
> This may not be the best design for all users, but it is what exists
> today.
The only problem with this approach is that it makes the common case
more difficult (most of the time users run a single job step per
allocation) in order to satisfy the uncommon case. Could this behavior
be made configurable?

mark

> Moe
>
>
> Quoting Carles Fenoy <[email protected]>:
>
> > Hi all,
> >
> > We are considering using cgroups in a new GPU cluster, and I would
> > like to know the current status of the devices part of the cgroups
> > plugin.
> >
> > We have also observed that the tasks of a job requesting GRES, when
> > they don't request generic resources explicitly, are not assigned
> > any resources. Example:
> >
> > A job requests GPUs with:
> >
> > sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 --wrap="env; srun env |
> > grep CUDA"
> >
> > The first env shows:
> > CUDA_VISIBLE_DEVICES=0
> >
> > although "srun env" shows:
> > CUDA_VISIBLE_DEVICES=NoDevFiles
> > CUDA_VISIBLE_DEVICES=NoDevFiles
> >
> > Is this the expected behavior?
> >
> > Maybe if a job requests GRES and its steps don't, slurmstepd should
> > not overwrite the job environment in:
> >
> > gres_gpu.c(211):
> >
> > } else {
> >         /* The gres.conf file must identify specific device files
> >          * in order to set the CUDA_VISIBLE_DEVICES env var */
> >         env_array_overwrite(job_env_ptr, "CUDA_VISIBLE_DEVICES",
> >                             "NoDevFiles");
> > }
> >
> >
> > --
> > Carles Fenoy
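[Editor's note: a minimal sketch of the workaround implied by the thread. Since job steps do not inherit the job's GRES by default, each step must request GRES explicitly; the resource counts below are illustrative, not from the original messages.]

```shell
# Request one GPU for the allocation, and repeat the --gres request on
# the job step so slurmstepd assigns the device to the step's tasks.
sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 \
       --wrap='env | grep CUDA; srun --gres=gpu:1 env | grep CUDA'
# With the explicit per-step request, the step's tasks should see
# CUDA_VISIBLE_DEVICES set to a device index instead of "NoDevFiles".

# Conversely, a step that needs no GPUs opts out explicitly, as the
# first message notes:
#   srun --gres=gpu:0 ...
```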
