Re: [slurm-dev] gres.conf's "File=" flag ignore

Moe Jette Tue, 24 Jan 2012 13:15:49 -0800

I had briefly looked at the gres code awhile ago and I could not
make sense of how it worked. Perhaps it would be really helpful
if Moe or Danny sent a brief explanation to the list.

Almost all of the relevant code is in src/common/gres.c and the key
data structures are in src/common/gres.h. There are gres_list fields
associated with nodes, jobs and steps. The list contains a key (the
gres type) and a pointer to a structure of the appropriate type. If
there are no file names or topology defined in the gres.conf file
(read by slurmd on the compute node), then the code just needs to keep
track of the count of available and allocated gres of each type.
Otherwise it needs to manage bitmaps of available gres (available and
allocated). The gres plugins are optional. In the case of gres/gpu, it
just sets the CUDA_VISIBLE_DEVICES environment variable based upon the
position(s) in the bitmap allocated to the job or step.
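
The bookkeeping described above can be modeled in a few lines. This is only a rough Python sketch of the idea, not the C code in src/common/gres.c: the names GresNodeState, make_state, allocate and cuda_visible_devices are hypothetical, and the first-free allocation order is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GresNodeState:
    """Hypothetical per-node record for one gres type (not Slurm's struct)."""
    total: int                                  # gres of this type on the node
    allocated: int = 0                          # how many are currently in use
    alloc_bitmap: Optional[List[bool]] = None   # only kept when files/topology exist


def make_state(total: int, has_files: bool) -> GresNodeState:
    # Without File= or topology entries a plain count is enough;
    # otherwise each gres gets its own slot in a bitmap.
    bitmap = [False] * total if has_files else None
    return GresNodeState(total=total, alloc_bitmap=bitmap)


def allocate(state: GresNodeState, wanted: int) -> List[int]:
    """Allocate `wanted` gres; return bitmap positions ([] in count-only mode)."""
    if state.total - state.allocated < wanted:
        raise ValueError("not enough free gres")
    positions: List[int] = []
    if state.alloc_bitmap is not None:
        # Assumption: take the first free slots in index order.
        for index, in_use in enumerate(state.alloc_bitmap):
            if len(positions) == wanted:
                break
            if not in_use:
                state.alloc_bitmap[index] = True
                positions.append(index)
    state.allocated += wanted
    return positions


def cuda_visible_devices(positions: List[int]) -> str:
    # Mirrors the behavior described above: the value is built from the
    # bitmap positions, not from the device file names.
    return ",".join(str(p) for p in positions)
```

In this model a node with four GPUs and files defined hands out positions 0 and 1 to the first two-GPU job, so that job would see CUDA_VISIBLE_DEVICES=0,1 regardless of which device files those positions refer to.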

Probably the simplest way to get the correct environment variable
would be to modify node_config_load() to cache device numbers and then
use those device numbers rather than the bitmap index to set
CUDA_VISIBLE_DEVICES values in job_set_env() and step_set_env().
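
A rough Python sketch of that suggestion, for illustration only. The helper names are hypothetical, and taking the device number from the trailing digits of the File= path (falling back to the bitmap index when there are none) is an assumption, not something the gres code is known to do.

```python
import re
from typing import List, Optional


def device_number(path: str) -> Optional[int]:
    """Return the trailing number of a device path, e.g. /dev/nvidia2 -> 2."""
    match = re.search(r"(\d+)$", path)
    return int(match.group(1)) if match else None


def cache_device_numbers(files: List[str]) -> List[int]:
    """Build the per-position cache once, when the node config is loaded."""
    cached = []
    for index, path in enumerate(files):
        number = device_number(path)
        # Assumption: fall back to the bitmap index if no number is present.
        cached.append(index if number is None else number)
    return cached


def env_from_index(positions: List[int]) -> str:
    # Behavior described in the thread: bitmap positions are used directly.
    return ",".join(str(p) for p in positions)


def env_from_cache(positions: List[int], cached: List[int]) -> str:
    # Proposed behavior: translate each position through the cached numbers.
    return ",".join(str(cached[p]) for p in positions)
```

With File entries /dev/nvidia2 and /dev/nvidia3, a job holding bitmap positions 0 and 1 gets "0,1" from env_from_index but "2,3" from env_from_cache.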
