What you have done is probably the best solution short of changing the
communications protocol between the slurmd and slurmstepd processes
and modifying the gres plugin functions, which I do not want to do in
SLURM version 2.3. Here are the patches I have made for version 2.4:
https://github.com/SchedMD/slurm/commit/d703d2ec4d0db88f52ad249199dfb61020eeb277.patch
https://github.com/SchedMD/slurm/commit/ed06372a815d73f60d345927fd40fe30eccc3a37.patch
https://github.com/SchedMD/slurm/commit/bccf0f8542ad98f7df8de3750df149563bd37eb6.patch
The first two are almost identical to your patches 0002 and 0003,
although slight changes have been made for performance. The third
patch eliminates the need for each slurmstepd process to read the
gres.conf file, but it includes more changes than I would like to make
in SLURM version 2.3; you are welcome to apply it as a local patch to
version 2.3 if you like.
Thank you for your work.
Quoting Nicolas Bigaouette <nbigaoue...@gmail.com>:
Hi Moe,
Thanks for the hints.
Here's a patch set that implements option 3). Branched from
9a48840da4feb7a5810b3024886423b38cdb3bb7. Also available on my github:
https://github.com/nbigaouette/slurm
You will also find attached a patch (0001-Ignore-temp-files.patch) which
tells git to ignore built files.
Options 1) and 2) require a deeper understanding of the code, which I
don't have for now. Reading /etc/slurm/gres.conf each time a job is run
is definitely not perfect, but it shouldn't make things noticeably
slower, except maybe for a large number of small jobs (each lasting a
couple of seconds, say).
Nicolas
On Mon, Jan 30, 2012 at 1:31 PM, Moe Jette <je...@schedmd.com> wrote:
The gres.conf file is read by the slurmd daemon, while task launch and
claiming GRES are done by the slurmstepd job step shepherd. Here are some
options:
1. Move the gres_plugin_step_set_env() and gres_plugin_job_set_env() calls
from slurmstepd to slurmd (this is probably the most efficient solution,
but could break gres plugins that other people have developed)
2. Add a new gres plugin call for the slurmd to set the CUDA environment
variables based upon file names that it already has, and leave the other
function calls in slurmstepd (more work; will not break any gres plugins
developed by other people, but still very efficient), OR
3. Modify slurmstepd to read gres.conf to get the nvidia device numbers
(the simplest solution, but adds extra overhead to each job launch)
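
For illustration, a rough sketch of what option 3 could look like. The
parsing is simplified and load_gpu_minors() is a hypothetical helper,
not the actual patch:

/* Sketch of option 3: slurmstepd re-reads gres.conf at step launch and
 * collects the nvidia device minor numbers in file order, so that
 * bitmap index i maps to minors[i].  The real gres.conf grammar
 * (Name=, Count=, bracketed ranges) is richer than this. */
#include <stdio.h>
#include <string.h>

int load_gpu_minors(const char *path, int *minors, int max)
{
    FILE *fp = fopen(path, "r");
    char line[256];
    int n = 0;

    if (!fp)
        return -1;
    while (n < max && fgets(line, sizeof(line), fp)) {
        char *file = strstr(line, "File=/dev/nvidia");
        int minor;
        if (file && sscanf(file, "File=/dev/nvidia%d", &minor) == 1)
            minors[n++] = minor;
    }
    fclose(fp);
    return n;   /* number of GPUs found */
}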
Quoting Nicolas Bigaouette <nbigaoue...@gmail.com>:
Hi Moe, thanks for the indications.
On Tue, Jan 24, 2012 at 4:15 PM, Moe Jette <je...@schedmd.com> wrote:
In the case of gres/gpu, it just sets the CUDA_VISIBLE_DEVICES
environment
variable based upon the position(s) in the bitmap allocated to the job or
step.
Probably the simplest way to get the correct environment variable would
be to modify node_config_load() to cache device numbers and then use
those device numbers, rather than the bitmap index, to set
CUDA_VISIBLE_DEVICES values in job_set_env() and step_set_env().
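
A minimal sketch of this caching idea, with simplified and hypothetical
signatures (cache_gpu_minor() and build_cuda_env() are illustrative
names; the real gres plugin API passes different arguments):

/* Sketch only: cache GPU device minors while loading gres.conf, then
 * translate allocated bitmap indices into real device numbers when
 * building CUDA_VISIBLE_DEVICES.  This works only if both functions
 * run in the same process -- which, as noted below, they do not. */
#include <stdio.h>

#define MAX_GPUS 64

static int gpu_minor[MAX_GPUS]; /* bitmap index -> /dev/nvidiaN minor */
static int gpu_cnt;

/* Would be called from node_config_load() for each File= entry */
void cache_gpu_minor(int minor)
{
    if (gpu_cnt < MAX_GPUS)
        gpu_minor[gpu_cnt++] = minor;
}

/* Would be called from job_set_env()/step_set_env() with the indices
 * of the bits set in the job's or step's allocated bitmap */
void build_cuda_env(char *buf, size_t len, const int *idx, int n)
{
    int i, off = snprintf(buf, len, "CUDA_VISIBLE_DEVICES=");
    for (i = 0; i < n && off < (int) len; i++)
        off += snprintf(buf + off, len - off, "%s%d",
                        i ? "," : "", gpu_minor[idx[i]]);
}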
This is exactly what I'm trying to do. I'm familiarizing myself with the
code and experimenting with it. Unfortunately, I don't see how
information can be "transferred" from node_config_load() to
job_set_env(). node_config_load() only takes the file entries as input
arguments and has no output variables. Also, a variable global to the
gres_gpu.c file does not work because, it seems, job_set_env() is
executed in a different process than node_config_load() and as such does
not share its memory. It might not be exactly this situation, but the
memory is definitely not shared, and thus job_set_env() cannot access
variables set by node_config_load().
So either there is a simple way to share that information which I have
not found, or the information will have to be passed through function
arguments. But then that would change the API...
I hope I'm just missing something obvious somewhere! How is
node_config_load() supposed to configure anything if job_set_env() can't
access that information?
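
To see why the file-scope global cannot carry the data, a standalone
demonstration (not SLURM code) of the fork()+exec() boundary: the
exec'd process runs a fresh image, so every static starts over at its
initial value:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static int cached = 0; /* stands in for a cache filled by node_config_load() */

int main(int argc, char **argv)
{
    if (argc > 1 && !strcmp(argv[1], "child")) {
        /* The exec'd process: the static is back at its initial value */
        printf("child sees cached = %d\n", cached);   /* prints 0 */
        return 0;
    }
    cached = 42;                  /* "node_config_load()" fills the cache */
    pid_t pid = fork();
    if (pid == 0) {
        execl(argv[0], argv[0], "child", (char *) NULL);
        _exit(1);                 /* reached only if exec fails */
    }
    waitpid(pid, NULL, 0);
    printf("parent sees cached = %d\n", cached);      /* prints 42 */
    return 0;
}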
Thanks
Nicolas