Danny and I discussed this and decided that perhaps the best option would be for job steps to get the same generic resources as the job by default. The change below does that and permits the job step to specify --gres=none or set SLURM_STEP_GRES=none to give the step no generic resources.
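
As a rough illustration of the precedence described above, here is a minimal standalone sketch. It is not the SLURM code itself: resolve_step_gres and dup_string are hypothetical names, and the real logic is split between srun (command-line option first, then SLURM_STEP_GRES) and slurmctld (the "none" check, then falling back to the job's gres string).

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>	/* strcasecmp */

/* Hypothetical helper: return a heap copy of s, or NULL if malloc fails. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical helper, not part of SLURM.
 * opt_gres: value of --gres on the srun command line, or NULL
 * env_gres: value of SLURM_STEP_GRES, or NULL
 * job_gres: gres string of the enclosing job allocation, or NULL
 * Returns a heap-allocated gres string for the step (caller frees),
 * or NULL when the step gets no generic resources.
 */
char *resolve_step_gres(const char *opt_gres, const char *env_gres,
			const char *job_gres)
{
	/* The command-line option takes priority over the environment. */
	const char *requested = opt_gres ? opt_gres : env_gres;

	/* "none" (any case) means the step is given no generic resources. */
	if (requested && strcasecmp(requested, "none") == 0)
		return NULL;

	/* Nothing requested: inherit whatever the job was allocated. */
	if (requested == NULL)
		requested = job_gres;

	return requested ? dup_string(requested) : NULL;
}
```

With these assumptions, resolve_step_gres(NULL, NULL, "gpu:2") yields a copy of "gpu:2", while resolve_step_gres(NULL, "none", "gpu:2") yields NULL.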

Moe Jette


diff --git a/doc/man/man1/srun.1 b/doc/man/man1/srun.1
index 1bcf9f6..00ba95c 100644
--- a/doc/man/man1/srun.1
+++ b/doc/man/man1/srun.1
@@ -1,4 +1,4 @@
-.TH "srun" "1" "SLURM 2.3" "July 2011" "SLURM Commands"
+.TH "srun" "1" "SLURM 2.3" "August 2011" "SLURM Commands"

 .SH "NAME"
 srun \- Run parallel jobs
@@ -423,7 +423,12 @@ The available generic consumable resources is configurable by the system
 administrator.
 A list of available generic consumable resources will be printed and the
 command will exit if the option argument is "help".
-Examples of use include "\-\-gres=gpus:2*cpu,disk=40G" and "\-\-gres=help".
+Examples of use include "\-\-gres=gpu:2*cpu,disk=40G" and "\-\-gres=help".
+NOTE: By default, a job step is allocated all of the generic resources that
+have been allocated to the job. To change the behavior so that each job step is
+allocated no generic resources, explicitly set the value of \-\-gres to specify
+zero counts for each generic resource OR set "\-\-gres=none" OR set the
+SLURM_STEP_GRES environment variable to "none".

 .TP
 \fB\-H, \-\-hold\fR
@@ -1521,7 +1526,7 @@ Also see \fBSLURM_EXIT_ERROR\fR.
 Same as \fB\-g, \-\-geometry\fR
 .TP
 \fBSLURM_GRES\fR
-Same as \fB\-\-gres\fR
+Same as \fB\-\-gres\fR. Also see \fBSLURM_STEP_GRES\fR
 .TP
 \fBSLURM_JOB_NAME\fR
 Same as \fB\-J, \-\-job\-name\fR except within an existing
@@ -1602,6 +1607,10 @@ Same as \fB\-e, \-\-error\fR
 \fBSLURM_STDINMODE\fR
 Same as \fB\-i, \-\-input\fR
 .TP
+\fBSLURM_STEP_GRES\fR
+Same as \fB\-\-gres\fR (only applies to job steps, not to job allocations).
+Also see \fBSLURM_GRES\fR
+.TP
 \fBSLURM_STDOUTMODE\fR
 Same as \fB\-o, \-\-output\fR
 .TP
diff --git a/src/slurmctld/step_mgr.c b/src/slurmctld/step_mgr.c
index da2ec58..5d3aec3 100644
--- a/src/slurmctld/step_mgr.c
+++ b/src/slurmctld/step_mgr.c
@@ -1813,6 +1813,10 @@ step_create(job_step_create_request_msg_t *step_specs,
        if (step_specs->no_kill > 1)
                step_specs->no_kill = 1;

+       if (step_specs->gres && !strcasecmp(step_specs->gres, "NONE"))
+               xfree(step_specs->gres);
+       else if (step_specs->gres == NULL)
+               step_specs->gres = xstrdup(job_ptr->gres);
        i = gres_plugin_step_state_validate(step_specs->gres, &step_gres_list,
                                            job_ptr->gres_list, job_ptr->job_id,
                                            NO_VAL);
diff --git a/src/srun/allocate.c b/src/srun/allocate.c
index f746a22..70d665e 100644
--- a/src/srun/allocate.c
+++ b/src/srun/allocate.c
@@ -775,7 +775,10 @@ create_job_step(srun_job_t *job, bool use_all_cpus)

        if (opt.mem_per_cpu != NO_VAL)
                job->ctx_params.mem_per_cpu = opt.mem_per_cpu;
-       job->ctx_params.gres = opt.gres;
+       if (opt.gres)
+               job->ctx_params.gres = opt.gres;
+       else
+               job->ctx_params.gres = getenv("SLURM_STEP_GRES");

        if (use_all_cpus)
                job->ctx_params.cpu_count = job->cpu_count;

Quoting [email protected]:

Both modes of operation are quite common (one step or many steps in a
job allocation). I believe that having the behavior configurable by job
using an environment variable or command line option would be ideal,
but it does not exist today.

Moe


Quoting "Mark A. Grondona" <[email protected]>:

On Mon, 1 Aug 2011 14:14:55 -0700, "[email protected]" <[email protected]> wrote:
The current logic requires job steps to explicitly request the generic
resources (GRES, e.g. GPUs) to be allocated. This decision was based
upon users commonly running many job steps within a job allocation and
using different resources for each job step. If a job step inherits
all of the job's GRES by default, that would require job steps to
explicitly request no GRES if desired
(e.g. "srun --gres=gpu:0 ..."). This may not be the best design for
all users, but it is what exists today.


The only problem with this approach is that it makes the common case
more difficult (most of the time users run a single job step per
allocation), in order to satisfy the uncommon case.

Could this behavior be made configurable?

mark


Moe



Quoting Carles Fenoy <[email protected]>:

Hi all,

We are considering using cgroups in a new GPU cluster, and I want to know
the current status of the devices part of the cgroups plugin.

We have also observed that when a job requests gres, the tasks of any step
that doesn't request generic resources explicitly are not assigned any
resources.
Example:

A job requests 2 gpus with

sbatch --gres=gpu:1 --ntasks=2 --cpus-per-task=2 --wrap="env; srun env | grep CUDA"

The first env shows:
CUDA_VISIBLE_DEVICES=0

although "srun env" shows:
CUDA_VISIBLE_DEVICES=NoDevFiles
CUDA_VISIBLE_DEVICES=NoDevFiles

Is this the expected behavior?

Maybe if a job requests gres and its steps don't, slurmstepd should not
overwrite the job environment in:

gres_gpu.c(211):

        } else {
                /* The gres.conf file must identify specific device files
                 * in order to set the CUDA_VISIBLE_DEVICES env var */
                env_array_overwrite(job_env_ptr,"CUDA_VISIBLE_DEVICES",
                                    "NoDevFiles");
        }
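
One way to picture the suggestion is the sketch below. It is only an illustration under stated assumptions: set_cuda_placeholder is a hypothetical name, and it uses POSIX setenv() on the process environment as a stand-in for env_array_overwrite() on the job environment array. The point is simply that the placeholder replaces an inherited value only when the step itself requested gres.

```c
#define _POSIX_C_SOURCE 200809L	/* for setenv() */
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of SLURM.
 * Called when no device files are known for the step.
 * If the step requested gres, the placeholder replaces any existing value.
 * Otherwise an inherited CUDA_VISIBLE_DEVICES is left untouched, and the
 * placeholder is written only if the variable is not set at all.
 * Returns 0 on success, -1 on error (as setenv does).
 */
int set_cuda_placeholder(bool step_requested_gres)
{
	/* setenv's third argument: nonzero overwrites, zero keeps existing. */
	return setenv("CUDA_VISIBLE_DEVICES", "NoDevFiles",
		      step_requested_gres ? 1 : 0);
}
```

With this behavior, a step that requests no gres inside a job where CUDA_VISIBLE_DEVICES=0 would still see CUDA_VISIBLE_DEVICES=0.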


--
--
Carles Fenoy







