That change will not work for a batch job or a Cray system.

Quoting Albert Solernou <[email protected]>:


Hi all,
could this condition be "if (req->job_step_id == 0)" instead of "if
(first_job_run)" in "_rpc_launch_tasks"? Then the call to the prolog in
"_rpc_batch_job" could be removed.

Apparently it works fine for me, but I don't know whether this would
affect any unusual job cases.

Could you confirm if this would work for every case?

Best,
Albert

On 28/11/13 12:14, Albert Solernou wrote:

Hi Moe,
Thanks for the reply. Moving the call to gres_plugin_job_set_env() is
definitely non-trivial, however I imagine a possible workaround:

 _rpc_launch_tasks in req.c runs the prolog under this condition:
  "if (!slurm_cred_jobid_cached(conf->vctx, req->job_id))"
(line 1073 in req.c v. 2.6.3)
I understand that "_rpc_batch_job" runs the prolog on its own because,
by the time "_rpc_launch_tasks" is reached, this condition evaluates to
false. If that is the case, and there is no other reason to run the
"_rpc_batch_job" prolog earlier, then we should be able to find a new
condition so that the prolog is only called from "_rpc_launch_tasks".

Could you confirm my assumption about the prolog calls? Do you have any
suggestion on how this new condition should be formulated?

Thanks,
Albert



On 27/11/13 23:08, Moe Jette wrote:

This is definitely a non-trivial change. The call to the function
gres_plugin_job_set_env() would need to be moved from the slurmstepd
process to the slurmd daemon (before the prolog runs) and then that
environment variable would need to be passed to the prolog.

Moe Jette
SchedMD LLC

Quoting Albert Solernou <[email protected]>:


Hi all,
I may need some extra help.

I successfully modified req.c to pass a user-defined environment
variable, say CUDA_SET_COMPUTE_MODE, into the "prolog environment".
However, I am still missing CUDA_VISIBLE_DEVICES.

When slurmd goes through _rpc_batch_job, it runs the prolog. However,
CUDA_VISIBLE_DEVICES is not there yet: the slurm_msg_t that the
function handles does not carry this variable in req->environment. It
only appears later, when slurmd passes through _rpc_launch_tasks (in
its req->env), but by then it is too late.

Could you give me some hints on how to get CUDA_VISIBLE_DEVICES in
req.c:_rpc_batch_job? That would definitely speed things up.

Thanks in advance,
Albert




On Wed 20 Nov 2013 14:15:12 GMT, Albert Solernou wrote:

Thanks for the quick answer, Moe.

I'll try that and let you know.

Best,
Albert

On Wed 20 Nov 2013 14:09:12 GMT, [email protected] wrote:

Your easiest option would be to modify the Slurm code to export
whatever additional environment variables that you want, which should
be pretty simple. See the function _build_env() in
src/slurmd/slurmd/req.c. If you make changes and send us the patch, we
can include it in the canonical code base.

Moe Jette
SchedMD LLC

On 2013-11-20 05:05, Albert Solernou wrote:
Hi,
I'd like to write a prolog script that changes the GPU compute mode of
the allocated GPU card(s). This change can only be done by root. My
initial idea was that the prolog script would use an environment
variable as a switch.

The problems that I face are:
 - prolog or prologctld have a reduced set of environment variables.
Specifically, they miss "CUDA_VISIBLE_DEVICES", assigned by the GRes
plugin, as well as any user environment variables.


Is there an easy workaround? Will I have to patch the current GRes
plugin or tinker with a new SPANK plugin?

Any help is welcome!

Regards,
Albert

--
---------------------------------
  Dr. Albert Solernou
  Research Associate
  Oxford Supercomputing Centre,
  University of Oxford
  Tel: +44 (0)1865 610631
---------------------------------






