Hi all, could this condition be "if (req->job_step_id == 0)" instead of "if (first_job_run)" in "_rpc_launch_tasks"? Then the call to the prolog in "_rpc_batch_job" could be removed.
Apparently this works OK for me, but I don't know whether it would affect any abnormal jobs. Could you confirm that this would work in every case?

Best,
Albert

On 28/11/13 12:14, Albert Solernou wrote:
>
> Hi Moe,
> Thanks for the reply. Moving the call to gres_plugin_job_set_env() is
> definitely non-trivial; however, I can imagine a possible workaround.
>
> _rpc_launch_tasks in req.c runs the prolog under the condition
> "if (!slurm_cred_jobid_cached(conf->vctx, req->job_id))"
> (line 1073 in req.c, v2.6.3).
> I understand that "_rpc_batch_job" runs the prolog on its own because,
> by the time "_rpc_launch_tasks" is reached, this condition evaluates to
> false. If that is the case, and there is no other reason to run the
> "_rpc_batch_job" prolog earlier, then we should be able to find a new
> condition so that the prolog is only called from "_rpc_launch_tasks".
>
> Could you confirm my assumption about the prolog calls? Do you have any
> suggestions on how this new condition should be formulated?
>
> Thanks,
> Albert
>
>
>
> On 27/11/13 23:08, Moe Jette wrote:
>>
>> This is definitely a non-trivial change. The call to the function
>> gres_plugin_job_set_env() would need to be moved from the slurmstepd
>> process to the slurmd daemon (before the prolog runs), and then that
>> environment variable would need to be passed to the prolog.
>>
>> Moe Jette
>> SchedMD LLC
>>
>> Quoting Albert Solernou <[email protected]>:
>>
>>>
>>> Hi all,
>>> I may need some extra help.
>>>
>>> I successfully modified req.c to pass a user-defined environment
>>> variable, say CUDA_SET_COMPUTE_MODE, into the prolog environment.
>>> However, I am still missing CUDA_VISIBLE_DEVICES.
>>>
>>> When slurmd goes through _rpc_batch_job, it runs the prolog. However,
>>> CUDA_VISIBLE_DEVICES is not there yet (the slurm_msg_t that the
>>> function handles does not have this variable within
>>> req->environment).
>>> It is only later, when slurmd passes through _rpc_launch_tasks, that
>>> $CUDA_VISIBLE_DEVICES is set (in its req->env), but by then it is
>>> too late.
>>>
>>> Could you give me some hints on how to get CUDA_VISIBLE_DEVICES in
>>> req.c:_rpc_batch_job? That would definitely speed things up.
>>>
>>> Thanks in advance,
>>> Albert
>>>
>>>
>>>
>>> On Wed 20 Nov 2013 14:15:12 GMT, Albert Solernou wrote:
>>>>
>>>> Thanks for the quick answer, Moe.
>>>>
>>>> I'll try that and let you know.
>>>>
>>>> Best,
>>>> Albert
>>>>
>>>> On Wed 20 Nov 2013 14:09:12 GMT, [email protected] wrote:
>>>>>
>>>>> Your easiest option would be to modify the Slurm code to export
>>>>> whatever additional environment variables you want, which should
>>>>> be pretty simple. See the function _build_env() in
>>>>> src/slurmd/slurmd/req.c. If you make changes and send us the patch,
>>>>> we can include it in the canonical code base.
>>>>>
>>>>> Moe Jette
>>>>> SchedMD LLC
>>>>>
>>>>> On 2013-11-20 05:05, Albert Solernou wrote:
>>>>>> Hi,
>>>>>> I'd like to write a prolog script that changes the GPU compute
>>>>>> mode of the allocated GPU card(s). This change can only be done
>>>>>> by root. My initial idea was that the prolog script would use an
>>>>>> environment variable as a switch.
>>>>>>
>>>>>> The problem that I face is:
>>>>>> - The prolog and PrologSlurmctld scripts see a reduced set of
>>>>>> environment variables. Specifically, they miss
>>>>>> CUDA_VISIBLE_DEVICES, assigned by the GRes plugin, as well as any
>>>>>> user environment flags.
>>>>>>
>>>>>> Is there an easy workaround? Will I have to patch the current GRes
>>>>>> plugin or tinker with a new SPANK plugin?
>>>>>>
>>>>>> Any help is welcome!
>>>>>>
>>>>>> Regards,
>>>>>> Albert
>>>>
>>>> --
>>>> ---------------------------------
>>>> Dr. Albert Solernou
>>>> Research Associate
>>>> Oxford Supercomputing Centre,
>>>> University of Oxford
>>>> Tel: +44 (0)1865 610631
>>>> ---------------------------------
>>>
>>> --
>>> ---------------------------------
>>> Dr. Albert Solernou
>>> Research Associate
>>> Oxford Supercomputing Centre,
>>> University of Oxford
>>> Tel: +44 (0)1865 610631
>>> ---------------------------------
>>>
>>
>

--
---------------------------------
Dr. Albert Solernou
Research Associate
Oxford Supercomputing Centre,
University of Oxford
Tel: +44 (0)1865 610631
---------------------------------
