Hi Moe,
did you mean "for a batch job on a Cray system"? It apparently works fine for me when submitting a job script with sbatch.

Albert

On 29/11/13 16:52, Moe Jette wrote:
>
> that change will not work for a batch job or a Cray system.
>
> Quoting Albert Solernou <[email protected]>:
>
>>
>> Hi all,
>> could this condition be "if (req->job_step_id == 0)" instead of "if
>> (first_job_run)" in "_rpc_launch_tasks"? Then the call to the prolog in
>> "_rpc_batch_job" could be removed.
>>
>> Apparently it works OK for me, but I don't know if this would affect any
>> abnormal job.
>>
>> Could you confirm if this would work for every case?
>>
>> Best,
>> Albert
>>
>> On 28/11/13 12:14, Albert Solernou wrote:
>>>
>>> Hi Moe,
>>> Thanks for the reply. Moving the call to gres_plugin_job_set_env() is
>>> definitely non-trivial; however, I can imagine a possible workaround:
>>>
>>> _rpc_launch_tasks in req.c runs the prolog:
>>> "if !slurm_cred_jobid_cached(conf->vctx, req->job_id);"
>>> (line 1073 in req.c v. 2.6.3)
>>> I understand that "_rpc_batch_job" runs the prolog on its own because at
>>> "_rpc_launch_tasks" it returns false to this condition. If that is the
>>> case, and there is no other reason to run the "_rpc_batch_job" prolog
>>> earlier, then we'd be able to find a new condition so that the prolog is
>>> only called from "_rpc_launch_tasks".
>>>
>>> Could you confirm my assumption about the prolog calls? Do you have any
>>> suggestion on how this new condition should be formulated?
>>>
>>> Thanks,
>>> Albert
>>>
>>> On 27/11/13 23:08, Moe Jette wrote:
>>>>
>>>> This is definitely a non-trivial change. The call to the function
>>>> gres_plugin_job_set_env() would need to be moved from the slurmstepd
>>>> process to the slurmd daemon (before the prolog runs) and then that
>>>> environment variable would need to be passed to the prolog.
>>>>
>>>> Moe Jette
>>>> SchedMD LLC
>>>>
>>>> Quoting Albert Solernou <[email protected]>:
>>>>
>>>>>
>>>>> Hi all,
>>>>> I may need some extra help.
>>>>>
>>>>> I successfully modified req.c to pass to the "prolog environment" a
>>>>> user environment variable that I defined, say CUDA_SET_COMPUTE_MODE.
>>>>> However, I am still missing CUDA_VISIBLE_DEVICES.
>>>>>
>>>>> When slurmd goes through _rpc_batch_job it runs the prolog. However,
>>>>> CUDA_VISIBLE_DEVICES is not there yet (the slurm_msg_t that the
>>>>> function handles does not have this variable within the
>>>>> req->environment). It is only later, when slurmd passes through
>>>>> _rpc_launch_tasks, that $CUDA_VISIBLE_DEVICES is set (in its req->env),
>>>>> but by then it is too late.
>>>>>
>>>>> Could you give me some hints on how to get CUDA_VISIBLE_DEVICES in
>>>>> req.c:_rpc_batch_job? That would definitely speed things up.
>>>>>
>>>>> Thanks in advance,
>>>>> Albert
>>>>>
>>>>> On Wed 20 Nov 2013 14:15:12 GMT, Albert Solernou wrote:
>>>>>>
>>>>>> Thanks for the quick answer, Moe.
>>>>>>
>>>>>> I'll try that and let you know.
>>>>>>
>>>>>> Best,
>>>>>> Albert
>>>>>>
>>>>>> On Wed 20 Nov 2013 14:09:12 GMT, [email protected] wrote:
>>>>>>>
>>>>>>> Your easiest option would be to modify the Slurm code to export
>>>>>>> whatever additional environment variables you want, which should
>>>>>>> be pretty simple. See the function _build_env() in
>>>>>>> src/slurmd/slurmd/req.c. If you make changes and send us the
>>>>>>> patch, we can include it in the canonical code base.
>>>>>>>
>>>>>>> Moe Jette
>>>>>>> SchedMD LLC
>>>>>>>
>>>>>>> On 2013-11-20 05:05, Albert Solernou wrote:
>>>>>>>> Hi,
>>>>>>>> I'd like to write a prolog script that changes the GPU compute
>>>>>>>> mode of the allocated GPU card(s). This change can only be done
>>>>>>>> by root. My initial idea was that the prolog script would use an
>>>>>>>> environment variable as a switch.
>>>>>>>>
>>>>>>>> The problems that I face are:
>>>>>>>> - prolog or prologctld have a reduced set of environment
>>>>>>>> variables.
>>>>>>>> Specifically, they miss "CUDA_VISIBLE_DEVICES" assigned by the GRes
>>>>>>>> plugin, as well as any user environment flag.
>>>>>>>>
>>>>>>>> Is there an easy workaround? Will I have to patch the current GRes
>>>>>>>> plugin or tinker with a new SPANK plugin?
>>>>>>>>
>>>>>>>> Any help is welcome!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Albert
>>>>>>
>>>>>> --
>>>>>> ---------------------------------
>>>>>> Dr. Albert Solernou
>>>>>> Research Associate
>>>>>> Oxford Supercomputing Centre,
>>>>>> University of Oxford
>>>>>> Tel: +44 (0)1865 610631
>>>>>> ---------------------------------
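
For readers following the thread: Moe's suggestion (exporting additional variables from _build_env() in src/slurmd/slurmd/req.c) comes down to copying selected entries from the job's environment into the environment handed to the prolog. Below is a minimal standalone sketch of that idea in C. It does not use Slurm's internal API and is not the patch discussed above; the helper names env_lookup, env_append and env_forward are hypothetical and exist only for illustration. As the thread notes, this can only forward variables that are already present in the request when the prolog runs (such as a user-defined CUDA_SET_COMPUTE_MODE). It does not help with CUDA_VISIBLE_DEVICES, which is set later.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Slurm function): return the value of `name`
 * in a NULL-terminated array of "NAME=value" strings, or NULL if absent. */
const char *env_lookup(char *const *env, const char *name)
{
    size_t len = strlen(name);

    if (env == NULL)
        return NULL;
    for (; *env != NULL; env++) {
        /* Require an exact name match followed by '='. */
        if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
            return *env + len + 1;
    }
    return NULL;
}

/* Hypothetical helper: append "name=value" to a heap-allocated,
 * NULL-terminated array that currently holds *count entries.
 * Returns 0 on success, -1 on allocation failure (array left unchanged). */
int env_append(char ***env, size_t *count, const char *name, const char *value)
{
    size_t size;
    char *entry;
    char **grown;

    assert(env != NULL && count != NULL);

    size = strlen(name) + strlen(value) + 2; /* '=' plus terminating NUL */
    entry = malloc(size);
    if (entry == NULL)
        return -1;
    snprintf(entry, size, "%s=%s", name, value);

    /* Room for the new entry and the NULL terminator. */
    grown = realloc(*env, (*count + 2) * sizeof(*grown));
    if (grown == NULL) {
        free(entry);
        return -1;
    }
    grown[*count] = entry;
    grown[*count + 1] = NULL;
    *env = grown;
    (*count)++;
    return 0;
}

/* Hypothetical helper: forward every variable listed in `wanted` that is
 * set in `job_env` into `prolog_env`. Returns the number of variables
 * copied, or -1 on allocation failure. */
int env_forward(char ***prolog_env, size_t *count, char *const *job_env,
                const char *const *wanted, size_t n_wanted)
{
    int copied = 0;
    size_t i;

    for (i = 0; i < n_wanted; i++) {
        const char *value = env_lookup(job_env, wanted[i]);

        if (value == NULL)
            continue; /* not set for this job: nothing to forward */
        if (env_append(prolog_env, count, wanted[i], value) != 0)
            return -1;
        copied++;
    }
    return copied;
}
```

A caller would start with an empty prolog environment (a NULL pointer and a count of 0), call env_forward() with the list of variable names it is willing to expose, and pass the resulting array to whatever executes the prolog. Using an explicit list of names keeps the prolog, which runs as root, from inheriting arbitrary user variables.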
