Hi Moe,
did you mean "for a batch job on a Cray system"?

It apparently works fine for me when submitting a job script with sbatch.
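
To make the proposed change concrete, here is a minimal standalone sketch of the two conditions being compared. It is not the Slurm source: launch_req_t and both function names are hypothetical stand-ins, and only the conditions "first_job_run" and "req->job_step_id == 0" come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the launch request. Only the two fields
 * mentioned in the thread are modelled; this is not the Slurm struct. */
typedef struct {
    uint32_t job_id;
    uint32_t job_step_id;
} launch_req_t;

/* Current condition as described in the thread: the prolog runs when
 * this is the first time the job is seen on the node. The caller is
 * assumed to have computed first_job_run already. */
bool prolog_needed_current(bool first_job_run)
{
    return first_job_run;
}

/* Proposed condition: run the prolog only for step 0 of the job. */
bool prolog_needed_proposed(const launch_req_t *req)
{
    assert(req != NULL);
    return req->job_step_id == 0;
}
```

Under the proposed condition, step 0 of a job triggers the prolog and later steps do not. The sketch does not model batch jobs or Cray systems, which is where Moe expects the change to fail.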

Albert

On 29/11/13 16:52, Moe Jette wrote:
> 
> that change will not work for a batch job or a Cray system.
> 
> Quoting Albert Solernou <[email protected]>:
> 
>>
>> Hi all,
>> could this condition be "if (req->job_step_id == 0)" instead of "if
>> (first_job_run)" in "_rpc_launch_tasks"? Then the call to the prolog in
>> "_rpc_batch_job" could be removed.
>>
>> Apparently it works OK for me, but I don't know if this would affect any
>> abnormal job.
>>
>> Could you confirm if this would work for every case?
>>
>> Best,
>> Albert
>>
>> On 28/11/13 12:14, Albert Solernou wrote:
>>>
>>> Hi Moe,
>>> Thanks for the reply. Moving the call to gres_plugin_job_set_env() is
>>> definitely non-trivial; however, I can imagine a possible workaround:
>>>
>>>  _rpc_launch_tasks in req.c runs the prolog:
>>>   "if !slurm_cred_jobid_cached(conf->vctx, req->job_id);"
>>> (line 1073 in req.c v. 2.6.3)
>>> I understand that "_rpc_batch_job" runs the prolog on its own because,
>>> by the time "_rpc_launch_tasks" is reached, this condition evaluates to
>>> false. If that is the case, and there is no other reason to run the
>>> prolog earlier from "_rpc_batch_job", then we should be able to find a
>>> new condition so that the prolog is only called from "_rpc_launch_tasks".
>>>
>>> Could you confirm my assumption about the prolog calls? Do you have any
>>> suggestion on how this new condition should be formulated?
>>>
>>> Thanks,
>>> Albert
>>>
>>>
>>>
>>> On 27/11/13 23:08, Moe Jette wrote:
>>>>
>>>> This is definitely a non-trivial change. The call to the function
>>>> gres_plugin_job_set_env() would need to be moved from the slurmstepd
>>>> process to the slurmd daemon (before the prolog runs) and then that
>>>> environment variable would need to be passed to the prolog.
>>>>
>>>> Moe Jette
>>>> SchedMD LLC
>>>>
>>>> Quoting Albert Solernou <[email protected]>:
>>>>
>>>>>
>>>>> Hi all,
>>>>> I may need some extra help.
>>>>>
>>>>> I successfully modified req.c to pass to the "prolog environment" a
>>>>> user environment variable that I defined, say CUDA_SET_COMPUTE_MODE.
>>>>> However, I am still missing CUDA_VISIBLE_DEVICES.
>>>>>
>>>>> When slurmd goes through _rpc_batch_job it runs the prolog. However,
>>>>> CUDA_VISIBLE_DEVICES is not there yet (the slurm_msg_t that the
>>>>> function handles does not have this variable within the
>>>>> req->environment). It is only set later, when slurmd passes through
>>>>> _rpc_launch_tasks (in its req->env), but by then it is too late.
>>>>>
>>>>> Could you give me some hints on how to get CUDA_VISIBLE_DEVICES in
>>>>> req.c:_rpc_batch_job? That would definitely speed things up.
>>>>>
>>>>> Thanks in advance,
>>>>> Albert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed 20 Nov 2013 14:15:12 GMT, Albert Solernou wrote:
>>>>>>
>>>>>> Thanks for the quick answer, Moe.
>>>>>>
>>>>>> I'll try that and let you know.
>>>>>>
>>>>>> Best,
>>>>>> Albert
>>>>>>
>>>>>> On Wed 20 Nov 2013 14:09:12 GMT, [email protected] wrote:
>>>>>>>
>>>>>>> Your easiest option would be to modify the Slurm code to export
>>>>>>> whatever additional environment variables you want, which should
>>>>>>> be pretty simple. See the function _build_env() in
>>>>>>> src/slurmd/slurmd/req.c. If you make changes and send us the
>>>>>>> patch, we can include it in the canonical code base.
>>>>>>>
>>>>>>> Moe Jette
>>>>>>> SchedMD LLC
>>>>>>>
>>>>>>> On 2013-11-20 05:05, Albert Solernou wrote:
>>>>>>>> Hi,
>>>>>>>> I'd like to write a prolog script that changes the GPU compute
>>>>>>>> mode of the allocated GPU card(s). This change can only be done
>>>>>>>> by root. My initial idea was that the prolog script would use an
>>>>>>>> environment variable as a switch.
>>>>>>>>
>>>>>>>> The problem I face is:
>>>>>>>>  - prolog or prologctld have a reduced set of environment
>>>>>>>> variables. Specifically, they miss "CUDA_VISIBLE_DEVICES", assigned
>>>>>>>> by the GRes plugin, as well as any user environment flag.
>>>>>>>>
>>>>>>>>
>>>>>>>> Is there an easy workaround? Will I have to patch the current GRes
>>>>>>>> plugin or to tinker with a new spank plugin?
>>>>>>>>
>>>>>>>> Any help is welcome!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Albert
>>>>>>
>>>>>> -- 
>>>>>> ---------------------------------
>>>>>>   Dr. Albert Solernou
>>>>>>   Research Associate
>>>>>>   Oxford Supercomputing Centre,
>>>>>>   University of Oxford
>>>>>>   Tel: +44 (0)1865 610631
>>>>>> ---------------------------------
>>>>>
>>>>>
>>>>
>>>
>>
>>
> 

