Hi all,

I am dealing with job scheduling and concurrency in Slurm. I am a bit
stuck, so maybe someone can give me a hint on this.

I have a job that I want to be executed on a given node. I am performing
the submission from an application that uses the Slurm API. The steps I am
following so far are:

- I check whether the job can run on a given node with slurm_job_will_run,
making sure that the node is empty and the job can be executed.

- I submit the job to that node with the slurm_submit_batch_job call.

- If I am lucky, my job will start immediately.

- If I am not, another job has been allocated to that node in the meantime
and my job will have to wait.
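For what it's worth, the race in the steps above can be reproduced with a
tiny simulation. The node model and the will_run/submit functions below are
toy stand-ins of my own, not the real Slurm API:

```python
# Minimal simulation of the check-then-submit race described above.
# Node, will_run, and submit are illustrative stand-ins, not Slurm calls.

class Node:
    def __init__(self):
        self.running = None   # job currently holding the node
        self.queue = []       # jobs waiting for the node

def will_run(node):
    """Analogue of the slurm_job_will_run check: node is free right now."""
    return node.running is None

def submit(node, job):
    """Analogue of slurm_submit_batch_job: start the job or queue it."""
    if node.running is None:
        node.running = job
    else:
        node.queue.append(job)

node = Node()
assert will_run(node)        # step 1: the check says the node is empty
submit(node, "intruder")     # ...but another job lands on it meanwhile
submit(node, "my_job")       # step 2: my submission now has to wait

print(node.running)          # intruder
print(node.queue)            # ['my_job']
```

Nothing between the check and the submission makes the pair atomic, which
is exactly the window where the intruding job sneaks in.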

As you can see, there is an obvious race condition here. The question is,
do you know any way of avoiding it?

A workaround I've thought of is to give the job the highest possible
priority and forbid that priority for user-submitted jobs. It doesn't
look very elegant, though. Another approach could be to make a
reservation and, if successful, then execute the job. This, however,
would (correct me if I'm wrong) have the drawback of having to delete
the reservation after the job finishes. That is probably not a big deal,
but it doesn't look like the best solution either. All in all, I keep
having the feeling that there is an obvious solution I am not considering.
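To make the reservation idea concrete, this is the sequence I have in
mind, simulated in plain Python. The reservation bookkeeping and all the
function names here are a toy model of my own, not the Slurm reservation
API:

```python
# Toy model of the reservation workaround: claim the node atomically,
# run the job under the reservation, then tear the reservation down.
# All names are illustrative, not real Slurm API calls.

reservations = {}   # node name -> owner holding the reservation

def create_reservation(node, owner):
    """Atomically claim the node; fails if someone got there first."""
    if node in reservations:
        return False
    reservations[node] = owner
    return True

def run_job(node, owner):
    """Run only while we hold the reservation on that node."""
    assert reservations.get(node) == owner
    return f"{owner} ran on {node}"

def delete_reservation(node):
    """The extra cleanup step the workaround needs afterwards."""
    reservations.pop(node, None)

if create_reservation("node01", "me"):
    result = run_job("node01", "me")
    delete_reservation("node01")

print(result)          # me ran on node01
print(reservations)    # {}
```

The appeal is that the claim itself is atomic, so no job can slip in
between the check and the execution; the cost is the explicit teardown at
the end.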


Any ideas or suggestions?

Thanks for your help,

Manuel
-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
