Hi all,

I am dealing with job scheduling and concurrency in Slurm. I am a bit stuck, so maybe someone can give me a hint on this.
I have a job that I want to be executed on a given node. I am performing the submission with an application that uses the Slurm API. The steps I am following so far are:

- I check whether the job can run on a given node with slurm_job_will_run, making sure the node is empty and the job can be executed.
- I submit the job to that node with slurm_submit_batch_job.
- If I am lucky, my job starts immediately. If I am not, another job has been allocated to that node in the meanwhile, and my job has to wait.

As you can see, there is an obvious race condition here between the check and the submission. The question is: do you know any way of avoiding it?

One workaround I have thought of is to give the job the highest possible priority and forbid that priority on user-submitted jobs. It doesn't look very elegant, though.

Another approach could be to make a reservation and, if successful, then execute the job. This, however, would (correct me if I'm wrong) have the drawback of having to delete the reservation after the job execution. It is probably not a big deal, but it doesn't look like the best solution either.

All together, I keep having the feeling that there is an obvious solution that I am not considering. Any ideas or suggestions?

Thanks for your help,

Manuel

-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN