On 09/03/2016 19:12, Ryan Novosielski wrote:

On Mar 2, 2016, at 2:51 PM, Kilian Cavalotti <[email protected]> 
wrote:

On Wed, Mar 2, 2016 at 10:12 AM,  <[email protected]> wrote:
We want to introduce a new behavior in the way slurmd uses the
HealthCheckProgram. The idea is to avoid a race condition between the first
HealthCheckProgram run and the node accepting jobs. The slurmd daemon will
initialize and then loop on HealthCheckProgram execution before registering
with slurmctld. It will stay in this loop until the HealthCheckProgram
returns successfully; the node remains DOWN until then.
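
As a rough illustration of the loop described above (not the actual slurmd code, which is written in C), here is a minimal Python sketch. The function name wait_for_healthy_node, the retry_interval parameter, and the injectable run and sleep arguments are all hypothetical and exist only for this example; the one thing taken from the description is that registration is held back until the health check exits with status 0.

```python
import subprocess
import time


def wait_for_healthy_node(health_check_cmd, retry_interval=1.0,
                          run=subprocess.call, sleep=time.sleep):
    """Re-run the health check until it exits with status 0.

    Returns the number of attempts made. The caller is assumed to
    register the node with the controller only after this returns,
    so the node stays DOWN for as long as the check keeps failing.
    All names here are hypothetical, not part of Slurm.
    """
    attempts = 0
    while True:
        attempts += 1
        # A zero exit status is treated as "node is healthy".
        if run(health_check_cmd) == 0:
            return attempts
        # Check failed: wait, then try again before registering.
        sleep(retry_interval)
```

A caller would invoke something like wait_for_healthy_node(["/path/to/healthcheck"]) before its registration step, where the path is a placeholder for whatever HealthCheckProgram points to. The run and sleep arguments are only there so the loop can be exercised without spawning a real process.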

Love the idea!

I do as well. I’m currently having a devil of a time getting SLURM to accept 
jobs only /after/ GPFS is available. So far I’ve tried a number of 
dependency-related tricks with systemd and still haven’t gotten it working 
right. This would solve that and any other “not ready” problems.

That's exactly the purpose of the patch since we were facing the same issue with IB and GPFS.

FYI it's been accepted (thank you Moe btw!) and will be available in Slurm 16.05.0pre2:

https://github.com/SchedMD/slurm/commit/7fb0c9817abef04d324933e389fe274f20097075
