> On Mar 2, 2016, at 2:51 PM, Kilian Cavalotti 
> <[email protected]> wrote:
> 
> On Wed, Mar 2, 2016 at 10:12 AM,  <[email protected]> wrote:
>> We want to introduce a new behavior in the way slurmd uses the
>> HealthCheckProgram. The idea is to avoid a race condition between the first
>> HealthCheckProgram run and the node accepting jobs. The slurmd daemon will
>> initialize and then loop on HealthCheckProgram execution before registering
>> with slurmctld. It will stay in this loop until the HealthCheckProgram
>> returns successfully (the node is still DOWN).
> 
> Love the idea!

I do as well. I’m currently having a devil of a time getting SLURM to accept 
jobs only /after/ GPFS is available. So far I’ve tried a number of 
dependency-related tricks with systemd and still haven’t gotten it working 
right. This would solve that and any other “not ready” problems.
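
For illustration only, the proposed behavior amounts to something like the sketch below. It is Python rather than slurmd's C, and every name in it (wait_until_healthy, check_cmd, the interval and attempt limit) is hypothetical, not taken from the Slurm source. The idea is simply to rerun the health check until it exits 0 and only then go on to register.

```python
import subprocess
import time


def wait_until_healthy(check_cmd, interval=5.0, max_attempts=None,
                       run=subprocess.call, sleep=time.sleep):
    """Rerun check_cmd until it exits with status 0.

    Returns the number of attempts that were needed. The run and sleep
    parameters exist only so the loop can be exercised without a real
    health check script; all names here are illustrative assumptions.
    """
    attempts = 0
    while True:
        attempts += 1
        # Exit status 0 means the health check passed.
        if run(check_cmd) == 0:
            return attempts
        # Optional safety valve so a broken node does not loop forever.
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(
                "health check still failing after %d attempts" % attempts)
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical path; substitute the script that checks for GPFS.
    wait_until_healthy(["/usr/local/sbin/node_health_check"])
    print("node healthy, safe to register with the controller")
```

Until something like this exists in slurmd itself, the same loop could presumably run as a wrapper ahead of the daemon, though that is untested here.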

--
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | [email protected] - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

