> On Mar 2, 2016, at 2:51 PM, Kilian Cavalotti <[email protected]> wrote:
>
> On Wed, Mar 2, 2016 at 10:12 AM, <[email protected]> wrote:
>> We want to introduce a new behavior in the way slurmd uses the
>> HealthCheckProgram. The idea is to avoid a race condition between the first
>> HealthCheckProgram run and the node accepting jobs. The slurmd daemon will
>> initialize and then loop on HealthCheckProgram execution before registering
>> with slurmctld. It will stay in this loop until the HealthCheckProgram
>> returns successfully (the node is still DOWN).
>
> Love the idea!

I do as well. I’m currently having a devil of a time getting SLURM to accept jobs /after/ GPFS is available. So far I’ve tried a number of dependency-related tricks with systemd and still haven’t gotten it working right. This would solve that and any other “not ready” problems.

--
 ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
 || \\UTGERS      |---------------------*O*---------------------
 ||_// Biomedical | Ryan Novosielski - Senior Technologist
 || \\ and Health | [email protected] - 973/972.0922 (2x0922)
 ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
      `'
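
For illustration only, the loop described in the quoted proposal could be sketched roughly as follows. This is Python rather than slurmd's own code, and it is not how slurmd implements anything; the names `wait_until_healthy`, `check_cmd`, `interval_s`, `run`, and `sleep` are all made up for this sketch. It just shows the idea: keep re-running a health check, and only move on (e.g. to registration) once the check exits with status 0.

```python
import subprocess
import time
from typing import Callable, Sequence


def wait_until_healthy(
    check_cmd: Sequence[str],
    interval_s: float = 30.0,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.call(cmd),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run check_cmd until it exits 0; return the number of attempts.

    Hypothetical helper: `run` and `sleep` are injectable so the loop can
    be exercised without a real health check script or real delays.
    """
    attempts = 0
    while True:
        attempts += 1
        # Exit status 0 is taken to mean "node is ready"; anything else
        # means "not ready yet", so the caller stays blocked here.
        if run(check_cmd) == 0:
            return attempts
        sleep(interval_s)
```

With a check script that exits non-zero until GPFS is mounted (the script itself is assumed, not shown), a call such as `wait_until_healthy(["/path/to/check-script"])` would block until the filesystem is available, which is the "not ready" case described above.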
