On Thu, 10 Mar 2016 00:19:16 -0800,
Rémi Palancher <[email protected]> wrote:

> That's exactly the purpose of the patch since we were facing the same 
> issue with IB and GPFS.

I also mucked around with adding mount dependencies to the slurmd
systemd service on CentOS 7, but that did not work out well for BeeGFS:
systemd doesn't really know how that mount comes into existence.
Also, BeeGFS can mount successfully before InfiniBand is active, as it
happily uses an existing Ethernet connection as an alternative.

I ended up explicitly waiting for the links on eth0 and ib0 to come up
in an xCAT postscript, with the slurmd service disabled by default.
Once everything is there, the boot script starts slurmd explicitly via
systemctl. At that point, the node really can accept jobs.
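
A minimal sketch of that wait-then-start logic, in case it helps. This
is not the actual postscript: the function names, the timeout and the
polling interval are made up for illustration, and it assumes the
kernel exposes link state in /sys/class/net/<iface>/operstate. It does
not check the BeeGFS mount itself.

```python
import subprocess
import time
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports the interface's operstate as 'up'."""
    try:
        state = (Path(sysfs_root) / iface / "operstate").read_text().strip()
    except OSError:
        return False  # interface does not exist (yet) or is unreadable
    return state == "up"


def wait_for_links(ifaces, timeout=300.0, interval=2.0,
                   sysfs_root="/sys/class/net"):
    """Poll until all interfaces are up; return False once the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if all(link_is_up(iface, sysfs_root) for iface in ifaces):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def start_slurmd_when_ready(ifaces=("eth0", "ib0")):
    """Start slurmd only after all links are up (hypothetical helper)."""
    if not wait_for_links(ifaces):
        raise SystemExit("links did not come up; leaving slurmd stopped")
    # slurmd is disabled by default, so it has to be started explicitly.
    subprocess.run(["systemctl", "start", "slurmd"], check=True)
```

Calling start_slurmd_when_ready() at the end of the boot script would
then leave slurmd stopped on a node whose links never come up, which
keeps jobs away from it.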

If you want to rely on systemd's automagic to start everything at the
same time and to emulate the presence of services that are still
starting up, including slurmd, this sanity check inside slurmd might be
the only way to ensure something resembling a sane "booted" state for a
node, until those checks and dependencies really work inside systemd
(including GPFS and BeeGFS mounts). But it is not unreasonable to keep
those configuration and health checks separate, UNIX-style.


Alrighty then,

Thomas

-- 
Dr. Thomas Orgis
Universität Hamburg
RRZ / Zentrale Dienste / HPC
Schlüterstr. 70
20146 Hamburg
Tel.: 040/42838 8826
Fax: 040/428 38 6270
