Maybe it helps if you modify the [Unit] part of slurmd.service as follows:
After=network.target network-online.target munge.service
Requires=network.target network-online.target munge.service

If this is not sufficient, you might further try:

After=network.target remote-fs.target munge.service
Requires=network.target remote-fs.target munge.service

2017-03-07 2:26 GMT+01:00 Jianwen Wei <[email protected]>:

> Dear SLURM developers,
>
> We encountered issues similar to those reported by Tingyang Xu in "slurm
> cannot work with Infiniband after rebooting". More details can be found in
> his posts on the SLURM and Intel forums.
>
> https://groups.google.com/forum/#!searchin/slurm-devel/slurm$20cannot$20work$20with$20Infiniband$20after$20rebooting%7Csort:relevance/slurm-devel/GUeOOlaayLk/OsvdTAsRtdsJ
> https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/534491
>
> A workaround is to restart the slurmd service:
>
> # systemctl restart slurmd
>
> As indicated in the Intel forum, this issue may be caused by Infiniband
> being unavailable when SLURM starts. Do you have any recommendation for
> making SLURM the last service to start when rebooting the host?
>
>
> Best,
>
> Jianwen
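
P.S. One possible way to apply the [Unit] change suggested at the top of this message, without editing the packaged slurmd.service, is a systemd drop-in. The sketch below is only an illustration: the file name wait-for-network.conf is my own choice, and I am assuming that your distribution ships slurmd.service as a regular unit that honours drop-ins under /etc/systemd/system. Dependency settings such as After= and Requires= in a drop-in are added to those already in the unit rather than replacing them.

```ini
# /etc/systemd/system/slurmd.service.d/wait-for-network.conf
# (hypothetical file name; any *.conf in this directory is read)
[Unit]
# Start slurmd only after the network is reported online and munge is running.
After=network.target network-online.target munge.service
Requires=network.target network-online.target munge.service
```

After creating the file, run "systemctl daemon-reload" and reboot to test. Note that network-online.target only delays startup if a wait-online service (for example NetworkManager-wait-online.service or systemd-networkd-wait-online.service) is enabled, and whether it also waits for the Infiniband interface depends on how that interface is configured on your nodes.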
