Dear SLURM developers,
We encountered an issue similar to the one reported by Tingyang Xu in "slurm cannot
work with Infiniband after rebooting". More details can be found in his posts
on the SLURM and Intel forums:
https://groups.google.com/forum/#!searchin/slurm-devel/slurm$20cannot$20work$20with$20Infiniband$20after$20rebooting%7Csort:relevance/slurm-devel/GUeOOlaayLk/OsvdTAsRtdsJ
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/534491
A workaround is to restart the slurmd service:
# systemctl restart slurmd
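For illustration, the workaround could be scripted so that slurmd is restarted only once an InfiniBand port is up. This is a rough sketch, not something SLURM provides: it assumes the usual Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/state (where an active port reads "4: ACTIVE"), and the function names ib_port_active and wait_for_ib are hypothetical helpers introduced only for this example.

```shell
# Hypothetical helper: succeed if any InfiniBand port under the given
# sysfs root reports the ACTIVE state. The root is a parameter only so
# the check can be exercised on a machine without InfiniBand hardware.
ib_port_active() {
    root="${1:-/sys/class/infiniband}"
    for state_file in "$root"/*/ports/*/state; do
        # Skip the unexpanded glob when no device is present.
        [ -r "$state_file" ] || continue
        if grep -q ': ACTIVE$' "$state_file"; then
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: poll once per second until a port is ACTIVE,
# giving up after the timeout (in seconds, default 60).
wait_for_ib() {
    timeout="${1:-60}"
    root="${2:-/sys/class/infiniband}"
    waited=0
    while ! ib_port_active "$root"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (as root): wait_for_ib 120 && systemctl restart slurmd
```

Polling sysfs keeps the sketch free of extra dependencies; a site that has infiniband-diags installed might prefer to check the port state with ibstat instead.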
As indicated on the Intel forum, this issue may be caused by
InfiniBand being unavailable when SLURM starts. Do you have any
recommendation for making SLURM the last service to start when the host
reboots?
Best,
Jianwen