Firewall on the slurmctld server.

--
 ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
 || \\UTGERS      |---------------------*O*---------------------
 ||_// Biomedical | Ryan Novosielski - Senior Technologist
 || \\ and Health | [email protected] - 973/972.0922 (2x0922)
 ||  \\ Sciences  | OIRT/High Perf & Res Comp - MSB C630, Newark
      `'

> On Mar 12, 2016, at 12:32, Jagga Soorma <[email protected]> wrote:
>
> Hi Guys,
>
> I have successfully installed slurm 15.08 on a small test cluster
> running CentOS 7.1. Everything seems like it is running fine and I
> can submit jobs without any issues. However on the clients I am
> seeing some errors on the systemctl status slurm command that don't
> make sense:
>
> --
> # systemctl status slurm
> slurm.service - LSB: slurm daemon management
>    Loaded: loaded (/etc/rc.d/init.d/slurm)
>    Active: failed (Result: timeout) since Sat 2016-03-12 09:14:22 PST; 7min ago
>
> Mar 12 09:12:20 client1 slurmd[123729]: _run_prolog: run job script took usec=4
> Mar 12 09:12:20 client1 slurmd[123729]: _run_prolog: prolog with lock for job 6 ran for 0 seconds
> Mar 12 09:12:20 client1 slurmstepd[126069]: done with job
> Mar 12 09:12:30 client1 slurmd[123729]: launch task 7.0 request from [email protected] (port 42986)
> Mar 12 09:12:30 client1 slurmd[123729]: _run_prolog: run job script took usec=4
> Mar 12 09:12:30 client1 slurmd[123729]: _run_prolog: prolog with lock for job 7 ran for 0 seconds
> Mar 12 09:12:30 client1 slurmstepd[126100]: done with job
> Mar 12 09:14:22 client1 systemd[1]: slurm.service operation timed out. Terminating.
> Mar 12 09:14:22 client1 systemd[1]: Failed to start LSB: slurm daemon management.
> Mar 12 09:14:22 client1 systemd[1]: Unit slurm.service entered failed state.
> --
>
> However slurm seems to be working fine:
>
> --
> # sinfo -lNe
> Sat Mar 12 09:25:40 2016
> NODELIST      NODES  PARTITION  STATE  CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  FEATURES  REASON
> client[1-10]  8      dev*       idle   40    2:10:2  257680  0         1       (null)    none
> # srun hostname
> client1
> --
>
> Any ideas why the slurm service in the client might be throwing those
> timed out errors?
>
> Thanks!
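
If the firewall does turn out to be the culprit, a rough sketch of what to look at follows. It assumes firewalld is the active firewall (the CentOS 7 default) and that slurm.conf lives at /etc/slurm/slurm.conf, neither of which is stated in the thread. The helper name slurm_port is made up for illustration. The script only prints the firewall-cmd commands so they can be reviewed before anything is changed; 6817 and 6818 are the documented defaults for SlurmctldPort and SlurmdPort when slurm.conf does not set them.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) firewalld commands for the Slurm ports.
# Assumes firewalld is in use; adapt for iptables or another firewall.

# Hypothetical helper: print the value of a slurm.conf key, or a fallback
# if the key is not set.
#   $1 = key name, $2 = fallback value, $3 = path to slurm.conf
slurm_port() {
    local key="$1" fallback="$2" conf="$3" value=""
    if [ -r "$conf" ]; then
        # Keys are case-insensitive. Commented-out lines are skipped, any
        # trailing comment is dropped, and the last matching line is used.
        value=$(grep -i "^[[:space:]]*${key}[[:space:]]*=" "$conf" \
                | tail -n 1 | cut -d= -f2 | cut -d'#' -f1 \
                | tr -d '[:space:]' || true)
    fi
    echo "${value:-$fallback}"
}

# The path is an assumption; adjust it to match your installation.
conf="${SLURM_CONF:-/etc/slurm/slurm.conf}"
ctld_port=$(slurm_port SlurmctldPort 6817 "$conf")
slurmd_port=$(slurm_port SlurmdPort 6818 "$conf")

echo "# On the slurmctld server:"
echo "firewall-cmd --permanent --add-port=${ctld_port}/tcp"
echo "firewall-cmd --reload"
echo "# On each compute node:"
echo "firewall-cmd --permanent --add-port=${slurmd_port}/tcp"
echo "firewall-cmd --reload"
```

Running firewall-cmd --list-all on each host beforehand shows what is currently allowed, which is a cheap way to confirm or rule out the firewall before opening anything.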
