Felix,
How does the routing table look on the controller?
Is the IB network listed on the controller using the correct interface?
John DeSantis
2015-03-19 10:48 GMT-04:00 Felix Willenborg felix.willenb...@uni-oldenburg.de:
So I tried installing the latest Slurm package (14.11.4-1),
unfortunately with no success. I made sure that the InfiniBand plugin
was compiled, that it is loaded in slurmd, and that an
acct_gathering.conf is available. Still, I have the same problem. I
assume that I'm not
Felix,
My fault, I suggested something that you already checked!
John DeSantis
2015-03-17 11:28 GMT-04:00 John Desantis desan...@mail.usf.edu:
Felix,
Can you ping the nodes from the controller and vice versa?
The snippet below looks like a potential firewall issue:
2015-03-17 13:31 GMT+01:00 Felix Willenborg
felix.willenb...@uni-oldenburg.de:
Hi there,
First of all, I'm kinda new to Slurm, so hopefully I have just missed
something very basic here.
slurmctld.log
Felix,
Can you ping the nodes from the controller and vice versa?
The snippet below looks like a potential firewall issue:
[2015-03-16T15:40:02.845] debug2: Error connecting slurm stream socket at
***.***.***.***52:6818: Connection timed out
Try telnet'ing from the controller to each node on
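
The telnet test suggested above can also be scripted. What follows is only a minimal sketch, not part of the original message: it assumes slurmd listens on port 6818 (the port shown in the log line above), and the helper names can_connect and check_nodes are made up for illustration.

```python
import socket


def can_connect(host, port=6818, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    6818 is taken from the log line in this thread; adjust it if your
    slurm.conf sets a different SlurmdPort.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


def check_nodes(hosts, port=6818, timeout=3.0):
    """Map each host name to the result of can_connect()."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example call, run from the controller (node names are hypothetical):
#   check_nodes(["node01", "node02"])
```

A node that times out here while still answering ping would be consistent with a firewall dropping traffic on that port, but this check alone cannot prove it.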
Felix,
Do the IP addresses associated with the NodeName entries return proper
matches when you run lookups?
What happens if you don't use IP addresses and only host names within your
Slurm configuration?
John DeSantis
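
One way to run the lookups mentioned above on the controller and on each node is sketched here. This is an illustration only, not something from the original message: the function name lookup_roundtrip is hypothetical, and it only checks IPv4 resolution through the system resolver.

```python
import socket


def lookup_roundtrip(name):
    """Resolve name to an IPv4 address, then resolve that address back.

    Returns a dict with the forward result ("address"), the reverse
    result ("reverse") and any resolver error message ("error").
    """
    result = {"name": name, "address": None, "reverse": None, "error": None}
    try:
        address = socket.gethostbyname(name)
        result["address"] = address
        # gethostbyaddr returns (hostname, aliases, addresses).
        result["reverse"] = socket.gethostbyaddr(address)[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        result["error"] = str(exc)
    return result


# Example call (the node name is hypothetical):
#   lookup_roundtrip("node01")
```

If the reverse name differs from the name used in the Slurm configuration, or the forward lookup returns an address on the wrong network, that would be worth checking before changing anything else.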
2015-03-17 11:30 GMT-04:00 John Desantis desan...@mail.usf.edu:
Felix,
My
The troubleshooting guide online may help:
http://slurm.schedmd.com/troubleshoot.html#nodes
Quoting Felix Willenborg felix.willenb...@uni-oldenburg.de:
Hi there,
First of all, I'm kinda new to Slurm, so hopefully I have just missed
something very basic here.
I'm trying to set up a system
@Yann Sagon:
This is a good point. I didn't notice this. I'm new to this particular
system. Everything was already installed and configured when I took it
over. I'll keep that in mind, because I have to discuss the upgrade
with the group. Maybe I should set it up from the beginning
Felix,
I would suggest you look in the munged log for errors and make sure time is
sync'd across all your nodes.
Trevor
On Mar 17, 2015, at 5:31 AM, Felix Willenborg
felix.willenb...@uni-oldenburg.de wrote:
Hi there,
First of all, I'm kinda new to Slurm, so hopefully I may have