[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-19 Thread John Desantis
Felix, How does the routing table look on the controller? Is the IB network listed on the controller using the correct interface? John DeSantis 2015-03-19 10:48 GMT-04:00 Felix Willenborg felix.willenb...@uni-oldenburg.de: So I tried out installing the latest package (14.11.4-1) of Slurm

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-19 Thread Felix Willenborg
So I tried out installing the latest package (14.11.4-1) of Slurm, unfortunately with no success. I made sure that the InfiniBand plugin was compiled, that it is loaded in slurmd, and that an acct_gathering.conf is available. Still, I have the same problem. I assume that I'm not

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread John Desantis
Felix, My fault, I suggested something that you already checked! John DeSantis 2015-03-17 11:28 GMT-04:00 John Desantis desan...@mail.usf.edu: Felix, Can you ping the nodes from the controller and vice versa? The snippet below looks like a potential firewall issue:

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread Yann Sagon
2015-03-17 13:31 GMT+01:00 Felix Willenborg felix.willenb...@uni-oldenburg.de: Hi there, first of all, I'm kinda new to Slurm, so hopefully I may have missed something very basic here. slurmctld.log

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread John Desantis
Felix, Can you ping the nodes from the controller and vice versa? The snippet below looks like a potential firewall issue: [2015-03-16T15:40:02.845] debug2: Error connecting slurm stream socket at ***.***.***.***52:6818: Connection timed out Try telnet'ing from the controller to each node on
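The telnet test suggested above can also be scripted. The following is a minimal sketch, not part of the original thread: it tries a plain TCP connection from the controller to each node on port 6818 (the port shown in the log line above). The function names and the node names in the usage comment are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or lookup failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_nodes(nodes, port: int = 6818, timeout: float = 3.0) -> dict:
    """Map each node name to True/False depending on whether the port is reachable."""
    return {node: can_connect(node, port, timeout) for node in nodes}


# Usage on the controller (node names are placeholders, not from the thread):
#   for node, ok in check_nodes(["node01", "node02"]).items():
#       print(node, "reachable" if ok else "NOT reachable")
```

A timeout here, as opposed to an immediate "connection refused", would be consistent with packets being dropped by a firewall or sent out over the wrong interface, but the sketch itself cannot tell those cases apart.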

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread John Desantis
Felix, Do the IP addresses associated with the NodeNames return proper matches when you run lookups? What happens if you don't use IP addresses and only host names within your Slurm configuration? John DeSantis 2015-03-17 11:30 GMT-04:00 John Desantis desan...@mail.usf.edu: Felix, My
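One way to run the lookups asked about above is a forward and reverse resolution for each node name. This is a rough sketch only, assuming IPv4 and the system resolver; the function name and the node name in the usage comment are hypothetical and not taken from the thread.

```python
import socket


def lookup_roundtrip(name: str):
    """Resolve name to an IPv4 address, then reverse-resolve that address.

    Returns (address, reverse_name); either element is None if that
    lookup step fails.
    """
    try:
        addr = socket.gethostbyname(name)
    except OSError:
        return None, None
    try:
        reverse_name = socket.gethostbyaddr(addr)[0]
    except OSError:
        reverse_name = None
    return addr, reverse_name


# Usage (placeholder node name):
#   addr, rev = lookup_roundtrip("node01")
#   print("node01 ->", addr, "->", rev)
```

If the address returned here differs from the one configured for the node, or the reverse name does not match the node name, that mismatch is worth checking before looking further into Slurm itself.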

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread Moe Jette
The troubleshooting guide online may help: http://slurm.schedmd.com/troubleshoot.html#nodes Quoting Felix Willenborg felix.willenb...@uni-oldenburg.de: Hi there, first of all, I'm kinda new to Slurm, so hopefully I may have missed something very basic here. I'm trying to set up a system

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread Felix Willenborg
@Yann Sagon: This is a good point. I didn't notice this. I'm new to this particular system. Everything installed, configured etc. was already there when I got into it. I'll keep that in mind, because I've got to discuss the upgrade with the group. Maybe I should set it up from the beginning

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread Cooper, Trevor
Felix, I would suggest you look in the munged log for errors and make sure time is sync'd across all your nodes. Trevor On Mar 17, 2015, at 5:31 AM, Felix Willenborg felix.willenb...@uni-oldenburg.de wrote: Hi there, first of all, I'm kinda new to Slurm, so hopefully I may have