When the nodes change to the down state, what does 'sinfo -R' report? It sometimes gives the reason for the state change.
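A rough sketch of that check, in case it helps. It assumes the standard sinfo options -R (list reasons), -h (no header) and -o with the %N (node list) and %E (reason) fields. The function names down_reasons and format_reasons are just made up for this example, and the node name and reason in the comment are illustrative, not taken from your logs.

```shell
#!/bin/sh
# Sketch only: print the reason Slurm recorded for each unavailable node.

# Ask sinfo for "nodelist|reason" pairs, without the header line.
# %N = node list, %E = reason the node is unavailable.
down_reasons() {
    sinfo -R -h -o '%N|%E'
}

# Turn "nodelist|reason" lines read from stdin into "nodelist: reason".
# Example (illustrative) input line: node01|Not responding
format_reasons() {
    awk -F'|' '{ printf "%s: %s\n", $1, $2 }'
}
```

On the controller you would run `down_reasons | format_reasons` right after a node goes down and compare the reported reason with the slurmctld log from the same time.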

Best,
Felix

On 05.07.2017 at 13:16, Said Mohamed Said wrote:
> Thank you Adam,
>
> For NTP I did that as well before posting but didn't fix the issue.
>
> Regards,
> Said
> ------------------------------------------------------------------------
> *From:* Adam Huffman <[email protected]>
> *Sent:* Wednesday, July 5, 2017 8:11:03 PM
> *To:* slurm-dev
> *Subject:* [slurm-dev] Re: SLURM ERROR! NEED HELP
>
> I've seen something similar when node clocks were skewed.
>
> Worth checking that NTP is running and they're all synchronised.
>
> On Wed, Jul 5, 2017 at 12:06 PM, Said Mohamed Said <[email protected]> wrote:
> > Thank you all for suggestions. I turned off firewall on both machines but
> > still no luck. I can confirm that No managed switch is preventing the nodes
> > from communicating. If you check the log file, there is communication for
> > about 4mins and then the node state goes down.
> > Any other idea?
> > ________________________________
> > From: Ole Holm Nielsen <[email protected]>
> > Sent: Wednesday, July 5, 2017 7:07:15 PM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP
> >
> > On 07/05/2017 11:40 AM, Felix Willenborg wrote:
> >> in my network I encountered that managed switches were preventing
> >> necessary network communication between the nodes, on which SLURM
> >> relies. You should check if you're using managed switches to connect
> >> nodes to the network and if so, if they're blocking communication on
> >> slurm ports.
> >
> > Managed switches should permit IP layer 2 traffic just like unmanaged
> > switches! We only have managed Ethernet switches, and they work without
> > problems.
> >
> > Perhaps you meant that Ethernet switches may perform some firewall
> > functions by themselves?
> >
> > Firewalls must be off between Slurm compute nodes as well as the
> > controller host. See
> > https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
> >
> > /Ole
