'sinfo -R' gives "NODE IS NOT RESPONDING".
ping gives successful results from both nodes.

I really cannot figure out what is causing the problem.

Regards,
Said
________________________________
From: Felix Willenborg <[email protected]>
Sent: Wednesday, July 5, 2017 9:07:05 PM
To: slurm-dev
Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP

When the nodes change to the down state, what does 'sinfo -R' report?
Sometimes it gives you a reason for the state change.
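
For collecting those reasons in a script, here is a minimal sketch. It assumes 'sinfo' is on the PATH and that the format string "%E|%N" (reason, then node list) prints one "reason|nodelist" pair per line; the function names are illustrative and not part of Slurm.

```python
import subprocess


def parse_reasons(text: str) -> dict[str, str]:
    """Map each node list to its reason, given 'reason|nodelist' lines."""
    reasons = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue
        # Split on the last '|' so a reason that contains '|' stays intact.
        reason, nodelist = line.rsplit("|", 1)
        reasons[nodelist.strip()] = reason.strip()
    return reasons


def down_reasons() -> dict[str, str]:
    """Run sinfo and return the reasons it reports (assumes sinfo is installed)."""
    # -R lists reasons, -h drops the header, %E is the reason, %N the node list.
    out = subprocess.run(
        ["sinfo", "-R", "-h", "-o", "%E|%N"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_reasons(out)
```

Calling down_reasons() on the controller returns a dictionary keyed by node list, which makes it easy to log the reason each time a node drops.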

Best,
Felix

On 05.07.2017 at 13:16, Said Mohamed Said wrote:
Thank you, Adam. I checked NTP as well before posting, but that didn't fix
the issue.

Regards,
Said
________________________________
From: Adam Huffman <[email protected]>
Sent: Wednesday, July 5, 2017 8:11:03 PM
To: slurm-dev
Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP


I've seen something similar when node clocks were skewed.

Worth checking that NTP is running and they're all synchronised.
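
One rough way to check for skew without relying on NTP tooling is to compare each node's clock against the local one over SSH. The sketch below assumes passwordless SSH to the nodes and GNU 'date' on the remote side; the function names and any host names passed in are hypothetical.

```python
import subprocess
import time


def max_skew(offsets: dict[str, float]) -> float:
    """Largest difference, in seconds, between any two reported offsets."""
    if not offsets:
        return 0.0
    return max(offsets.values()) - min(offsets.values())


def clock_offset(host: str) -> float:
    """Remote clock minus local clock, in seconds (approximate)."""
    before = time.time()
    out = subprocess.run(
        ["ssh", host, "date", "+%s.%N"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    after = time.time()
    # Compare against the midpoint of the round trip to cancel some SSH latency.
    return float(out.strip()) - (before + after) / 2


def report(hosts: list[str]) -> dict[str, float]:
    """Collect the approximate offset of every host in the list."""
    return {host: clock_offset(host) for host in hosts}
```

For example, max_skew(report(["node01", "node02"])) gives the spread between the two nodes. SSH latency limits the accuracy, so this only catches skew of roughly a second or more.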

On Wed, Jul 5, 2017 at 12:06 PM, Said Mohamed Said <[email protected]> wrote:
> Thank you all for the suggestions. I turned off the firewall on both
> machines, but still no luck. I can confirm that no managed switch is
> preventing the nodes from communicating. According to the log file, the
> nodes communicate for about 4 minutes and then the node state goes down.
> Any other ideas?
> ________________________________
> From: Ole Holm Nielsen <[email protected]>
> Sent: Wednesday, July 5, 2017 7:07:15 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP
>
>
> On 07/05/2017 11:40 AM, Felix Willenborg wrote:
>> In my network, I found that managed switches were blocking the
>> inter-node network communication that SLURM relies on. You should
>> check whether you're using managed switches to connect the nodes to
>> the network and, if so, whether they're blocking communication on the
>> SLURM ports.
>
> Managed switches should permit IP layer 2 traffic just like unmanaged
> switches!  We only have managed Ethernet switches, and they work without
> problems.
>
> Perhaps you meant that Ethernet switches may perform some firewall
> functions by themselves?
>
> Firewalls must be off between Slurm compute nodes as well as the
> controller host.  See
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
>
> /Ole
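
A successful ping only shows that ICMP gets through; the Slurm daemons talk over TCP. A minimal sketch for testing the TCP path directly follows. It assumes the default ports (6817 for slurmctld, 6818 for slurmd), which SlurmctldPort and SlurmdPort in slurm.conf may override; the function name is illustrative.

```python
import socket

# Slurm's default ports; slurm.conf may set different values.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run it in both directions: tcp_reachable("node01", SLURMD_PORT) from the controller and tcp_reachable("controller", SLURMCTLD_PORT) from the compute node, with the placeholder host names replaced by your own. A False result while ping succeeds points to a firewall or a daemon that is not listening.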
