Which OS are you using on both nodes, and how exactly did you turn
off the firewall?

Best,
Felix

On 05.07.2017 at 16:23, Said Mohamed Said wrote:
> 'sinfo -R' reports "NODE IS NOT RESPONDING".
> Ping succeeds from both nodes.
>
> I really cannot figure out what is causing the problem.
>
> Regards,
> Said
> ------------------------------------------------------------------------
> *From:* Felix Willenborg <[email protected]>
> *Sent:* Wednesday, July 5, 2017 9:07:05 PM
> *To:* slurm-dev
> *Subject:* [slurm-dev] Re: SLURM ERROR! NEED HELP
>  
> When the nodes change to the down state, what does 'sinfo -R' say?
> It sometimes gives you a reason.
>
> Best,
> Felix
>
> On 05.07.2017 at 13:16, Said Mohamed Said wrote:
>> Thank you, Adam. I had already checked NTP before posting, but it
>> didn't fix the issue.
>>
>> Regards,
>> Said
>> ------------------------------------------------------------------------
>> *From:* Adam Huffman <[email protected]>
>> *Sent:* Wednesday, July 5, 2017 8:11:03 PM
>> *To:* slurm-dev
>> *Subject:* [slurm-dev] Re: SLURM ERROR! NEED HELP
>>  
>>
>> I've seen something similar when node clocks were skewed.
>>
>> Worth checking that NTP is running and they're all synchronised.
>>
>> On Wed, Jul 5, 2017 at 12:06 PM, Said Mohamed Said
>> <[email protected]> wrote:
>> > Thank you all for the suggestions. I turned off the firewall on both
>> > machines, but still no luck. I can confirm that no managed switch is
>> > preventing the nodes from communicating. If you check the log file,
>> > there is communication for about 4 minutes, and then the node state
>> > goes down.
>> > Any other ideas?
>> > ________________________________
>> > From: Ole Holm Nielsen <[email protected]>
>> > Sent: Wednesday, July 5, 2017 7:07:15 PM
>> > To: slurm-dev
>> > Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP
>> >
>> >
>> > On 07/05/2017 11:40 AM, Felix Willenborg wrote:
>> >> in my network I encountered that managed switches were preventing
>> >> necessary network communication between the nodes, on which SLURM
>> >> relies. You should check if you're using managed switches to connect
>> >> nodes to the network and if so, if they're blocking communication on
>> >> slurm ports.
>> >
>> > Managed switches should pass layer 2 traffic just like unmanaged
>> > switches!  We only have managed Ethernet switches, and they work
>> > without problems.
>> >
>> > Perhaps you meant that Ethernet switches may perform some firewall
>> > functions by themselves?
>> >
>> > Firewalls must be off between the Slurm compute nodes as well as the
>> > controller host.  See
>> >
>> > https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
>> >
>> > /Ole
>
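A successful ping only shows that ICMP gets through; it says nothing about
whether the TCP ports the Slurm daemons use are reachable. The sketch below
is one way to test that directly. It assumes the default ports
(SlurmctldPort 6817, SlurmdPort 6818); check slurm.conf if yours differ. The
function names and host names are hypothetical placeholders, not part of
Slurm.

```python
from __future__ import annotations

import socket

# Default Slurm ports (assumption: slurm.conf does not override
# SlurmctldPort or SlurmdPort).
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names,
        # all of which are OSError subclasses.
        return False


def check_slurm_ports(controller: str, nodes: list[str]) -> dict[str, bool]:
    """Probe slurmctld on the controller and slurmd on each compute node."""
    results = {
        f"{controller}:{SLURMCTLD_PORT}": port_reachable(controller, SLURMCTLD_PORT)
    }
    for node in nodes:
        results[f"{node}:{SLURMD_PORT}"] = port_reachable(node, SLURMD_PORT)
    return results
```

For example, `check_slurm_ports("controller", ["node1", "node2"])` (hypothetical
host names) returns a dict mapping each "host:port" to True or False. Since
Slurm traffic flows both ways, run it from the controller and from each compute
node to catch a firewall that blocks only one direction.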
