I think that looks OK. Forget my response.
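
For anyone repeating the comparison quoted below: nslookup queries the DNS server directly, while getent hosts goes through the system resolver (NSS, as configured in /etc/nsswitch.conf), so the two can differ in speed. Below is a minimal Python sketch for timing the resolver path. It assumes socket.getaddrinfo is an acceptable stand-in for getent hosts, since both use the system resolver. The helper name time_lookup is hypothetical; the host names are the ones from this thread.

```python
import socket
import time


def time_lookup(host, repeats=3):
    """Return the elapsed seconds for each getaddrinfo() call on host."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            # Goes through the system resolver, not straight to DNS.
            socket.getaddrinfo(host, None)
        except socket.gaierror:
            # A failed lookup is still timed; a slow failure is informative too.
            pass
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Host names taken from this thread; replace with your own nodes.
    for host in ("g-vm03", "ouga20"):
        print(host, ["%.3f s" % t for t in time_lookup(host)])
```

Repeating the call shows whether only the first lookup is slow (which may point at caching) or every one is.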

On Tue, 13 May 2025 at 14:09, Tilman Hoffbauer via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Thank you for your response. nslookup on e.g. ouga20 is instant, but getent
> hosts ouga20 takes about 1.6 seconds from g-vm03. It is about the same
> speed for ouga20 looking up g-vm03.
>
> Is this too slow?
> On 5/13/25 15:01, John Hearns wrote:
>
> Stupid response from me. A loooong time ago I had issues with slow
> response on PBS. The cause was name resolution.
>
> On your setup, is name resolution OK? Can you look up host names without
> delays?
>
> On Tue, 13 May 2025 at 13:50, Tilman Hoffbauer via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Hello,
>>
>> we are running a SLURM-managed cluster with one control node (g-vm03) and
>> 26 worker nodes (ouga[03-28]) on Rocky 8. We recently updated from 20.11.9
>> through 23.02.8 to 24.11.0 and then 24.11.5. Since then, we have been
>> experiencing performance issues: squeue and scontrol ping are slow to
>> respond and sometimes return "timeout on send/recv" messages, even with only
>> very few parallel requests. We did not experience these issues with SLURM
>> 20.11.9; we did not check the intermediate version 23.02.8 in detail.
>> In the log of slurmctld, we can also find messages like
>>
>> slurmctld: error: slurm_send_node_msg: [socket:[1272743]]
>> slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed: Unexpected missing
>> socket error
>>
>> We therefore implemented all recommendations from the high throughput
>> documentation and did see improvements (most notably from
>> increasing the maximum number of open files and increasing MessageTimeout
>> and TCPTimeout).
>>
>> For debugging, I attached the slurm.conf, the sdiag output (the server
>> thread count is almost always 1 and sometimes increases to 2), the
>> slurmctld log and the slurmdbd log from a time of high load.
>>
>> We would be very thankful for any input on how to restore the old
>> performance.
>>
>> Kind Regards,
>> Tilman Hoffbauer
>>
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>
