This may be due to this commit:
https://github.com/SchedMD/slurm/commit/ee2813870fed48827aa0ec99e1b4baeaca710755
It seems that the behavior was changed from a fatal error to something
different when requesting cgroup device constraints in cgroup.conf without
the proper configuration file.
If you do not r
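For context, a minimal cgroup.conf enabling device constraints would look roughly like the sketch below. The path shown is the historical default location, not something taken from this thread; check your own installation.

```
### /etc/slurm/cgroup.conf -- minimal sketch, assumed default paths
ConstrainDevices=yes
# Companion file listing the devices jobs may access; in older
# releases, enabling ConstrainDevices without this file present
# was treated as a fatal error at slurmd startup.
AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf
```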
Hi Kevin,
Based on my understanding and a discussion with the SLURM dev team on that
subject, here is some information about the new X11 support in
slurm-17.11:
- slurm's native support of X11 forwarding is based on libssh2
- slurm's native support of X11 can be disabled at configure/compila
Hi,
You should look at that bug : https://bugs.schedmd.com/show_bug.cgi?id=4412
I thought it would be resolved in 17.11.0.
Regards
Matthieu
On Nov 30, 2017 at 00:56, "Andy Riebs" wrote:
> We've just installed 17.11.0 on our 100+ node x86_64 cluster running
> CentOS 7.4 this afternoon, and per
Hi,
For this kind of issue, one good thing to do is to get a backtrace of
slurmctld during the slowdown. That should let you easily identify the
subcomponent responsible for the issue.
I would bet on something like LDAP requests taking too much time because of
a missing sssd cache.
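For example, a one-shot backtrace of all slurmctld threads can be captured with gdb attached briefly to the running daemon (a sketch, assuming gdb is installed and slurmctld runs on the host where you run this):

```shell
# Dump a backtrace of every slurmctld thread during the slowdown.
# Attaching pauses the daemon only for the instant of the dump.
gdb -p "$(pidof slurmctld)" -batch \
    -ex 'thread apply all bt' > slurmctld-backtrace.txt 2>&1
```

Repeating this a few times several seconds apart and looking for threads stuck in the same frames (e.g. deep in nss/sssd/LDAP lookup calls) usually points at the culprit.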
Regards
Matthieu
Hi,
your login node may have a heavy load while starting such a large number of
independent sruns.
This may induce issues not seen under normal load, such as partial
reads/writes on sockets, which can trigger bugs in Slurm functions that are
not properly protected against such events.
Quickly looking at the sou
Hi,
At the time the MCS logic was added to Slurm, the filtering of slurmdbd
related information based on the MCS label was deferred because it requires
a new field (mcs_label) in the slurmdbd job/step records.
The addition of this label in the main branch took time and only appeared
in 17.11 (se
On Thu, May 17, 2018 at 11:28, Mahmood Naderan wrote:
> Hi,
> For an interactive job via srun, I see that after opening the gui, the
> session is terminated automatically which is weird.
>
> [mahmood@rocks7 ansys_test]$ srun --x11 -A y8 -p RUBY --ntasks=10
> --mem=8GB --pty bash
> [mahmood@compute
Hi,
Communications in Slurm are not only performed from controller to slurmd
and from slurmd to controller. You need to ensure that your login nodes can
reach the controller and the slurmd nodes as well as ensure that slurmd on
the various nodes can contact each other. This last requirement is bec
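A quick way to verify those paths is to probe the slurmctld and slurmd ports from each class of node. The host names below are placeholders, and 6817/6818 are the default SlurmctldPort/SlurmdPort; adjust if your slurm.conf overrides them.

```shell
# From a login node: can we reach the controller and a compute node?
nc -zv controller-host 6817   # slurmctld (default SlurmctldPort)
nc -zv compute-node 6818      # slurmd (default SlurmdPort)

# From one compute node to another (needed for the node-to-node
# communication mentioned above):
nc -zv other-compute-node 6818

# Slurm's own round-trip check to the controller:
scontrol ping
```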
Thanks again, Matthieu!
>
> Best,
>
> Sean
>
>
> On Thu, May 17, 2018 at 8:06 PM, Sean Caron wrote:
>
>> Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will
>> give that a shot and see what happens.
>>
>> Best,
>>