Thanks - that's an awesome, yet horrible, hack :)
Noam
> On Nov 8, 2018, at 3:26 AM, Josep Manel Andrés Moscardó wrote:
Hi,
Somebody else gave me this piece of code (I hope he doesn't mind me
sharing it :) ); at least this is how they do it:
#!/bin/bash
# The next line makes Slurm send signal USR1 to the bash process 300
# seconds before the time limit
#SBATCH --signal=B:USR1@300
#SBATCH -t 00:06:00
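The idea is then to trap that signal in the script; a minimal sketch of
how the rest could look (the handler body and job command below are
placeholders, not from the original mail):

checkpoint() {
    echo "caught USR1, roughly 300 seconds left"
    # e.g. save state here, or: scontrol requeue "$SLURM_JOB_ID"
    exit 0
}
trap checkpoint USR1

# Run the payload in the background and wait on it; bash only delivers
# traps between commands, so a foreground job would delay the handler
# until the step finishes.
./my_long_job &
wait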
Hi, and thanks for all your answers, and sorry for the delay in my reply.
Yesterday I installed Slurm 18.08.3 on the controller machine to check
whether the seff command works correctly with this latest release. The
behaviour has improved, but I still receive an error message:
#
Hi Miguel,
this is because SchedMD changed the stats field: rss_max no longer
exists (cf. line 225 of seff).
You now need to evaluate the field stats{tres_usage_in_max} and take the
value after '2=' there, but that is the memory value in bytes instead of
kbytes, so it should be divided by 1024.
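For illustration, the extraction could look like this (an untested
sketch; the TRES string below is a made-up example, '2' being the TRES
id for memory):

# Example TresUsageInMax string; the value after '2=' is max RSS in bytes
tres='1=1234,2=5242880,3=0'
mem_bytes=$(echo "$tres" | tr ',' '\n' | awk -F= '$1 == "2" {print $2}')
echo $(( mem_bytes / 1024 ))    # bytes -> kbytes, here 5120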
We use sssd with realmd; enumeration is off.
Brian Andrus
On 11/8/2018 11:26 AM, Marcin Stolarek wrote:
I have had a very similar issue for quite some time and have been unable
to find its root cause. Are you using sssd and AD as a data source with
only a subtree of entries searched? That is my case.
All,
I am seeing what looks like the same issue as
https://bugs.schedmd.com/show_bug.cgi?id=2119
where slurmctld is not picking up new accounts unless it is restarted.
I have 4 clusters (non-federated), all using the same slurmdbd.
When I added an association for user name=me cluster=DevOps
I have had a very similar issue for quite some time and have been unable
to find its root cause. Are you using sssd and AD as a data source with
only a subtree of entries searched? That is my case.
Did you disable user enumeration? That is also my setup. I didn't find
any evidence that it's related, but...
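For reference, enumeration is controlled per domain in sssd.conf; a
sketch (the domain name here is hypothetical):

[domain/example.com]
# 'enumerate = false' (the default) disables listing all users/groups
enumerate = false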
Can anyone shed some light on where the _virtual_ memory limit comes from?
We're getting jobs killed with the message
slurmstepd: error: Step 3664.0 exceeded virtual memory limit (79348101120 >
72638634393), being killed
Is this a limit that's dictated by cgroup.conf or by some srun option?
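(One common source of such a limit, assuming enforcement through the
accounting poll rather than cgroups, is VSizeFactor in slurm.conf; a
sketch, not a confirmed diagnosis for this site:)

# slurm.conf: virtual memory limit as a percentage of the job's
# requested real memory; e.g. 110 gives jobs 10% of vmem headroom
VSizeFactor=110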
Hi all,
It looks like we can use the API to avoid having to manually parse the
'2=' value from the stats{tres_usage_in_max} field.
I've submitted a bug report and patch:
https://bugs.schedmd.com/show_bug.cgi?id=6004
The minimal changes needed are in the attached seff.patch.
Hope that