Re: [slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps

2022-05-18 Thread John DeSantis
Hello, It also appears that random jobs are being identified as using too much memory, despite being well within limits. For example, a job is running that requested 2048 MB per CPU and all processes are within the limit. However, the job is flagged as being over the limit when it isn't. Please
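For readers following the arithmetic: when memory is requested per CPU, the job-wide ceiling scales with the number of allocated CPUs, and a job-level check compares the *aggregate* usage against that ceiling. "Every process is within 2048 MB" and "the job is within its limit" are therefore two different statements. The sketch below is a toy model under those assumptions only. `Proc`, `job_limit_mb` and `over_limit` are hypothetical names, and it is not a description of how slurmstepd or the jobacct_gather plugins actually sample or sum memory.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Proc:
    """Hypothetical snapshot of one process in a job: PID and resident memory in MB."""
    pid: int
    rss_mb: int


def job_limit_mb(mem_per_cpu_mb: int, ncpus: int) -> int:
    """Assumed job-wide limit for a per-CPU request: per-CPU value times allocated CPUs."""
    return mem_per_cpu_mb * ncpus


def over_limit(procs: Iterable[Proc], mem_per_cpu_mb: int, ncpus: int) -> Tuple[bool, int]:
    """Sum RSS across all processes and compare the total to the job-wide limit."""
    total = sum(p.rss_mb for p in procs)
    return total > job_limit_mb(mem_per_cpu_mb, ncpus), total


# Four processes at 1800 MB each, every one below 2048 MB.
procs = [Proc(pid, 1800) for pid in (101, 102, 103, 104)]
print(over_limit(procs, 2048, 4))  # 7200 MB against an 8192 MB ceiling
print(over_limit(procs, 2048, 3))  # same usage against a 6144 MB ceiling
```

The same per-process numbers can pass or fail depending on how many CPUs the limit is multiplied by, which is why the allocated CPU count matters when judging whether an "Exceeded job memory limit" message is plausible.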

[slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps

2022-05-18 Thread John DeSantis
Hello, Due to the recent CVE posted by Tim, we upgraded from SLURM 20.11.3 to 20.11.9. Today, I received a ticket from a user whose output files were populated with the "slurmstepd: error: Exceeded job memory limit" message. But the jobs are still running, and it seems that the

Re: [slurm-users] container on slurm cluster

2022-05-18 Thread Brian Andrus
Ghui, It seems that things are doing what they should. You are allowing an account to become root inside the pod and the pod is considered a trusted environment by slurm (you are running munge inside it). So as far as slurm is concerned, 'root' from a trusted environment is submitting a job.

Re: [slurm-users] Slurm notifications, a more comprehensive solution - goslmailer

2022-05-18 Thread Petar Jager
Hi Hermann, You're welcome, looking forward to hearing some feedback from you. Regarding the matrix integration, or any other for that matter, gosl code was written with extensibility in mind. Meaning, all the helper code required to create a new connector is packaged and easily reusable. If you

Re: [slurm-users] container on slurm cluster

2022-05-18 Thread Josef Dvoracek
> I had config the right slurm and munge inside the container. This is the reason. Whoever has access to munge.key can effectively become root on the slurm cluster. You should not disclose munge.key to containers. cheers josef On 18. 05. 22 9:13, GHui wrote: ...I had config the right slurm and
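Why holding munge.key is equivalent to being trusted: with a single symmetric key shared by every node, the verifier can only check that a credential was produced by *someone who has the key*. It cannot tell a genuine daemon from a container that was given a copy. The toy model below uses an HMAC over a UID/GID payload to show this. It is an assumption-laden simplification, not MUNGE's wire format: real MUNGE credentials also carry timestamps, a TTL, replay protection and optional encryption. `make_credential` and `verify_credential` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Optional, Tuple


def make_credential(key: bytes, uid: int, gid: int) -> Tuple[bytes, bytes]:
    """Toy credential: a JSON payload plus an HMAC-SHA256 tag under the shared key."""
    payload = json.dumps({"uid": uid, "gid": gid}, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload, tag


def verify_credential(key: bytes, payload: bytes, tag: bytes) -> Optional[dict]:
    """Accept the claimed identity iff the tag matches; the key is the only check."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    return json.loads(payload)


cluster_key = b"cluster-wide-secret"  # hypothetical stand-in for munge.key

# A container that was handed the same key can mint a uid 0 credential...
forged_payload, forged_tag = make_credential(cluster_key, uid=0, gid=0)
# ...and the verifier has no way to distinguish it from a legitimate one.
print(verify_credential(cluster_key, forged_payload, forged_tag))
```

Keeping the key out of containers is what prevents this, which matches the advice in the reply above.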

Re: [slurm-users] container on slurm cluster

2022-05-18 Thread Markus Kötter
Hi, On 18.05.22 08:25, Stephan Roth wrote: Personal note: I'm not sure what I'd choose as a successor to Singularity 3.8, yet. Thoughts are welcome. I can recommend NVIDIA enroot/pyxis. enroot does unprivileged sandboxes/containers, pyxis is the slurm SPANK glue.

Re: [slurm-users] container on slurm cluster

2022-05-18 Thread Stephan Roth
On 17.05.22 17:17, Timo Rothenpieler wrote: On 17.05.2022 15:58, Brian Andrus wrote: You are starting to understand a major issue with most containers. I suggest you check out Singularity, which was built from the ground up to address most issues. And it can run other container types (eg: