date:20180204

Re: [slurm-users] Strange problem with Slurm 17.11.0: "batch job complete failure"

2018-02-04 Thread Alan Orth

I came here looking for this! The last time I tried it, in early December 2017, it was still "broken" with Slurm 17.11.0. Glad to see that it was fixed in 17.11.1 (and to know why). I've now got PAM limits being applied correctly on my cluster. Thanks for the link, Andy. Cheers, On Fri, Dec 8, 2017 at

[slurm-users] slurm issues with CUDA unified memory

2018-02-04 Thread Jan Dettmer

Hello, I operate a small cluster of 8 nodes, each with 20 cores (two 10-core CPUs) and 2 GPUs (Nvidia K80). To date, I have been successfully running CUDA code, typically submitting single-CPU, single-GPU jobs to nodes via Slurm with the cons_res and CR_CPU options. More recently, I