Re: [slurm-users] [EXT] job_submit.lua - choice of error on failure / job_desc.gpus?

2020-12-07 Thread Sean Crosby
Hi Loris, We have a completely separate test system, complete with a few worker nodes, separate slurmctld/slurmdbd, so we can test Slurm upgrades etc. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne,

Re: [slurm-users] [EXT] job_submit.lua - choice of error on failure / job_desc.gpus?

2020-12-07 Thread Loris Bennett
Hi Sean, Thanks for the code - looks like you have put a lot more thought into it than I have into mine. I'll certainly have to look at handling the 'tres-per-*' options. By the way, how do you do your testing? As I don't have a test cluster, currently I'm doing "open heart" testing, but I

Re: [slurm-users] [EXT] job_submit.lua - choice of error on failure / job_desc.gpus?

2020-12-04 Thread Sean Crosby
Hi Loris, This is our submit filter for what you're asking. It checks for both --gres and --gpus

ESLURM_INVALID_GRES=2072
ESLURM_BAD_TASK_COUNT=2025
if ( job_desc.partition ~= slurm.NO_VAL ) then
  if (job_desc.partition ~= nil) then
    if (string.match(job_desc.partition,"gpgpu") or
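
The snippet above is cut off in the archive, so the following is only a minimal sketch of the kind of check it describes, not Sean's actual filter. It assumes that a --gres request shows up in job_desc.tres_per_node and a --gpus request in job_desc.tres_per_job, and that the GPU partition name contains "gpgpu" as in the quoted code. The helper requests_gpu and the stand-in slurm table are hypothetical and exist only so the sketch runs in a plain Lua interpreter.

```lua
-- Hypothetical sketch; not the filter from the message above.
-- Error code value as quoted in the message.
local ESLURM_INVALID_GRES = 2072

-- Stand-in for the `slurm` table that slurmctld provides to job_submit.lua.
-- Inside slurmctld the real global is used instead of this stub.
local slurm = slurm or {
  SUCCESS = 0,
  log_user = function(fmt, ...) print(string.format(fmt, ...)) end,
}

-- Return true if any TRES request string on the job mentions a GPU.
-- Assumption: --gres lands in tres_per_node and --gpus in tres_per_job.
local function requests_gpu(job_desc)
  local fields = { "tres_per_job", "tres_per_node", "tres_per_socket", "tres_per_task" }
  for _, name in ipairs(fields) do
    local value = job_desc[name]
    -- Plain-text search (no patterns) for "gpu" in the request string.
    if type(value) == "string" and string.find(value, "gpu", 1, true) then
      return true
    end
  end
  return false
end

-- Reject jobs sent to a GPU partition that do not ask for a GPU.
function slurm_job_submit(job_desc, part_list, submit_uid)
  local partition = job_desc.partition
  if type(partition) == "string" and string.match(partition, "gpgpu") then
    if not requests_gpu(job_desc) then
      slurm.log_user("Jobs in partition %s must request at least one GPU", partition)
      return ESLURM_INVALID_GRES
    end
  end
  return slurm.SUCCESS
end
```

A real job_submit.lua also has to define slurm_job_modify, and which job_desc fields are populated can differ between Slurm versions, so the field names above should be checked against the installed release before use.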