Hi Paddy,

Paddy Doyle writes:
> Hi all,
>
> We've noticed an oddity when it comes to how JOBSIZE is calculated in a
> priority/multifactor setup.
>
> Here are two jobs both asking for 128 cores (the nodes have 8 cores each, so
> that's 16 nodes), but one ends up with almost double the JOBSIZE value:
>
> user1: salloc --ntasks 128 ....
> user2: sbatch --ntasks 128 --nodes 16 ....
>
> (I don't think that there's a difference whether it's salloc/sbatch)
>
> The relevant values from "sprio -l":
>
>   JOBID   USER  PRIORITY    AGE  FAIRSHARE  JOBSIZE
>   48020  user1    138877  28989       2546     6343
>   48081  user2    138911  24503       1468    11940
>
> In "scontrol show job" the relevant values are:
>
> user1: NumNodes=16 NumCPUs=128
> user2: NumNodes=16-16 NumCPUs=128
>
> Basically it's a problem because user2 has figured this out and is using it to
> game the system, and user1 is getting annoyed (their FAIRSHARE *should* win in
> this case).
>
> I've noticed this before in older versions (possibly back to 2.x), so it's
> not a recent change.
>
> Has anyone else noticed this?
>
> We will speak to the people involved (they're both in the same group in this
> instance, so we can ask them to play nicely with each other).
>
> But it would be good if there was a way to harden the priority system against
> it. I've looked in slurm.conf and can't see any parameter which might be
> relevant.
>
> Or is the current behaviour a desired feature for some reason that I'm not
> seeing?
>
> Thanks,
> Paddy

If jobs are sharing nodes, I would say it is desirable to be able to
distinguish between a job which wants just any 128 cores and one which wants
16 complete 8-core nodes. The former, if CPU-intensive, might well be able to
fill up cores left empty by memory-intensive jobs; the latter requires that
complete nodes be drained to make space for it, which can be a waste of
resources.
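As a rough illustration of how the job-size factor can tell the two requests apart, here is a small Python sketch. It assumes that, with the default "favour large jobs" setting, the factor is the mean of the requested node fraction and the requested CPU fraction of the cluster. It also assumes that user1's job, submitted without --nodes, is counted with a minimum of one node while pending. Both are assumptions, not confirmed from the Slurm source here, and the function name job_size_factor and the cluster size N are hypothetical. The ratio they predict does, however, match the sprio numbers quoted above.

```python
from fractions import Fraction


def job_size_factor(min_nodes, cpus, cluster_nodes, cluster_cpus):
    """Hypothetical 'favour large jobs' size factor (an assumption):
    the mean of the node fraction and the CPU fraction requested."""
    node_frac = Fraction(min_nodes, cluster_nodes)
    cpu_frac = Fraction(cpus, cluster_cpus)
    return (node_frac + cpu_frac) / 2


# Hypothetical cluster of N nodes with 8 cores each; N itself is unknown,
# and the ratio below does not depend on it.
N = 100

# user1: --ntasks 128 only, assumed to count as a minimum of 1 node
user1 = job_size_factor(min_nodes=1, cpus=128, cluster_nodes=N, cluster_cpus=8 * N)
# user2: --ntasks 128 --nodes 16
user2 = job_size_factor(min_nodes=16, cpus=128, cluster_nodes=N, cluster_cpus=8 * N)

ratio = user2 / user1
print(ratio)                     # 32/17
print(round(float(ratio), 3))    # predicted JOBSIZE ratio, 1.882
print(round(11940 / 6343, 3))    # observed ratio from sprio -l, 1.882
```

Under these assumptions, asking for --nodes 16 raises the node fraction from 1/N to 16/N, while the CPU fraction stays at 16/N for both jobs. That gives a ratio of 32/17 regardless of the weight or the cluster size.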
Because our nodes are shared, we usually try to talk users out of specifying
the number of nodes for MPI jobs. The shorter wait time in the queue often
makes up for any loss of efficiency due to the job being spread across more
nodes and switches.

Cheers,

Loris

-- 
This signature is currently under construction.
