Hey guys,
So in the past we had three prioritization factors in effect: partition, age,
and fairshare, and they were working wonderfully. Currently partition has
no effect for us, since it's all one large shared partition and everyone gets
the same value there. That leaves everything balanced on age and fairshare.
In the past, age and fairshare worked splendidly, and as I understand it we
have it set to refresh the counters every 2 weeks... so basically everyone
had a blank slate this past weekend. Our current issue is as follows...
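For reference, a rough sketch of the two usual ways accumulated usage gets forgotten: exponential decay with a half-life, or a hard periodic reset. Slurm exposes these as `PriorityDecayHalfLife` and `PriorityUsageResetPeriod` in slurm.conf; the sketch doesn't assume which one is configured here, and the helper names and numbers are hypothetical.

```python
# Hypothetical helpers illustrating how usage is "forgotten" over time.
# Not Slurm code; the names and numbers are made up for illustration.

def decayed_usage(usage: float, elapsed_days: float, half_life_days: float) -> float:
    """Exponential decay: usage halves every half_life_days (cf. PriorityDecayHalfLife)."""
    return usage * 0.5 ** (elapsed_days / half_life_days)


def reset_usage(usage: float, days_since_reset: float, reset_period_days: float) -> float:
    """Hard reset: usage is wiped once a full reset period has passed."""
    return 0.0 if days_since_reset >= reset_period_days else usage


# Hypothetical user with 1000 units of usage and 14-day settings.
print(decayed_usage(1000.0, 14, 14))  # 500.0
print(decayed_usage(1000.0, 28, 14))  # 250.0
print(reset_usage(1000.0, 14, 14))    # 0.0
```

With a half-life, usage shrinks but never quite reaches zero, so a true "blank slate" only happens with a hard reset.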
A problematic user has submitted 70k jobs to a partition with 512 slots,
and she is currently consuming all of them... basically locking up the
queue for anybody else who wants to try to work.
Normally fairshare kicks in and jumps other users to the top of the
queue, but when a new user submitted 25 jobs (vs. her 70k), he didn't get
any fairshare weighting at all...
JOBID   USER  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
162986  uid1      8371  371          0        0       8000    0     0
162987  uid1      8371  371          0        0       8000    0     0
162988  uid1      8371  371          0        0       8000    0     0
180698  uid2      8320  321          0        0       8000    0     0
180699  uid2      8320  321          0        0       8000    0     0
180700  uid2      8320  321          0        0       8000    0     0
180701  uid2      8320  321          0        0       8000    0     0
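To read the table: Slurm's multifactor priority is the sum of each factor times its weight, minus the nice value, and sprio by default prints the already-weighted values. A small sketch (the `SprioRow` class is hypothetical, not a Slurm API) that re-adds two of the rows above shows the totals are essentially partition plus age, with fairshare contributing nothing for either user. The uid2 rows sum to 8321 against a reported 8320; that one-point gap is likely rounding in the displayed factors.

```python
from dataclasses import dataclass


@dataclass
class SprioRow:
    """One row of sprio-style output; each factor is already weighted."""
    jobid: int
    user: str
    priority: int
    age: int
    fairshare: int
    jobsize: int
    partition: int
    qos: int
    nice: int

    def component_sum(self) -> int:
        # Multifactor priority adds the weighted factors and subtracts nice.
        return (self.age + self.fairshare + self.jobsize
                + self.partition + self.qos - self.nice)


rows = [
    SprioRow(162986, "uid1", 8371, 371, 0, 0, 8000, 0, 0),
    SprioRow(180698, "uid2", 8320, 321, 0, 0, 8000, 0, 0),
]

for r in rows:
    print(r.jobid, r.user, "reported:", r.priority, "sum:", r.component_sum())
```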
I'm used to seeing a user like that start out with a fairshare value of
around 5000... Thoughts?
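For comparison, if the classic (pre-Fair Tree) fair-share algorithm is in use, its factor in its simplest form is F = 2^(-U/S), where U is the association's effective usage and S its normalized shares, and the FAIRSHARE column is the fair-share weight times F. A user with no usage gets F = 1 (the full weight) and one whose usage equals their share gets F = 0.5. Because the column is weight times F, a zero weight would show 0 for everyone regardless of usage. The default weight of 10000 and the usage/share numbers below are hypothetical.

```python
def fairshare_factor(effective_usage: float, norm_shares: float) -> float:
    """Classic fair-share factor F = 2**(-U/S); 1.0 at zero usage."""
    if norm_shares <= 0:
        raise ValueError("normalized shares must be positive")
    return 2.0 ** (-effective_usage / norm_shares)


def fairshare_column(effective_usage: float, norm_shares: float,
                     weight: int = 10000) -> int:
    """Weighted value as sprio would show it; weight=10000 is hypothetical."""
    return int(weight * fairshare_factor(effective_usage, norm_shares))


# A new user (no usage), a user exactly at their share, and a heavy user.
print(fairshare_column(0.0, 0.25))            # 10000
print(fairshare_column(0.25, 0.25))           # 5000
print(fairshare_column(0.75, 0.25))           # 1250
print(fairshare_column(0.0, 0.25, weight=0))  # 0
```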
AC
- [slurm-dev] fairshare incrementing Alan V. Cowles