Hello, I'm looking into why some jobs are getting cancelled/requeued on my cluster. The default hypothesis is that it is priority (QOS) preemption, which was recently turned on. But it seems to be happening way more than it should based on how many jobs are actually being submitted to a preemption-capable QOS. I tried looking for jobs which were in the PREEMPTED state at some point:
$ sacct --allusers --qos=normal --state=PREEMPTED --starttime=2017-06-1 --duplicates --format=jobid,elapsed,qos,user,state,exitcode

There were very few results, and none of them were jobs from the users who recently reported lots of preemptions. When I looked up the jobs of one of these users, many of them had been in the REQUEUED (but not PREEMPTED) state.

But what is the REQUEUED state? I can't find any mention of it in the documentation <https://slurm.schedmd.com/sacct.html> (searched 'state_list'). Does this mean that the jobs aren't being preempted due to priority?

We're running Slurm 16.05.4.

Thanks,
Evan
