Dear All,

Any tips on troubleshooting jobs that stay pending even though I have plenty of idle nodes? I have looked at the FAQ and made sure nothing obvious is wrong.
I have PriorityType=priority/multifactor and SchedulerType=sched/backfill enabled. I can also run jobs in a partition successfully by just running srun /bin/hostname, and I can use up the entire queue that way without any issues. But when I have sbatch jobs queued up, they just stay pending indefinitely. I bumped up their priority and saw no change. The reason for waiting in the queue was originally Priority; after I bumped the priority up, the reason became None.

I have about 100,000 jobs waiting in the queue, so debugging is becoming a little painful and chatty. Any hints or options for debugging this would be very helpful.

Please advise.

Thank you,
Amit
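
P.S. With this many jobs, a tally of the pending reasons may be easier to read than the raw queue listing. Below is a minimal sketch of that, assuming squeue is on the PATH and Python 3.7 or later is available. The function names (tally_reasons, pending_reasons, main) are made up for illustration. It relies only on squeue's -h (no header), -t PD (pending jobs only) and -o "%r" (reason field) options.

```python
import subprocess
from collections import Counter


def tally_reasons(lines):
    """Count how many pending jobs report each reason string."""
    counts = Counter()
    for line in lines:
        reason = line.strip()
        if reason:  # skip blank lines
            counts[reason] += 1
    return counts


def pending_reasons():
    """Return the reason field of every pending job, one entry per job."""
    # -h drops the header, -t PD selects pending jobs,
    # -o "%r" prints only the reason column.
    result = subprocess.run(
        ["squeue", "-h", "-t", "PD", "-o", "%r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def main():
    """Print one line per distinct reason, most common first."""
    for reason, count in tally_reasons(pending_reasons()).most_common():
        print(f"{count:8d}  {reason}")
```

Calling main() on a host that can reach the controller prints one line per distinct reason with its job count, which should show at a glance whether the whole backlog shares a single reason or is split across several.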
