Hi Taras,

I see jobs get requeued like this when Slurm fails to look up a user ID while starting the job. This happens when our campus LDAP directory goes down briefly :(
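
In case it is useful, here is a rough Python sketch for finding the affected jobs and building the 'scontrol release' commands John mentioned. It is only an illustration, not anything from the Slurm documentation: the function names are made up, and it assumes that 'scontrol show job' prints one blank-line-separated record per job, with a JobId=<number> field and the Reason= field shown in your output.

```python
import re

# The pending reason reported in this thread.
REASON = "job_requeued_in_held_state"


def held_requeued_jobs(scontrol_output):
    """Return the job IDs whose record has Reason=job_requeued_in_held_state.

    Assumption: `scontrol_output` is the text printed by `scontrol show job`,
    with one record per job and a blank line between records.
    """
    job_ids = []
    for record in re.split(r"\n\s*\n", scontrol_output.strip()):
        job = re.search(r"\bJobId=(\d+)", record)
        reason = re.search(r"\bReason=(\S+)", record)
        if job and reason and reason.group(1) == REASON:
            job_ids.append(job.group(1))
    return job_ids


def release_commands(job_ids):
    """Build one `scontrol release <job_id>` command line per job ID."""
    return ["scontrol release {}".format(job_id) for job_id in job_ids]
```

The sketch only builds the command strings and does not run them, so you can review the list before releasing anything. It is probably worth checking that the underlying problem (LDAP, slurmd, or munge) is fixed first, otherwise the jobs may just end up held again.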
Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 10/14/15, 4:40 AM, "Taras Shapovalov" <[email protected]> wrote:

>Hi John,
>
>Thanks for the advice. It would be nice to document this pending reason on the
>web site (and maybe rename).
>
>Best regards,
>
>Taras
>
>On Wed, Oct 14, 2015 at 2:03 PM, John Desantis <[email protected]> wrote:
>
>Taras,
>
>We see this message when a scheduled node has experienced an issue with slurmd
>and/or munge, and can no longer accept jobs.
>
>You can use 'scontrol release job_id' to reschedule the job. Please note
>though, that 'job_id' is the actual job number reported in squeue.
>
>John DeSantis
>
>On Oct 14, 2015 5:51 AM, "Taras Shapovalov" <[email protected]> wrote:
>
>Hi guys,
>
>We have faced a weird pending reason, which is not in the list of the known
>pending reasons (http://slurm.schedmd.com/squeue.html):
>
>[root@head ~]# scontrol show job | grep Reason
>   JobState=PENDING Reason=job_requeued_in_held_state Dependency=(null)
>[root@head ~]#
>
>Is it expected? If yes, then where can I find some description of the pending
>reason?
>
>Slurm version: 14.11.6
>
>Best regards,
>
>Taras
