Hi Taras,

I see jobs get requeued like this when Slurm fails to look up a user ID while
starting the job. This happens when our campus LDAP directory goes down briefly :(
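
To confirm that a directory outage is the cause before releasing a held job, one option is to check on the affected compute node whether the job owner's UID still resolves. Below is a minimal Python sketch. It assumes the directory is hooked into NSS (for example via sssd or nslcd) and that an unreachable directory shows up as a failed lookup. The helper names are hypothetical, and this is not Slurm's own lookup code.

```python
import pwd
from typing import Iterable, List


def uid_resolves(uid: int) -> bool:
    """Return True if `uid` currently maps to a passwd entry on this node.

    pwd.getpwuid goes through the NSS stack configured in /etc/nsswitch.conf.
    Assumption: when the LDAP backend is unreachable, NSS reports the entry
    as missing, which Python surfaces as KeyError.
    """
    try:
        pwd.getpwuid(uid)
    except KeyError:
        return False
    return True


def unresolvable_uids(uids: Iterable[int]) -> List[int]:
    """Return the UIDs from `uids` that fail to resolve right now."""
    return [uid for uid in uids if not uid_resolves(uid)]
```

If the owner's UID fails to resolve on the node, a directory problem like the one described above is a likely cause. In that case, it is probably better to wait to release the job until lookups succeed again.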

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 10/14/15, 4:40 AM, "Taras Shapovalov" <[email protected]> wrote:

>Hi John,
>
>Thanks for the advice. It would be nice to document this pending reason on the
>website (and perhaps to rename it).
>
>Best regards,
>Taras
>
>On Wed, Oct 14, 2015 at 2:03 PM, John Desantis <[email protected]> wrote:
>
>Taras,
>We see this message when a scheduled node has experienced a problem with slurmd
>and/or munge and can no longer accept jobs.
>You can use 'scontrol release job_id' to reschedule the job. Please note,
>though, that 'job_id' is the actual job number reported by squeue.
>John DeSantis
>On Oct 14, 2015 5:51 AM, "Taras Shapovalov" <[email protected]> wrote:
>
>Hi guys,
>
>We have run into a strange pending reason that is not in the list of known
>pending reasons (http://slurm.schedmd.com/squeue.html):
>
>[root@head ~]# scontrol show job | grep Reason
>   JobState=PENDING Reason=job_requeued_in_held_state Dependency=(null)
>[root@head ~]# 
>
>Is this expected? If so, where can I find a description of this pending reason?
>
>Slurm version: 14.11.6
>
>Best regards,
>Taras
>
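When several jobs end up held this way at once, the 'scontrol release job_id' step from the thread can be scripted. Below is a minimal Python sketch. It assumes the one-line-per-job output of 'scontrol -o show job' and matches the exact Reason string shown in the thread. The function names are hypothetical. The sketch only builds the commands and does not execute them.

```python
import re
from typing import Iterable, List

# Pending reason as reported in the thread (Slurm 14.11.6).
HELD_REASON = "job_requeued_in_held_state"

# \b keeps JobId= from matching inside ArrayJobId=.
JOB_ID_RE = re.compile(r"\bJobId=(\d+)")
REASON_RE = re.compile(r"\bReason=(\S+)")


def held_requeued_jobs(oneliner_output: str, reason: str = HELD_REASON) -> List[str]:
    """Return job IDs whose Reason field equals `reason`.

    Expects one job record per line, as printed by 'scontrol -o show job'.
    """
    job_ids = []
    for line in oneliner_output.splitlines():
        job = JOB_ID_RE.search(line)
        why = REASON_RE.search(line)
        if job and why and why.group(1) == reason:
            job_ids.append(job.group(1))
    return job_ids


def release_commands(job_ids: Iterable[str]) -> List[List[str]]:
    """Build one 'scontrol release <job_id>' argument list per held job."""
    return [["scontrol", "release", job_id] for job_id in job_ids]
```

For example, pass the stdout of subprocess.run(["scontrol", "-o", "show", "job"], capture_output=True, text=True) to held_requeued_jobs and review the generated commands before running them. Releasing a job is likely to help only after the underlying slurmd, munge, or LDAP problem is fixed. Until then, the job may be requeued and held again.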