Chris,
can the user start an 'srun' session?
________________________________________
From: Chris Woelkers - NOAA Affiliate [chris.woelk...@noaa.gov]
Sent: 03 April 2017 20:31
To: slurm-dev
Subject: [slurm-dev] Fwd: job requeued in held state

I am running a small HPC cluster, only 24 nodes, via Slurm and am having an
issue where one of the users is unable to submit any jobs.
The user is new, and whenever a job is submitted it shows the "job
requeued in held state" state and is never actually run. We have left
the job sitting for over three days and it does not start. We have
tried releasing the job and it still does not start. Here are the log
entries after an attempted release:

[2017-04-03T19:16:24.173] sched: update_job: releasing hold for job_id 1938 uid 0
[2017-04-03T19:16:24.174] _slurm_rpc_update_job complete JobId=1938 uid=0 usec=375
[2017-04-03T19:16:24.919] sched: Allocate JobId=1938 NodeList=rhinonode[07-14] #CPUs=192
[2017-04-03T19:16:25.017] _slurm_rpc_requeue: Processing RPC: REQUEST_JOB_REQUEUE from uid=0
[2017-04-03T19:16:25.035] Requeuing JobID=1938 State=0x0 NodeCnt=0
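
One way to read the log excerpt above is to measure how long the job held its allocation before the requeue request arrived. The following Python sketch is not part of the original message; the names `parse` and `seconds_between` are illustrative, and it assumes every log line starts with a bracketed ISO-style timestamp as shown above.

```python
import re
from datetime import datetime

# Log excerpt from the message, with wrapped lines joined.
LOG = """\
[2017-04-03T19:16:24.173] sched: update_job: releasing hold for job_id 1938 uid 0
[2017-04-03T19:16:24.174] _slurm_rpc_update_job complete JobId=1938 uid=0 usec=375
[2017-04-03T19:16:24.919] sched: Allocate JobId=1938 NodeList=rhinonode[07-14] #CPUs=192
[2017-04-03T19:16:25.017] _slurm_rpc_requeue: Processing RPC: REQUEST_JOB_REQUEUE from uid=0
[2017-04-03T19:16:25.035] Requeuing JobID=1938 State=0x0 NodeCnt=0
"""

# "[timestamp] message" layout, as seen in the excerpt.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")


def parse(log):
    """Return a list of (datetime, message) tuples for each matching line."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, m.group("msg")))
    return events


def seconds_between(events, start_needle, end_needle):
    """Seconds from the first message containing start_needle to the
    first message containing end_needle, or None if either is missing."""
    start = next((ts for ts, msg in events if start_needle in msg), None)
    end = next((ts for ts, msg in events if end_needle in msg), None)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


events = parse(LOG)
gap = seconds_between(events, "sched: Allocate", "REQUEST_JOB_REQUEUE")
print(f"allocation to requeue request: {gap:.3f} s")
```

For this excerpt the gap is about a tenth of a second, which suggests the job is bounced almost immediately after allocation rather than failing partway through the script. The excerpt alone does not say why.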

The user has the same permissions as the older users who can run jobs.
The script being run is a simple test script, and no matter where the
output is redirected (an NFS mount for our SAN, the local home
directory, or the tmp directory), the result is the same.

Any idea as to what might be happening?

Thanks,

Chris Woelkers
Caelum Research Corp.
Linux Server and Network Administrator
NOAA GLERL
