I am scheduling an HPCC job on a specific set of nodes using --nodelist.
I am being told that a node is not available, but for the life of me I
cannot expand the NODELIST(REASON) field to show it.
JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
  340      defq run-hpcc johnh PD  0:00     4 (Resources)
  344      defq run-hpcc johnh PD  0:00     4 (ReqNodeNotAvail(Unavailable:co
  346      defq run-hpcc johnh PD  0:00     1 (ReqNodeNotAvail(Unavailable:co
  350      defq run-hpcc johnh PD  0:00     1 (ReqNodeNotAvail(Unavailable:co
  348      defq run-hpcc johnh  R  1:17     2 comp[15-16]
Also, this may be relevant: I have the known problem of a job not terminating
properly.
In the slurmdbd log:
[2016-05-26T09:00:00.838] error: We have more allocated time than is possible (172800 > 126000) for cluster slurm_cluster(35) from 2016-05-26T08:00:00 - 2016-05-26T09:00:00
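As a sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes the "(35)" after the cluster name is the CPU count used for the hourly rollup and that both figures are CPU-seconds; neither is confirmed by the log line itself.

```python
# Figures copied from the slurmdbd error line.
allocated_seconds = 172800
possible_seconds = 126000

# The reported window runs from 08:00:00 to 09:00:00, i.e. one hour.
window_seconds = 3600

# Assumption: the "(35)" after the cluster name is the CPU count.
assumed_cpus = 35

# Capacity of the window under that assumption.
capacity = assumed_cpus * window_seconds

# CPU count that the allocated figure would imply for the same window.
implied_cpus = allocated_seconds / window_seconds
excess_cpus = implied_cpus - assumed_cpus

print("capacity matches 'possible' figure:", capacity == possible_seconds)
print("implied CPUs:", implied_cpus, "excess over assumed:", excess_cpus)
```

If the assumption holds, the accounting would be charging more CPUs in that hour than the cluster has, which seems consistent with a job that is still being counted as running.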
I ran the lost.pl script from the Bugzilla and it finds no still-running
jobs.
Slurm version 14.11.6