Dear all,
we are experiencing some strange job cancellations from time to time.
Some of our users report that their jobs have been cancelled, although
neither they nor an administrator issued a cancel command.
If I run the following command:
  sacct -s CA -S 2015-04-01 -E 2015-11-30 -o JobID%20,User,State%30 \
    | grep CANC | grep by \
    | awk '{ if ($2 != $5) print $1,$2,$5 }' \
    | grep -v CANCELLE | grep -v " 0"
I get the following output:
JobID    owner  cancelled_by
1730636  70531  70497
1730647  70531  71187
1730648  70531  71187
1740541  70531  71187
1740548  70531  71187
1741050  70531  71187
1741051  70531  71187
1742205  70531  71187
1742212  70032  71187
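For anyone who wants to repeat this check, the grep/awk chain above can also be written as a single awk program over sacct's pipe-separated output. This is only a sketch: it assumes sacct is called with -X (job allocations only), -n (no header) and -P (parsable output) so that the fields are JobID|User|State, and that the User column holds numeric UIDs as in the output above. The function name cancelled_by_other is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "JobID|User|State" lines (sacct -P format) on stdin
# and print "JobID owner canceller" for jobs cancelled by a UID other than
# the owner and other than root (uid 0).
# Assumption: the User field contains numeric UIDs, as in the listing above.
cancelled_by_other() {
  awk -F'|' '
    # skip job steps such as 1730636.batch, which have an empty User field
    $2 == "" { next }
    # the State field looks like "CANCELLED by 71187"
    $3 ~ /^CANCELLED by / {
      split($3, s, " ")
      canceller = s[3]
      # ignore cancellations by root and by the job owner
      if (canceller != "0" && canceller != $2) print $1, $2, canceller
    }'
}

# Example invocation with the same time window as above:
#   sacct -X -n -P -s CA -S 2015-04-01 -E 2015-11-30 -o JobID,User,State | cancelled_by_other
```

Using -P avoids relying on column widths (%20, %30) and on the position of words inside a padded State field, which is what the extra grep -v filters in the original pipeline work around.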
When I switch to user '71187' and run 'scancel' on a job owned by user
'70032' (as in the last entry of the output above), it is not possible
to cancel the job.
Is this a known issue? We are running Slurm version 14.11.8.
Best regards,
Markus
--
=====================================================
Dr. Markus Stöhr
Zentraler Informatikdienst BOKU Wien / TU Wien
Wiedner Hauptstraße 8-10
1040 Wien
Tel. +43-1-58801-420754
Fax +43-1-58801-9420754
Email: [email protected]
=====================================================