Dear all,

We are experiencing occasional, unexplained job cancellations. Some of our users report that their jobs were cancelled although neither they nor an administrator issued a cancel command.

If I run a command like the following:

sacct -s CA -S 2015-04-01 -E 2015-11-30 -o JobID%20,User,State%30 | grep CANC | grep by | awk '{ if ($2 != $5) print $1,$2,$5 }' | grep -v CANCELLE | grep -v " 0"

I get the following output:

JobID   owner cancelled_by
1730636 70531 70497
1730647 70531 71187
1730648 70531 71187
1740541 70531 71187
1740548 70531 71187
1741050 70531 71187
1741051 70531 71187
1742205 70531 71187
1742212 70032 71187
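For anyone who wants to reproduce the query, here is a slightly more robust sketch of the same filter. It assumes sacct's parsable output (-P, pipe-separated, so the State column is never truncated), allocations only (-X, so job steps are skipped), and the UID field instead of User so that owner and canceller are compared as numeric UIDs. The function name foreign_cancels is just an illustrative choice.

```shell
#!/bin/sh
# Print "JobID owner_uid canceller_uid" for jobs cancelled by someone
# other than the job owner or root (UID 0).
# Expects lines from: sacct -n -X -P -o JobID,UID,State  (pipe-separated)
foreign_cancels() {
  awk -F'|' '
    $3 ~ /^CANCELLED by / {
      split($3, s, " ")   # s[1]="CANCELLED", s[2]="by", s[3]=canceller UID
      if (s[3] != $2 && s[3] != "0") print $1, $2, s[3]
    }'
}

# Example invocation (assumes the caller may see all users' jobs; -a requests that):
# sacct -a -n -X -P -s CA -S 2015-04-01 -E 2015-11-30 -o JobID,UID,State | foreign_cancels
```

Cancellations by the owner themselves, by root, and non-cancelled states are all dropped, which replaces the chain of grep -v calls in the original pipeline.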


When I switch to user '71187' and run 'scancel' on a job owned by user '70032' (as in the last entry of the output above), scancel refuses to cancel the job, so this user should not be able to cancel other users' jobs at all.

Is this a known issue? We are running Slurm version 14.11.8.

Best regards,
Markus

--
=====================================================
Dr. Markus Stöhr
Zentraler Informatikdienst BOKU Wien / TU Wien
Wiedner Hauptstraße 8-10
1040 Wien

Tel. +43-1-58801-420754
Fax  +43-1-58801-9420754

=====================================================