Hi,
I have set
MinJobAge=43200
so that job information is kept for 12 hours
(43200 seconds) after a job terminates, which lets
me run "scontrol show job" on finished jobs
when debugging problems.
After an upgrade (to version 2.3.0, I believe),
slurmctld.log started filling up with error lines,
and they are still there in version 2.3.1.
Here are some examples:
[2011-10-28T15:30:01] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:32:00] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:33:39] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:43:20] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:43:32] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:44:02] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:45:01] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:48:48] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
[2011-10-28T15:48:48] error: Error opening file /usr/local/slurm-state/job.1522753/script, No such file or directory
Job 1522753 has finished, so I understand
why there is no state directory for it.
Quite often some part of slurmctld tries to open a file
that no longer exists for a finished job. I think, but am
not sure, that these lines start appearing after slurmctld
is restarted and disappear after several hours (perhaps
once the 43200 seconds of MinJobAge have passed? I do not know).
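If it helps anyone check whether the errors really stop once MinJobAge expires, here is a small Python sketch that counts the missing-script errors per job and reports the first and last timestamp for each. It assumes the log record format shown in the excerpt above; the function names are my own and not part of SLURM.

```python
import re
from collections import defaultdict

# Matches one "missing job script" error record in the format shown above.
# \s+ accepts either a space or a line break between "Error opening file"
# and the path, so wrapped and unwrapped log lines are both handled.
MISSING_SCRIPT_RE = re.compile(
    r"\[(?P<ts>[0-9T:-]+)\] error: Error opening file\s+"
    r"\S*/job\.(?P<job>\d+)/script, No such file or directory"
)


def summarize_missing_scripts(log_text):
    """Return {job_id: (count, first_timestamp, last_timestamp)}."""
    stamps = defaultdict(list)
    for m in MISSING_SCRIPT_RE.finditer(log_text):
        stamps[int(m.group("job"))].append(m.group("ts"))
    # Timestamps are collected in log order, so the first and last entries
    # bracket the period during which the errors were logged for each job.
    return {job: (len(ts), ts[0], ts[-1]) for job, ts in stamps.items()}


def print_summary(log_path):
    """Print one line per job that triggered the error in log_path."""
    with open(log_path) as f:
        summary = summarize_missing_scripts(f.read())
    for job, (count, first, last) in sorted(summary.items()):
        print(f"job {job}: {count} errors, {first} .. {last}")
```

Calling print_summary() on the slurmctld.log file and comparing each job's last error timestamp with its end time plus 12 hours would show whether the errors stop when the job record is purged.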
My simple wish is to get none of these error lines,
or at most a single warning instead of repeated errors.
For the most part, I like SLURM very much, though!
Best regards,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
http://www.uppmax.uu.se