So when I tried to do an archive dump I got the following error. What
does this mean?
[root@holy-slurm01 slurm]# sacctmgr -i archive dump
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
Problem dumping archive: Unspecified error
It also caused the slurmdbd to crash so I had to restart it. Here is
the log:
Mar 10 08:18:21 holy-slurm01 slurmctld[20144]: slurmdbd: agent queue size 50
Mar 10 09:20:57 holy-slurm01 slurmdbd[47429]: error: mysql_query failed: 1205 Lock wait timeout exceeded; try restarting transaction
Mar 10 09:20:57 holy-slurm01 slurmdbd[47429]: fatal: mysql gave ER_LOCK_WAIT_TIMEOUT as an error. The only way to fix this is restart the calling program
Mar 10 09:20:57 holy-slurm01 slurmctld[20144]: error: slurmdbd: Getting response to message type 1407
Mar 10 09:20:57 holy-slurm01 slurmctld[20144]: slurmdbd: reopening connection
Mar 10 09:21:02 holy-slurm01 slurmctld[20144]: error: slurmdbd: Sending message type 1472: 11: Connection refused
Mar 10 09:21:02 holy-slurm01 slurmctld[20144]: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
It did manage to dump:
[root@holy-slurm01 archive]# ls -ltr
total 160260
-rw------- 1 slurm slurm_users 163847476 Mar 10 09:16 odyssey_event_archive_2013-08-01T00:00:00_2014-08-31T23:59:59
-rw------- 1 slurm slurm_users 253639 Mar 10 09:17 odyssey_suspend_archive_2013-08-01T00:00:00_2014-08-31T23:59:59
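To check which record types and time ranges a partial dump actually
covered, the file names can be split into their parts. This is a minimal
sketch that assumes the naming pattern seen in the listing above
(cluster, record type, "archive", start, end); parse_archive_name is a
hypothetical helper, not part of Slurm.

```python
import re
from datetime import datetime

# Pattern inferred from the two file names in the listing; it is an
# assumption, not a documented Slurm format.
ARCHIVE_RE = re.compile(
    r"^(?P<cluster>.+)_(?P<kind>[a-z]+)_archive_"
    r"(?P<start>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})_"
    r"(?P<end>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})$"
)
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_archive_name(name):
    """Split an archive file name into cluster, record kind, start and end."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError("not an archive file name: " + name)
    return {
        "cluster": match["cluster"],
        "kind": match["kind"],
        "start": datetime.strptime(match["start"], TIME_FORMAT),
        "end": datetime.strptime(match["end"], TIME_FORMAT),
    }


if __name__ == "__main__":
    names = [
        "odyssey_event_archive_2013-08-01T00:00:00_2014-08-31T23:59:59",
        "odyssey_suspend_archive_2013-08-01T00:00:00_2014-08-31T23:59:59",
    ]
    for name in names:
        info = parse_archive_name(name)
        print(info["kind"], info["start"].isoformat(), info["end"].isoformat())
```

For the two files above this reports only the event and suspend record
types, both covering the same time range.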
Is it safe to try again?
-Paul Edmon-
On 03/06/2015 03:07 PM, Paul Edmon wrote:
Ah, okay, that was the command I was looking for. I wasn't sure how
to force it. Thanks.
-Paul Edmon-
On 03/06/2015 01:43 PM, Danny Auble wrote:
It looks like I might stand corrected though. It looks like you will
have to wait for the month to go by before the purge starts.
With a lot of jobs it may take a while, depending on the speed of your
disk and such. If you set DebugFlags=DB_USAGE you should see the
SQL statements go by. This will be quite verbose, so it might not be
that great of an idea. You can edit the slurmdbd.conf file with the
flag and run "sacctmgr reconfig" to add or remove the flag.
You can force a purge which in turn will force an archive with
sacctmgr archive dump
see http://slurm.schedmd.com/sacctmgr.html
Danny
On 03/06/2015 10:12 AM, Paul Edmon wrote:
How long does that typically take? Because I have done it on our
large job database and I have seen nothing yet.
-Paul Edmon
On 03/06/2015 01:07 PM, Danny Auble wrote:
If you had data older than 6 months, I would expect it to purge on
restart of the slurmdbd. You will see a message in the log when
the archive file is created.
On 03/06/2015 09:58 AM, Paul Edmon wrote:
Okay, that's what I suspected. We set it to 6 months. So I guess
then the purge will happen on April 1st.
-Paul Edmon-
On 03/06/2015 12:33 PM, Danny Auble wrote:
Paul, do you have Purge* set up in the slurmdbd.conf? Archiving
takes place during the purge process. If no Purge values are set,
archiving will never take place, since nothing is ever purged.
When Purge values are set, the related archiving takes place on an
hourly, daily, or monthly basis, depending on the units your
purge values are set to.
If PurgeJobAfter=2months, the archive would take place at the
beginning of each month. If it were set to 2hours, it would
happen each hour. The purge will also happen on slurmdbd startup
if you are running things for the first time.
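As a rough illustration of the month-based case, the sketch below
computes the last second that a months-based purge value would cover on
a given date. It assumes the cutoff is aligned to the start of a month;
that alignment is an assumption chosen because it reproduces the
2014-08-31T23:59:59 end time in the archive file names reported in this
thread for a 6-month setting, and it is not taken from the slurmdbd
source. month_purge_cutoff is a hypothetical helper.

```python
from datetime import datetime, timedelta


def month_purge_cutoff(now, months):
    """Return the last second covered by a purge value of `months` months.

    Assumption: records older than the first day of
    (current month - `months` months) are purged and archived.
    """
    # Work in whole months since year 0 so year boundaries are handled.
    total_months = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total_months, 12)
    boundary = datetime(year, month_index + 1, 1)
    return boundary - timedelta(seconds=1)


if __name__ == "__main__":
    # A dump run on 10 March 2015 with a 6-month purge value.
    print(month_purge_cutoff(datetime(2015, 3, 10, 9, 16), 6).isoformat())
    # The same setting once April begins.
    print(month_purge_cutoff(datetime(2015, 4, 1), 6).isoformat())
```

Under this assumption the cutoff only moves when a new month starts,
which is consistent with month-based purges running at the beginning of
each month.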
Danny
On 03/06/2015 08:20 AM, Paul Edmon wrote:
So we recently turned this on to archive jobs older than 6
months. However, when we restarted slurmdbd nothing happened; at
least, no file was deposited at the specified archive location.
Is there a way to force it to purge? When is the archiving
scheduled to be done? We definitely have jobs older than 6
months in the database, I'm just curious about the schedule of
when the archiving is done.
-Paul Edmon-