So when I tried to do an archive dump I got the following error. What does this mean?

[root@holy-slurm01 slurm]# sacctmgr -i archive dump
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
 Problem dumping archive: Unspecified error

It also caused the slurmdbd to crash so I had to restart it. Here is the log:

Mar 10 08:18:21 holy-slurm01 slurmctld[20144]: slurmdbd: agent queue size 50
Mar 10 09:20:57 holy-slurm01 slurmdbd[47429]: error: mysql_query failed: 1205 Lock wait timeout exceeded; try restarting transaction
Mar 10 09:20:57 holy-slurm01 slurmdbd[47429]: fatal: mysql gave ER_LOCK_WAIT_TIMEOUT as an error. The only way to fix this is restart the calling program
Mar 10 09:20:57 holy-slurm01 slurmctld[20144]: error: slurmdbd: Getting response to message type 1407
Mar 10 09:20:57 holy-slurm01 slurmctld[20144]: slurmdbd: reopening connection
Mar 10 09:21:02 holy-slurm01 slurmctld[20144]: error: slurmdbd: Sending message type 1472: 11: Connection refused
Mar 10 09:21:02 holy-slurm01 slurmctld[20144]: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused

It did manage to dump:

[root@holy-slurm01 archive]# ls -ltr
total 160260
-rw------- 1 slurm slurm_users 163847476 Mar 10 09:16 odyssey_event_archive_2013-08-01T00:00:00_2014-08-31T23:59:59
-rw------- 1 slurm slurm_users    253639 Mar 10 09:17 odyssey_suspend_archive_2013-08-01T00:00:00_2014-08-31T23:59:59

Is it safe to try again?

-Paul Edmon-

On 03/06/2015 03:07 PM, Paul Edmon wrote:

Ah, okay, that was the command I was looking for. I wasn't sure how to force it. Thanks.

-Paul Edmon-

On 03/06/2015 01:43 PM, Danny Auble wrote:

It looks like I might stand corrected though: you will have to wait for the month to go by before the purge starts.

With a lot of jobs it may take a while, depending on the speed of your disk and so on. If you set DebugFlags=DB_USAGE you should see the SQL statements go by. This is quite verbose, so it might not be a great idea. You can add or remove the flag in slurmdbd.conf and then run "sacctmgr reconfig" to apply the change.

You can force a purge which in turn will force an archive with

sacctmgr archive dump

see http://slurm.schedmd.com/sacctmgr.html

Danny

On 03/06/2015 10:12 AM, Paul Edmon wrote:

How long does that typically take? Because I have done it on our large job database and I have seen nothing yet.

-Paul Edmon

On 03/06/2015 01:07 PM, Danny Auble wrote:

If you had data older than 6 months, I would expect it to be purged on restart of the slurmdbd. You will see a message in the log when the archive file is created.

On 03/06/2015 09:58 AM, Paul Edmon wrote:

Okay, that's what I suspected. We set it to 6 months. So I guess then the purge will happen on April 1st.

-Paul Edmon-

On 03/06/2015 12:33 PM, Danny Auble wrote:

Paul, do you have Purge* set up in the slurmdbd.conf? Archiving takes place during the Purge process. If no Purge values are set, archiving will never take place, since nothing is ever purged. When Purge values are set, the related archiving takes place on an hourly, daily, or monthly basis, depending on the units your purge values are set to.
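
For reference, a minimal sketch of what such a setup might look like in slurmdbd.conf. The option names are the ones I recall from the slurmdbd.conf man page, and the archive path and 6-month retention are illustrative assumptions, so check them against the documentation for your Slurm version before using them.

```
# slurmdbd.conf fragment (illustrative, not a complete configuration)
# Hypothetical archive location; must be writable by the slurmdbd user.
ArchiveDir=/var/spool/slurm/archive
# Archive each record type before it is purged.
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveSuspend=yes
# Purge records older than 6 months; the unit (months) sets the cadence.
PurgeEventAfter=6months
PurgeJobAfter=6months
PurgeSuspendAfter=6months
```

Without the Purge*After lines, the Archive* settings on their own would produce no archive files.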

If PurgeJobAfter=2months, the archive would take place at the beginning of each month. If it were set to 2hours, it would happen each hour. The purge will also happen on slurmdbd startup if you are running things for the first time.
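
The unit-to-schedule rule above can be sketched as a small shell helper. This only illustrates the rule as described in this message; `purge_cadence` is a hypothetical name, not part of Slurm, and values without a recognized unit are simply reported as unknown rather than guessed at.

```shell
# Hypothetical helper: map the unit suffix of a Purge* value to the
# cadence described above. It does not query or configure slurmdbd.
purge_cadence() {
    case "$1" in
        *hour|*hours)   echo "hourly" ;;
        *day|*days)     echo "daily" ;;
        *month|*months) echo "monthly" ;;
        *)              echo "unknown" ;;
    esac
}

purge_cadence 2months   # prints "monthly"
purge_cadence 2hours    # prints "hourly"
```

So a 6-month setting falls into the monthly case, which is why the next scheduled purge lands at the start of the following month.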

Danny


On 03/06/2015 08:20 AM, Paul Edmon wrote:

So we recently turned this on to archive jobs older than 6 months. However, when we restarted slurmdbd nothing happened; at least, no file was deposited at the specified archive location. Is there a way to force it to purge? When is the archiving scheduled to be done? We definitely have jobs older than 6 months in the database. I'm just curious about the schedule of when the archiving is done.

-Paul Edmon-
