Re: [slurm-users] Restoring Slurm

2018-04-10 Thread Ole Holm Nielsen

On 04/09/2018 10:54 PM, Roberts, John E. wrote:

The documentation is a little unclear to me, so I was wondering how to do a 
complete backup and restore of Slurm for testing and/or disaster recovery.

I'm looking to upgrade Slurm from 16.05.10 to the latest and I'm not sure of 
everything that should be carried over. I stood up some VMs to test this 
upgrade and most things looked good after running through it. I imported the 
MySQL DB from our slurmdbd instance, installed all of the relevant packages 
and copied over configs. Everything looked OK between the slurmctld node, DB 
node and test compute nodes. Accounts and associations all looked right and I 
could run jobs.

Some things, however, seemed to be missing, namely any kind of usage data. 
Using Slurm bank, all accounts were reset to 0 (rawusage), but they all had the 
correct limits. Job history was also gone. After reading into it a bit further 
and testing some more, it seemed all of this was stored in the 
"StateSaveLocation". Shouldn't this also be in the DB? Is there a way to get 
this into the DB? Is this related to the "JobCompType" field I saw? It wasn't 
clear to me whether that needs to be set if we are using slurmdbd.

With that said, for this and disaster recovery, do we need to save copies of 
anything besides the DB, config files and state saves?


John, perhaps you may find some useful information in my Slurm Wiki, for 
example, the page about Slurm upgrading: 
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm


The database backup/restore may be useful as well: 
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#backup-and-restore-of-database
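
As a rough illustration of the database backup step, here is a minimal Python 
sketch that only builds (and prints) a mysqldump command line. It assumes the 
accounting database uses Slurm's default name, slurm_acct_db; the output 
directory /root/slurm-backup and the helper name slurmdbd_dump_command are 
hypothetical. Check the database name against StorageLoc in your slurmdbd.conf 
before relying on it.

```python
import shlex
from datetime import date


def slurmdbd_dump_command(database="slurm_acct_db",
                          outdir="/root/slurm-backup",
                          day=None):
    """Return (argv, outfile) for dumping the Slurm accounting database.

    Assumptions: the default database name and a hypothetical output
    directory. Nothing is executed here; the caller decides how to run it.
    """
    day = day or date.today()
    outfile = f"{outdir}/{database}-{day.isoformat()}.sql"
    argv = [
        "mysqldump",
        "--single-transaction",       # consistent snapshot for InnoDB tables
        "--databases", database,      # include CREATE DATABASE in the dump
        f"--result-file={outfile}",   # write the dump to this file
    ]
    return argv, outfile


if __name__ == "__main__":
    argv, outfile = slurmdbd_dump_command()
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(argv))
```

Running the printed command on the DB node (with suitable MySQL credentials) 
should produce a dump that can be loaded into a test instance.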


Before major upgrades I've made dry-run test upgrades as documented in 
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#make-a-dry-run-database-upgrade


/Ole



[slurm-users] Restoring Slurm

2018-04-09 Thread Roberts, John E.
Hi,

The documentation is a little unclear to me, so I was wondering how to do a 
complete backup and restore of Slurm for testing and/or disaster recovery.

I'm looking to upgrade Slurm from 16.05.10 to the latest and I'm not sure of 
everything that should be carried over. I stood up some VMs to test this 
upgrade and most things looked good after running through it. I imported the 
MySQL DB from our slurmdbd instance, installed all of the relevant packages 
and copied over configs. Everything looked OK between the slurmctld node, DB 
node and test compute nodes. Accounts and associations all looked right and I 
could run jobs.

Some things, however, seemed to be missing, namely any kind of usage data. 
Using Slurm bank, all accounts were reset to 0 (rawusage), but they all had the 
correct limits. Job history was also gone. After reading into it a bit further 
and testing some more, it seemed all of this was stored in the 
"StateSaveLocation". Shouldn't this also be in the DB? Is there a way to get 
this into the DB? Is this related to the "JobCompType" field I saw? It wasn't 
clear to me whether that needs to be set if we are using slurmdbd.

With that said, for this and disaster recovery, do we need to save copies of 
anything besides the DB, config files and state saves?
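
To make the list of things to copy concrete, here is a small Python sketch that 
reads slurm.conf text and collects the paths mentioned above (the config file 
itself and the StateSaveLocation directory), plus the JobCompLoc file when 
JobCompType is jobcomp/filetxt. The function names and the sample paths are 
hypothetical, the parser handles only simple Key=Value lines, and the DB dump 
is a separate step that is not covered here.

```python
def parse_slurm_conf(text):
    """Parse simple Key=Value lines from slurm.conf text.

    Keys are lowercased because slurm.conf parameter names are case
    insensitive. Lines that carry several Key=Value pairs (for example
    NodeName=...) are not split further; only the first key is recorded.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def backup_items(settings, conf_path):
    """Return the filesystem paths worth copying, in a fixed order."""
    items = [conf_path]
    state_dir = settings.get("statesavelocation")
    if state_dir:
        items.append(state_dir)
    # A text-file job completion log lives outside the database.
    if (settings.get("jobcomptype", "").lower() == "jobcomp/filetxt"
            and "jobcomploc" in settings):
        items.append(settings["jobcomploc"])
    return items


if __name__ == "__main__":
    # Hypothetical sample configuration, for illustration only.
    sample = """\
# example only
StateSaveLocation=/var/spool/slurmctld   # controller state
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp.log
AccountingStorageType=accounting_storage/slurmdbd
"""
    for path in backup_items(parse_slurm_conf(sample), "/etc/slurm/slurm.conf"):
        print(path)
```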

--
Thanks!
John