Hello,

We have been trying to upgrade Slurm on our cluster from 16.05.6 to 17.11.3. 
I'm thinking this should be doable? Past upgrades have been a breeze, and I 
believe the db upgrade took about 25 minutes during the last one. This time, 
however, the db upgrade is taking far too long. We first attempted the upgrade 
during a maintenance window, and the db conversion had not completed after 24 
hrs. I gave up, restored a backup of the db, and reverted to the old Slurm 
version.

Since the failed attempt, I've archived a bunch of jobs, as we had 4 years of 
jobs in the database. We now keep only the last 1.5 years' worth, which 
reduced our db size from 3.7GB to 1.1GB. We are now archiving jobs regularly 
through Slurm.
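
For context, regular archiving of this kind is driven by the archive/purge 
settings in slurmdbd.conf. Roughly, the settings involved look like the sketch 
below. The directory path and the exact values are illustrative assumptions, 
not a copy of our actual config:

```
# slurmdbd.conf (illustrative values only)
ArchiveDir=/var/spool/slurm/archive   # hypothetical path
ArchiveJobs=yes                       # write purged job records to ArchiveDir
ArchiveSteps=yes                      # same for step records
PurgeJobAfter=18months                # keep roughly 1.5 years of jobs
PurgeStepAfter=18months
```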

I've finally had time to look at this a bit more. We restored the reduced 
database onto another system to test the upgrade in a dev environment, 
hoping to prove that the slimmed-down db would upgrade within a reasonable 
amount of time. Yet the upgrade on this dev system has already been running for 
20 hrs. The database has 1.8M jobs. That doesn't seem like that many jobs!

The conversion is stuck on this command:

update "monsoon_job_table" as job left outer join ( select job_db_inx, 
SUM(consumed_energy) 'sum_energy' from "monsoon_step_table" where id_step >= 0 
and consumed_energy != 18446744073709551614 group by job_db_inx ) step on 
job.job_db_inx=step.job_db_inx set job.tres_alloc=concat(job.tres_alloc, 
concat(',3=', case when step.sum_energy then step.sum_energy else 
18446744073709551614 END)) where job.tres_alloc != '' && job.tres_alloc not 
like '%,3=%':
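
For anyone skimming that statement, here is my reading of what it does, 
written as a small Python sketch. The function name and the dict layout are 
made up for illustration and are not Slurm code, and I'm assuming 
18446744073709551614 (2^64 - 2) is being used as a "no value" sentinel:

```python
# Sentinel literal taken from the statement; equals 2**64 - 2.
NO_VAL64 = 18446744073709551614


def add_energy_tres(jobs, steps):
    """Mimic the conversion UPDATE on in-memory rows (illustrative only).

    jobs:  list of dicts with keys 'job_db_inx' and 'tres_alloc'
    steps: list of dicts with keys 'job_db_inx', 'id_step', 'consumed_energy'
    """
    # Subquery: sum consumed_energy per job over real steps (id_step >= 0),
    # skipping rows that hold the sentinel (or no value at all).
    sums = {}
    for step in steps:
        energy = step["consumed_energy"]
        if step["id_step"] >= 0 and energy is not None and energy != NO_VAL64:
            key = step["job_db_inx"]
            sums[key] = sums.get(key, 0) + energy

    # Outer update: only jobs with a non-empty tres_alloc that does not
    # already contain a ',3=' entry are touched.
    for job in jobs:
        tres = job["tres_alloc"]
        if tres and ",3=" not in tres:
            total = sums.get(job["job_db_inx"])
            # CASE WHEN sum_energy: a missing or zero sum falls back to the sentinel.
            value = total if total else NO_VAL64
            job["tres_alloc"] = f"{tres},3={value}"
    return jobs
```

So it is a single pass that aggregates the whole step table and then rewrites 
tres_alloc on every job that lacks an energy entry.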

The system is no slouch:

28 core, E5-2680 v4 2.4GHz
SSD
128GB memory

Has anyone else hit this issue? Any suggestions? This seems like a ridiculous 
amount of time for the upgrade! The database is healthy as far as I can see, 
with no errors in the slurmdbd log, etc.

Let me know if you need more info!

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
