Typically, the long DB upgrades happen only with major version upgrades.
Minor version upgrades usually don't take nearly as long.
Our own upgrade from 17.02.9 to 17.11.3, for example, took only
1.5 hours with six months' worth of jobs (about 10 million jobs). We don't
track energy usage, though, so perhaps that is why we avoided that
particular query.
From past experience, these major upgrades can take quite a bit of time,
since they typically change a lot of the DB structure between major
versions.
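For a rough comparison, the two 17.02.9 to 17.11.3 data points in this thread (about 10 million jobs in 1.5 hours here, versus almost sixteen hours for a 1.5 million-job store in Kurt's out-of-band conversion further down) imply very different conversion rates. A minimal Python sketch of that arithmetic, using only the approximate figures as reported; the dictionary keys are hypothetical labels, not anything from Slurm:

```python
# Rough conversion throughput implied by two reports in this thread.
# Figures are approximate, as reported: (job records, wall-clock hours).
# The keys are hypothetical labels for the two reports.
reports = {
    "paul_in_place": (10_000_000, 1.5),     # ~10M jobs, 1.5 h
    "kurt_out_of_band": (1_500_000, 16.0),  # 1.5M jobs, "almost sixteen hours"
}


def jobs_per_hour(jobs: int, hours: float) -> float:
    """Average number of job records converted per hour."""
    return jobs / hours


rates = {name: jobs_per_hour(jobs, hours) for name, (jobs, hours) in reports.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} jobs/hour")

# How many times faster the first conversion ran than the second.
speedup = rates["paul_in_place"] / rates["kurt_out_of_band"]
print(f"ratio: {speedup:.0f}x")
```

That is roughly a 70-fold difference for the same version pair, which suggests the conversion cost depends on more than the raw job count. The energy-usage query mentioned above is one candidate explanation, not a confirmed cause.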
-Paul Edmon-
On 02/22/2018 06:17 AM, Malte Thoma wrote:
FYI:
* We aborted our upgrade from 17.02.1-2 to 17.11.2 after about 18 h.
* Emptied the job table ("truncate xyz_job_table;")
* Ran the never-ending SQL command by hand on a backup database
* Meanwhile, performed the Slurm upgrade itself (fast and easy)
* Reset the First-Job-ID to a high number
* Re-inserted the converted data table into the production database.
It took two experts to do this, and we would very much appreciate a
better upgrade procedure!
In fact, we are hesitant to upgrade from 17.11.2 to 17.11.3 because we
are afraid of similar problems. Does anyone have experience with this?
It would be good to know whether there is ANY chance that future
upgrades will cause the same problems, or whether this will get better.
Regards,
Malte
On 22.02.2018 at 01:30, Christopher Benjamin Coffey wrote:
This is great to know, Kurt. We can't be the only folks running into
this. I wonder if the MySQL update code gets into a deadlock or
something. I'm hoping a Slurm dev will chime in...
Kurt, out of band if need be, I'd be interested in the details of
what you ended up doing.
Best,
Chris
--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 2/21/18, 5:08 PM, "slurm-users on behalf of Kurt H Maier"
<slurm-users-boun...@lists.schedmd.com on behalf of k...@sciops.net>
wrote:
On Wed, Feb 21, 2018 at 11:56:38PM +0000, Christopher Benjamin Coffey wrote:
> Hello,
>
> We have been trying to upgrade Slurm on our cluster from
> 16.05.6 to 17.11.3. I'm thinking this should be doable? Past upgrades
> have been a breeze, and I believe during the last one, the DB upgrade
> took like 25 minutes. Well now, the DB upgrade process is taking far
> too long. We previously attempted the upgrade during a maintenance
> window, and the upgrade process did not complete after 24 hrs. I gave
> up on the upgrade and reverted to the previous Slurm version by
> restoring a backup DB.
We hit this on our attempt as well, upgrading from 17.02.9 to 17.11.3.
We truncated our job history for the upgrade, then did the rest of the
conversion out-of-band and re-imported it after the fact. It took us
almost sixteen hours to convert a 1.5 million-job store.
We got hung up on precisely the same query you did, on a similarly
hefty machine. It forced us to roll back an upgrade and try again
during our next maintenance window using the approach above.
khm