Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-05 Thread Ole Holm Nielsen
Hi Lech, Thanks! I added the 18.08 Release Notes reference to https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older I've already upgraded from 17.11 to 18.08 without your patch, and this went smoothly as expected. We upgraded from 17.02 to 17.11

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-05 Thread Lech Nieroda
Hi Ole, your summary is correct as far as I can tell and will hopefully help some users. One thing I’d add is the remark from the 18.08 Release Notes ( https://github.com/SchedMD/slurm/blob/slurm-18.08/RELEASE_NOTES ), which adds mysql 5.5 to the list. They’ve mentioned that mysql 5.5 is the

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Prentice Bisbal
Lech, Thanks for the explanation. Now that you explained it like that, I understand SchedMD's decision. I was misreading the situation. I was under the impression that this affected *all* db upgrades, not just those from one old version to a slightly less old version. Prentice On 4/4/19

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Lech Nieroda
> Upgrading more than 2 releases isn't supported, so I don't believe the 19.05 > slurmdbd will have the code in it to upgrade tables from earlier than 17.11. I haven’t found any mention of this in the upgrade section of the QuickStart guide (see

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Chris Samuel
On 4/4/19 4:07 am, Lech Nieroda wrote: Furthermore, upgrades shouldn’t skip more than one release, as that would lead to loss of state files and other important information, so users probably won’t upgrade from 17.02 to 19.05 directly. If they’d do that then yes, the patch would be

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Lech Nieroda
That’s correct but let’s keep in mind that it only concerns the upgrade process and not production runtime which has certain implications. The affected database structures have been introduced in 17.11 and an upgrade affects only versions 17.02 or prior, it wouldn’t be a problem for users who

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Chris Samuel
On Wednesday, 3 April 2019 6:33:17 AM PDT Prentice Bisbal wrote: > Anyone else as disappointed by this as I am? I get that it's too late to > add something like this to 17.11 or 18.08, but it seems like SchedMD > isn't even interested in looking at this for 19.x Not really surprising, it's not

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Prentice Bisbal
the dev stated that they’d rather keep that warning than fix the issue, so I’m not sure if that’ll be enough to convince them. Anyone else as disappointed by this as I am? I get that it's too late to add something like this to 17.11 or 18.08, but it seems like SchedMD isn't even interested

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Lech Nieroda
Hi Ole, since we aren’t using RHEL7/CentOS7 we haven’t tested it with mysql 5.5 and it’d probably carry more weight if someone running that OS would test it and add an appropriate comment. You are welcome to try it out. That being said, the release notes explicitly mention that versions 5.1 and

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Ole Holm Nielsen
Hi Lech, Maybe you could add your arguments to the bug report https://bugs.schedmd.com/show_bug.cgi?id=6796 hoping that SchedMD may be convinced that this is a useful patch for future versions of Slurm, also for MySQL/MariaDB versions 5.5 and newer. Best regards, Ole On 4/3/19 1:17 PM,

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Lech Nieroda
Hello Chris, I’ve submitted the bug report together with a patch. We don’t have a support contract but I suppose they’ll at least read it ;) The code is identical for 18.08.x and 19.05.x, it’s just a different offset. Kind regards, Lech > Am 02.04.2019 um 15:18 schrieb Ole Holm Nielsen : > >

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-01 Thread Chris Samuel
On Monday, 1 April 2019 7:55:09 AM PDT Lech Nieroda wrote: > Further analysis of the query has shown that the mysql optimizer has chosen > the wrong execution plan. This may depend on the mysql version, ours was > 5.1.69. I suspect this is the issue documented in the release notes for 17.11:

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-01 Thread Lech Nieroda
We’ve run into exactly the same problem, i.e. an extremely long upgrade process to the 17.11.x major release. Luckily, we’ve found a solution. The first approach was to tune various innodb options, like increasing the buffer pool size (8G), the log file size (64M) or the lock wait timeout (900)

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-07-18 Thread Ole Holm Nielsen
On 07/18/2018 10:56 AM, Roshan Thomas Mathew wrote: We ran into this issue trying to move from 16.05.3 -> 17.11.7 with 1.5M records in job table. In our first attempt, MySQL reported "ERROR 1206 The total number of locks exceeds the lock table size" after about 7 hours. Increased InnoDB

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-03-01 Thread Peter Kjellström
On Wed, 28 Feb 2018 06:51:15 +1100 Chris Samuel wrote: > On Wednesday, 28 February 2018 2:13:41 AM AEDT Miguel Gila wrote: > > > Microcode patches were not applied to the physical system, only the > > kernel was upgraded, so I'm not sure whether the performance hit > > could

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-27 Thread Chris Samuel
On Wednesday, 28 February 2018 2:13:41 AM AEDT Miguel Gila wrote: > Microcode patches were not applied to the physical system, only the kernel > was upgraded, so I'm not sure whether the performance hit could come from > that or not. Yes it would, it's the kernel changes that cause the impact.

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-27 Thread Miguel Gila
Microcode patches were not applied to the physical system, only the kernel was upgraded, so I'm not sure whether the performance hit could come from that or not. Reducing the size of the DB to make the upgrade process complete in a reasonable time is like shooting a mosquito with a shotgun.

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-26 Thread Christopher Benjamin Coffey
Good thought Chris. Yet in our case our system does not have the spectre/meltdown kernel fix. Just to update everyone, we performed the upgrade successfully after we purged more data jobs/steps first. We did the following to ensure the purge happened right away per Hendryk's recommendation:

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-23 Thread Chris Samuel
On Friday, 23 February 2018 8:04:50 PM AEDT Miguel Gila wrote: > Interestingly enough, a poor VMware VM (2CPUs, 3GB/RAM) with MariaDB 5.5.56 > outperformed our central MySQL 5.5.59 (128GB, 14core, SAN) by a factor of > at least 3 on every table conversion. Wild idea completely out of left field..

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-23 Thread Ole Holm Nielsen
On 22-02-2018 21:27, Christopher Benjamin Coffey wrote: Thanks Paul. I didn't realize we were tracking energy. Looks like the best way to stop tracking energy is to specify what you want to track with AccountingStorageTRES? I'll give that a try. Perhaps it's a good idea for a lot of

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-23 Thread Miguel Gila
We recently ran a similar exercise: when updating from 17.02.7 to 17.11.03-2, we had to stop the upgrade on our production DB (shared with other databases) nearly half a day into it. It had reached a job table for a system with 6 million jobs and still had to go thru another one with >7

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-22 Thread Christopher Benjamin Coffey
Thanks Paul. I didn't realize we were tracking energy. Looks like the best way to stop tracking energy is to specify what you want to track with AccountingStorageTRES? I'll give that a try. Best, Chris -- Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-22 Thread Paul Edmon
Typically the long db upgrades are only for major version upgrades.  Most of the time minor versions don't take nearly as long. At least with our upgrade from 17.02.9 to 17.11.3 the upgrade only took 1.5 hours with 6 months worth of jobs (about 10 million jobs).  We don't track energy usage

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-22 Thread Jessica Nettelblad
We experienced the same problem. On our two new clusters with smaller databases (<1 million jobs), the upgrade from 17.02.9 to 17.11.2 and 17.11.3 was quick and smooth. On the third, older cluster, where we have a larger database (>30 million jobs) the upgrade was a mess, both in mysql and

[slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-21 Thread Christopher Benjamin Coffey
Hello, We have been trying to upgrade slurm on our cluster from 16.05.6 to 17.11.3. I'm thinking this should be doable? Past upgrades have been a breeze, and I believe during the last one, the db upgrade took like 25 minutes. Well now, the db upgrade process is taking far too long. We