Re: [slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps

2022-05-19 Thread John DeSantis
DeSantis Sent: 18 May 2022 15:39 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps Hello, It also appears that random jobs are being identified as using too much memory, despite being well within limits. For example

Re: [slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps

2022-05-19 Thread Luke Sudbery
. -Original Message- From: slurm-users On Behalf Of John DeSantis Sent: 18 May 2022 15:39 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps Hello, It also appears that random jobs are being identified as using too much

Re: [slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps

2022-05-18 Thread John DeSantis
Hello, It also appears that random jobs are being identified as using too much memory, despite being well within limits. For example, a job is running that requested 2048 MB per CPU and all processes are within the limit. But, the job is identified as being over limit when it isn't. Please

[slurm-users] SLURM upgrade from 20.11.3 to 20.11.9 misidentification of job steps

2022-05-18 Thread John DeSantis
Hello, Due to the recent CVE posted by Tim, we did upgrade from SLURM 20.11.3 to 20.11.9. Today, I received a ticket from a user with their output files populated with the "slurmstepd: error: Exceeded job memory limit" message. But, the jobs are still running and it seems that the

[slurm-users] Slurm upgrade to 20.11.3, slurmdbd still trying to start old version 20.02.3

2021-03-03 Thread Robert Kudyba
Slurmdbd has an issue and from the logs is still trying to load the old version: [2021-01-22T14:17:18.430] MySQL server version is: 5.5.68-MariaDB [2021-01-22T14:17:18.433] error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout

Re: [slurm-users] Slurm Upgrade Philosophy?

2020-12-24 Thread Chris Samuel
On 24/12/20 6:24 am, Paul Edmon wrote: We then have a test cluster that we install the release on and run a few test jobs to make sure things are working, usually MPI jobs as they tend to hit most of the features of the scheduler. One thing I meant to mention last night was that we use Reframe

Re: [slurm-users] Slurm Upgrade Philosophy?

2020-12-24 Thread Paul Edmon
We are the same way, though we tend to keep pace with minor releases. We typically wait until the .1 release of a new major release before considering an upgrade, so that many of the bugs are worked out. We then have a test cluster that we install the release on and run a few test jobs to make

Re: [slurm-users] Slurm Upgrade Philosophy?

2020-12-23 Thread Chris Samuel
On Friday, 18 December 2020 10:10:19 AM PST Jason Simms wrote: > Thanks to several helpful members on this list, I think I have a much better > handle on how to upgrade Slurm. Now my question is, do most of you upgrade > with each major release? We do, though not immediately and not without a

Re: [slurm-users] Slurm Upgrade Philosophy?

2020-12-18 Thread Alex Chekholko
Hi Jason, Ultimately each site decides how/why to do it; in my case I tend to do big "forklift upgrades", so I'm running 18.08 on the current cluster and will go to latest SLURM for my next cluster build. But you may have good reasons to upgrade slurm more often on your existing cluster. I

[slurm-users] Slurm Upgrade Philosophy?

2020-12-18 Thread Jason Simms
Hello all, Thanks to several helpful members on this list, I think I have a much better handle on how to upgrade Slurm. Now my question is, do most of you upgrade with each major release? I recognize that, normally, if something is working well, then don't upgrade it! In our case, we're running

Re: [slurm-users] Slurm Upgrade

2020-11-04 Thread Ole Holm Nielsen
On 11/5/20 7:14 AM, navin srivastava wrote: Thank you all for the response. but my question here is I have already built a new server slurm 20.2 with the latest DB. my question is,  shall i do a mysqldump into this server from existing server running with version slurm version 17.11.8 and

Re: [slurm-users] Slurm Upgrade

2020-11-04 Thread Christopher Samuel
Hi Navin, On 11/4/20 10:14 pm, navin srivastava wrote: I have already built a new server slurm 20.2 with the latest DB. my question is,  shall i do a mysqldump into this server from existing server running with version slurm version 17.11.8 This won't work - you must upgrade your 17.11

Re: [slurm-users] Slurm Upgrade

2020-11-04 Thread navin srivastava
Thank you all for the response. but my question here is I have already built a new server slurm 20.2 with the latest DB. my question is, shall i do a mysqldump into this server from existing server running with version slurm version 17.11.8 and then i will upgrade all client with 20.x followed

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Ole Holm Nielsen
On 11/2/20 2:25 PM, navin srivastava wrote: Currently we are running slurm version 17.11.x and wanted to move to 20.x. We are building the New server with Slurm 20.2 version and planning to upgrade the client nodes from 17.x to 20.x. wanted to check if we can upgrade the Client from 17.x to

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Paul Edmon
We have hit this when we naively ran using the service and it timed out and borked the database. Fortunately we had a backup to go back to. Since then we have run it straight from the command line. Like yours, our production DB is now 23 GB for 6 months' worth of data, so major schema updates

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Chris Samuel
On 11/2/20 7:31 am, Paul Edmon wrote: e. Run slurmdbd -Dv to do the database upgrade. Depending on the upgrade this can take a while because of database schema changes. I'd like to emphasise the importance of doing the DB upgrade this way: do not use systemctl for this, as if systemd

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Paul Edmon
We haven't really had MPI ugliness with the latest versions. Plus we've been rolling our own PMIx and building against that which seems to have solved most of the cross compatibility issues. -Paul Edmon- On 11/2/2020 10:38 AM, Fulcomer, Samuel wrote: Our strategy is a bit simpler. We're

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Fulcomer, Samuel
Our strategy is a bit simpler. We're migrating compute nodes to a new cluster running 20.x. This isn't an upgrade. We'll keep the old slurmdbd running for at least enough time to suck the remaining accounting data into XDMoD. The old cluster will keep running jobs until there are no more to run.

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Paul Edmon
We don't follow the recommended procedure here but rather build RPMs and upgrade using those. We haven't had any issues. Here is our procedure: 1. Build RPMs from source using a version of the slurm.spec file that we maintain. It's the version SchedMD provides but modified with some

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Paul Edmon
In general  I would follow this: https://slurm.schedmd.com/quickstart_admin.html#upgrade Namely: Almost every new major release of Slurm (e.g. 19.05.x to 20.02.x) involves changes to the state files with new data structures, new options, etc. Slurm permits upgrades to a new major release

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Fulcomer, Samuel
We're doing something similar. We're continuing to run production on 17.x and have set up a new server/cluster running 20.x for testing and MPI app rebuilds. Our plan had been to add recently purchased nodes to the new cluster, and at some point turn off submission on the old cluster and switch

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Christopher J Cawley
From: slurm-users on behalf of Christopher J Cawley Sent: Monday, November 2, 2020 8:33 AM To: Slurm User Community List Subject: Re: [slurm-users] Slurm Upgrade I do not think so. In any case, make sure that you stop services and make a backup of the database

Re: [slurm-users] Slurm Upgrade

2020-11-02 Thread Christopher J Cawley
...@gmu.edu From: slurm-users on behalf of navin srivastava Sent: Monday, November 2, 2020 8:25 AM To: Slurm User Community List Subject: [slurm-users] Slurm Upgrade Dear All, Currently we are running slurm version 17.11.x and wanted to move to 20.x. We

[slurm-users] Slurm Upgrade

2020-11-02 Thread navin srivastava
Dear All, Currently we are running slurm version 17.11.x and wanted to move to 20.x. We are building the New server with Slurm 20.2 version and planning to upgrade the client nodes from 17.x to 20.x. wanted to check if we can upgrade the Client from 17.x to 20.x directly or we need to go

Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-20 Thread Steven Senator (slurm-dev-list)
When upgrading to 18.08 it is prudent to add the following lines to your /etc/my.cnf, as per https://slurm.schedmd.com/accounting.html and https://slurm.schedmd.com/SLUG19/High_Throughput_Computing.pdf (slide #6):
[mysqld]
innodb_buffer_pool_size=1G
innodb_log_file_size=64M

Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-20 Thread Ricardo Gregorio
done. Regards, Ricardo Gregorio -Original Message- From: slurm-users On Behalf Of Ole Holm Nielsen Sent: 19 February 2020 14:41 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Slurm Upgrade from 17.02 On 2/19/20 3:10 PM, Ricardo Gregorio wrote: > I am putting toget

Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Marcus Wagner
Hi Ricardo, If I remember right, you can only upgrade two versions further. So you WILL have to upgrade to 18.08, even if you want to use 19.05 or the coming 20.02
17.02 -> 17.11 -> 18.08 -> 19.05 -> 20.02
  ^                 ^
  |                 |
  |- you are here   |- "farthest jump" to a

Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Chris Samuel
On 19/2/20 6:10 am, Ricardo Gregorio wrote: I am putting together an upgrade plan for slurm on our HPC. We are currently running old version 17.02.11. Would you guys advise us upgrading to 18.08 or 19.05? Slurm versions only support upgrading from 2 major versions back, so you could only
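
The two-major-release rule described in this thread can be sketched as a small path planner. This is an illustration only, not anything shipped with Slurm: the release list covers just the major versions mentioned in these threads, and the names RELEASES, MAX_HOP, and upgrade_path are hypothetical.

```python
# Major Slurm releases in order. Assumption: limited to the versions
# named in these threads; extend the list for newer releases.
RELEASES = ["17.02", "17.11", "18.08", "19.05", "20.02", "20.11"]

# Assumption taken from the thread: an upgrade may skip at most one
# major release, i.e. move forward by up to two entries in the list.
MAX_HOP = 2


def upgrade_path(current, target, releases=RELEASES, max_hop=MAX_HOP):
    """Return the list of major releases to install, starting at `current`.

    Each step moves forward by at most `max_hop` releases, so the path
    uses the fewest intermediate upgrades the rule allows.
    """
    start = releases.index(current)  # raises ValueError if unknown
    end = releases.index(target)
    if end < start:
        raise ValueError("downgrades are not supported")

    path = [releases[start]]
    pos = start
    while pos < end:
        # Jump as far as allowed without overshooting the target.
        pos = min(pos + max_hop, end)
        path.append(releases[pos])
    return path


if __name__ == "__main__":
    print(" -> ".join(upgrade_path("17.02", "20.02")))
```

For the case asked about here, starting at 17.02 and aiming for 20.02, the planner stops at 18.08 on the way, which matches the "farthest jump" advice given in the replies. Each intermediate stop is a full upgrade, including the slurmdbd database conversion.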

Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Ole Holm Nielsen
On 2/19/20 3:10 PM, Ricardo Gregorio wrote: I am putting together an upgrade plan for slurm on our HPC. We are currently running old version 17.02.11. Would you guys advise us upgrading to 18.08 or 19.05? You should be able to upgrade 2 Slurm major versions in one step. The 18.08 version is

[slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Ricardo Gregorio
hi all, I am putting together an upgrade plan for slurm on our HPC. We are currently running old version 17.02.11. Would you guys advise us upgrading to 18.08 or 19.05? I understand we will have to also upgrade the version of mariadb from 5.5 to 10.X and pay attention to 'long db upgrade from

[slurm-users] slurm upgrade from slurm-17.11.12-1 to slurm-19.05.2-1

2019-09-23 Thread Costin
Hi all, After the upgrade of our cluster from slurm 17.11.12 to 19.05.2 we started noticing that jobs above ~ 10 nodes start failing with: [2019-09-23T16:51:34.310] debug: Checking credential with 640 bytes of sig data [2019-09-23T16:51:34.311] error: Credential signature check: Credential data