Just a note for anyone considering the update: after updating, our slurmctld hung from time to time and we had to restart it. The latest version from the git branch resolved the problem. I guess it was fixed by this commit:
https://github.com/SchedMD/slurm/commit/0c5e35089c9ea775d0ff66fbbbc774cba5009468

We use "scontrol show config" regularly to check whether primary and backup daemon are online.

2017-03-01 6:14 GMT+01:00 Nicholas C Santucci <[email protected]>:
> which also means starting in 17.02
>
> in "RPMs INSTALLED" section of
> https://slurm.schedmd.com/quickstart_admin.html should be revised as follows
>
> slurm-sjobexit
> slurm-sjstat
>
> should be changed to
>
> slurm-sjobexit (only prior to 17.02)
> slurm-sjstat (only prior to 17.02)
>
> On Mon, Feb 27, 2017 at 9:56 AM, <[email protected]> wrote:
>>
>> Thanks for the patch. Committed here:
>>
>> https://github.com/SchedMD/slurm/commit/95cf960afcdb77cae644b7d0709ede123896626d
>>
>> ----- Message from Daniel Letai <[email protected]> ---------
>> Date: Mon, 27 Feb 2017 05:07:08 -0800
>> From: Daniel Letai <[email protected]>
>> Reply-To: slurm-dev <[email protected]>
>> Subject: [slurm-dev] Re: Slurm version 17.02.0 is now available
>> To: slurm-dev <[email protected]>
>>
>> $ git diff
>> diff --git a/slurm.spec b/slurm.spec
>> index 941b360..6bb3014 100644
>> --- a/slurm.spec
>> +++ b/slurm.spec
>> @@ -346,6 +346,7 @@ Includes the Slurm proctrack/lua and job_submit/lua plugin
>>  Summary: Perl tool to print Slurm job state information
>>  Group: Development/System
>>  Requires: slurm
>> +Obsoletes: slurm-sjobexit slurm-sjstat slurm-seff
>>  %description contribs
>>  seff is a mail program used directly by the Slurm daemons. On completion of a
>>  job, wait for it's accounting information to be available and include that
>>
>>
>> On 02/27/2017 01:35 PM, dani wrote:
>>
>> Seems like no obsoletes was set on slurm-contribs, so yum complains of
>> conflicts with slurm-sjobs and friends.
>>
>>
>> On 24/02//2017 01:41, Danny Auble wrote:
>>
>>
>> After 9 months of development we are pleased to announce the availability
>> of Slurm version 17.02.0.
>>
>> A brief description of what is contained in this release and other notes
>> about it is contained below. For a fuller description please consult the
>> RELEASE_NOTES file available in the source.
>>
>> Thanks to all involved!
>>
>> Slurm downloads are available from https://schedmd.com/downloads.php.
>>
>> RELEASE NOTES FOR SLURM VERSION 17.02
>> 23 February 2017
>>
>> IMPORTANT NOTES:
>> THE MAXJOBID IS NOW 67,108,863. ANY PRE-EXISTING JOBS WILL CONTINUE TO RUN BUT
>> NEW JOB IDS WILL BE WITHIN THE NEW MAXJOBID RANGE. Adjust your configured
>> MaxJobID value as needed to eliminate any confusion.
>>
>> If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
>> The 17.02 slurmdbd will work with Slurm daemons of version 15.08 and above.
>> You will not need to update all clusters at the same time, but it is very
>> important to update slurmdbd first and having it running before updating
>> any other clusters making use of it. No real harm will come from updating
>> your systems before the slurmdbd, but they will not talk to each other
>> until you do. Also at least the first time running the slurmdbd you need to
>> make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
>> You can accomplish this by adding the line
>>
>>   innodb_buffer_pool_size=64M
>>
>> under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
>> buffer pool size must be smaller than the size of the MySQL tmpdir. This is
>> needed when converting large tables over to the new database schema.
>>
>> Slurm can be upgraded from version 15.08 or 16.05 to version 17.02 without loss
>> of jobs or other state information. Upgrading directly from an earlier version
>> of Slurm will result in loss of state information.
>>
>> If using SPANK plugins that use the Slurm APIs, they should be recompiled when
>> upgrading Slurm to a new major release.
>>
>> NOTE: systemd services files are installed automatically, but not enabled.
>>       You will need to manually enable them on the appropriate systems:
>>       - Controller: systemctl enable slurmctld
>>       - Database: systemctl enable slurmdbd
>>       - Compute Nodes: systemctl enable slurmd
>>
>> NOTE: If you are not using Munge, but are using the "service" scripts to
>>       start Slurm daemons, then you will need to remove this check from the
>>       etc/slurm*service scripts.
>>
>> NOTE: If you are upgrading with any jobs from 14.03 or earlier
>>       (i.e. quick upgrade from 14.03 -> 15.08 -> 17.02) you will need
>>       to wait until after those jobs are gone before you upgrade to 17.02.
>>
>> HIGHLIGHTS
>> ==========
>>  -- Added infrastructure for managing workload across a federation of clusters.
>>     (partial functionality in version 17.02, fully operational in May 2017)
>>  -- In order to support federated jobs, the MaxJobID configuration parameter
>>     default value has been reduced from 2,147,418,112 to 67,043,328 and its
>>     maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
>>     have a job ID above the new range will continue to run and new jobs will get
>>     job IDs in the new range.
>>  -- Added "MailDomain" configuration parameter to qualify email addresses.
>>  -- Automatically clean up task/cgroup cpuset and devices cgroups after steps
>>     are completed.
>>  -- Added burst buffer support for job arrays. Added new SchedulerParameters
>>     configuration parameter of bb_array_stage_cnt=# to indicate how many pending
>>     tasks of a job array should be made available for burst buffer resource
>>     allocation.
>>  -- Added new sacctmgr commands: "shutdown" (shutdown the server), "list stats"
>>     (get server statistics) "clear stats" (clear server statistics).
>>  -- The database index for jobs is now 64 bits.
>>     If you happen to be close to
>>     4 billion jobs in your database you will want to update your slurmctld at
>>     the same time as your slurmdbd to prevent roll over of this variable as
>>     it is 32 bit in previous versions of Slurm.
>>  -- All memory values (in MB) are now 64 bit. Previously, nodes with more than
>>     of memory would not schedule or enforce memory limits correctly.
>>  -- Removed AIX, BlueGene/L and BlueGene/P support.
>>  -- Removed sched/wiki and sched/wiki2 plugins and associated code.
>>  -- Added PrologFlags=Serial to disable concurrent execution of prolog/epilog
>>     scripts.
>>
>> ----- End message from Daniel Letai <[email protected]> ----
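
A minimal Python sketch of the "scontrol show config" check mentioned at the top of this message. It is an illustration, not the script actually in use. It assumes the command output contains a status line of the form "Slurmctld(primary/backup) at HOST1/HOST2 are UP/DOWN"; if your Slurm version prints something different, the pattern needs adjusting. The function names parse_controller_status and check_controllers, the 30-second timeout, and the host names in the comments are hypothetical.

```python
import re
import subprocess

# Assumed format of the status line, for example:
#   Slurmctld(primary/backup) at ctl1/ctl2 are UP/DOWN
_STATUS_RE = re.compile(
    r"Slurmctld\(primary/backup\) at (\S+?)/(\S+) are (UP|DOWN)/(UP|DOWN)"
)


def parse_controller_status(output):
    """Extract controller hosts and states from `scontrol show config` output.

    Returns {"primary": (host, state), "backup": (host, state)},
    or None if no status line matching the assumed format is found.
    """
    match = _STATUS_RE.search(output)
    if match is None:
        return None
    primary_host, backup_host, primary_state, backup_state = match.groups()
    return {
        "primary": (primary_host, primary_state),
        "backup": (backup_host, backup_state),
    }


def check_controllers(timeout=30):
    """Run `scontrol show config` and parse the controller status line.

    Returns None if the command does not finish within `timeout` seconds,
    which may indicate a hung slurmctld, or if no status line is found.
    """
    try:
        result = subprocess.run(
            ["scontrol", "show", "config"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_controller_status(result.stdout)
```

Calling check_controllers() from cron or a monitoring probe and alerting when it returns None, or when either state is not "UP", would cover both a down daemon and a hang like the one described above. The sketch does not handle a missing scontrol binary; that case raises FileNotFoundError.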
