After 9 months of development we are pleased to announce the
availability of Slurm version 17.02.0.
A brief description of this release, along with other notes about it,
follows below. For a fuller description please consult
the RELEASE_NOTES file available in the source.
Thanks to all involved!
Slurm downloads are available from https://schedmd.com/downloads.php.
RELEASE NOTES FOR SLURM VERSION 17.02
23 February 2017
IMPORTANT NOTES:
THE MAXJOBID IS NOW 67,108,863. ANY PRE-EXISTING JOBS WILL CONTINUE TO RUN BUT
NEW JOB IDS WILL BE WITHIN THE NEW MAXJOBID RANGE. Adjust your configured
MaxJobID value as needed to eliminate any confusion.
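The job ID limits quoted in these notes line up with powers of two, which the
following Python sketch makes explicit. It is illustrative arithmetic only: the
helper name max_job_id_in_range is hypothetical and is not part of Slurm.

```python
# Illustrative arithmetic only; none of these names come from Slurm itself.
NEW_MAX_JOB_ID = 67_108_863             # maximum MaxJobID in 17.02
NEW_DEFAULT_MAX_JOB_ID = 67_043_328     # default MaxJobID in 17.02
OLD_DEFAULT_MAX_JOB_ID = 2_147_418_112  # default MaxJobID before 17.02


def max_job_id_in_range(configured: int) -> bool:
    """Return True if a configured MaxJobID fits under the 17.02 ceiling."""
    return 0 < configured <= NEW_MAX_JOB_ID


print(NEW_MAX_JOB_ID == 2**26 - 1)                   # True
print(NEW_DEFAULT_MAX_JOB_ID == 2**26 - 2**16)       # True
print(OLD_DEFAULT_MAX_JOB_ID == 2**31 - 2**16)       # True
print(max_job_id_in_range(OLD_DEFAULT_MAX_JOB_ID))   # False
```

A site that still has the old default written into its configuration would
fail this check, which is the case the note above asks you to adjust.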
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 17.02 slurmdbd will work with Slurm daemons of version 15.08 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any other clusters making use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also, at least the first time you run the slurmdbd, you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
You can accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
buffer pool size must be smaller than the size of the MySQL tmpdir. This is
needed when converting large tables over to the new database schema.
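As a rough pre-flight check for the setting described above, the Python sketch
below reads my.cnf-style text and reports whether innodb_buffer_pool_size is at
least 64M. The sample text and the helper names parse_size and buffer_pool_ok
are assumptions made for illustration. Real my.cnf files may contain include
directives or dash-spelled option names that this simple parser does not
handle.

```python
import configparser

# Hypothetical sample; a real my.cnf will contain many more settings.
SAMPLE_MY_CNF = """
[mysqld]
innodb_buffer_pool_size=64M
"""

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size such as '64M' or '134217728' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def buffer_pool_ok(cnf_text: str, minimum: str = "64M") -> bool:
    """Return True if [mysqld] sets innodb_buffer_pool_size >= minimum."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    value = parser.get("mysqld", "innodb_buffer_pool_size", fallback=None)
    return value is not None and parse_size(value) >= parse_size(minimum)


print(buffer_pool_ok(SAMPLE_MY_CNF))  # True
```

The check only inspects the file contents. It does not confirm that mysqld has
been restarted, nor that the buffer pool is smaller than the MySQL tmpdir.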
Slurm can be upgraded from version 15.08 or 16.05 to version 17.02 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.
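The upgrade rule above can be expressed as a small lookup. This is a minimal
Python sketch, not a Slurm tool: the function names parse_release and
keeps_state_on_upgrade are hypothetical, and the version strings are assumed
to use the "major.minor[.micro]" form seen in these notes.

```python
# Releases from which a direct upgrade to 17.02 preserves state,
# as stated in the release notes.
DIRECT_UPGRADE_SOURCES = {(15, 8), (16, 5)}


def parse_release(version: str) -> tuple:
    """Reduce a version such as '16.05.9' to its (major, minor) release."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def keeps_state_on_upgrade(current_version: str) -> bool:
    """Return True if upgrading straight to 17.02 keeps jobs and state."""
    return parse_release(current_version) in DIRECT_UPGRADE_SOURCES


for version in ("14.11.8", "15.08.13", "16.05.9"):
    print(version, keeps_state_on_upgrade(version))
```

A site on an older release would step through an intermediate version first,
rather than jumping straight to 17.02.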
If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.
NOTE: systemd services files are installed automatically, but not enabled.
You will need to manually enable them on the appropriate systems:
- Controller: systemctl enable slurmctld
- Database: systemctl enable slurmdbd
- Compute Nodes: systemctl enable slurmd
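For sites that script their rollouts, the role-to-service mapping above could
be captured as data. The Python sketch below only builds the command lines and
does not run them; the role keys and the function name enable_commands are
hypothetical choices for illustration.

```python
# Mapping of host role to the systemd unit named in the list above.
SERVICE_FOR_ROLE = {
    "controller": "slurmctld",
    "database": "slurmdbd",
    "compute": "slurmd",
}


def enable_commands(roles):
    """Return the 'systemctl enable' argv list for each role on a host."""
    return [["systemctl", "enable", SERVICE_FOR_ROLE[role]] for role in roles]


# A host acting as both controller and database server:
for argv in enable_commands(["controller", "database"]):
    print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to try on any machine. An
administrator would run the resulting commands with appropriate privileges.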
NOTE: If you are not using Munge, but are using the "service" scripts to
start Slurm daemons, then you will need to remove this check from the
etc/slurm*service scripts.
NOTE: If you are upgrading with any jobs from 14.03 or earlier
(i.e. quick upgrade from 14.03 -> 15.08 -> 17.02) you will need
to wait until after those jobs are gone before you upgrade to 17.02.
HIGHLIGHTS
==========
-- Added infrastructure for managing workload across a federation of clusters.
(partial functionality in version 17.02, fully operational in May 2017)
-- In order to support federated jobs, the MaxJobID configuration parameter
default value has been reduced from 2,147,418,112 to 67,043,328 and its
maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
have a job ID above the new range will continue to run and new jobs will get
job IDs in the new range.
-- Added "MailDomain" configuration parameter to qualify email addresses.
-- Automatically clean up task/cgroup cpuset and devices cgroups after steps
are completed.
-- Added burst buffer support for job arrays. Added new SchedulerParameters
configuration parameter of bb_array_stage_cnt=# to indicate how many pending
tasks of a job array should be made available for burst buffer resource
allocation.
-- Added new sacctmgr commands: "shutdown" (shutdown the server), "list stats"
(get server statistics), and "clear stats" (clear server statistics).
-- The database index for jobs is now 64 bits. If you happen to be close to
4 billion jobs in your database you will want to update your slurmctld at
the same time as your slurmdbd to prevent rollover of this variable, as
it is 32 bits in previous versions of Slurm.
-- All memory values (in MB) are now 64 bit. Previously, nodes with more than
of memory would not schedule or enforce memory limits correctly.
-- Removed AIX, BlueGene/L and BlueGene/P support.
-- Removed sched/wiki and sched/wiki2 plugins and associated code.
-- Added PrologFlags=Serial to disable concurrent execution of prolog/epilog
scripts.
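The rollover concern in the job index item above can be illustrated with a few
lines of Python. This sketch assumes the old index behaves like an unsigned
32-bit counter, which matches the "4 billion" figure in the notes but is not
confirmed by them. The function names are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit index can hold


def next_index_32bit(index: int) -> int:
    """Increment an index that is stored in 32 bits (wraps to 0)."""
    return (index + 1) & 0xFFFFFFFF


def next_index_64bit(index: int) -> int:
    """Increment an index that is stored in 64 bits."""
    return (index + 1) & 0xFFFFFFFFFFFFFFFF


print(UINT32_MAX)                    # 4294967295
print(next_index_32bit(UINT32_MAX))  # 0
print(next_index_64bit(UINT32_MAX))  # 4294967296
```

A component still using the 32-bit width would wrap to zero at the point where
the 64-bit database index simply keeps counting, which is why the notes suggest
updating slurmctld together with slurmdbd for very large databases.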