SLURM versions 2.2.3 and 2.3.0-pre3 are now available from SourceForge:
http://sourceforge.net/projects/slurm/
The changes made since version 2.2.1 was released in January are noted
below. Version 2.3.0-pre3 is for development purposes and includes all
changes in SLURM versions 2.2.2 and 2.2.3 plus the enhancements noted
below. Upgrades from version 2.3.0-pre2 to version 2.3.0-pre3 will result
in all jobs being purged due to the version 2.3 data structures changing.
(This is a temporary problem. We expect no lost jobs when upgrading
from v2.1 or v2.2 to v2.3).
md5sum:
d46e38fbe8368298d8399fe29b6f0342 slurm-2.2.2.tar.bz2
1ee747c02f119294380a844a7a6a1d35 slurm-2.2.3.tar.bz2
b9ff3d736abc4f863a188a809fa41951 slurm-2.3.0-0.pre3.tar.bz2
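One way to check a downloaded tarball against the checksums above is to
recompute its MD5 digest locally. The following is a minimal Python sketch,
not an official verification tool. The helper names (md5_of_file, verify) are
illustrative, and it assumes the tarball has been saved under the file name
shown in the list.

```python
import hashlib

# Published digests, copied from the md5sum list in this announcement.
PUBLISHED_MD5 = {
    "slurm-2.2.2.tar.bz2": "d46e38fbe8368298d8399fe29b6f0342",
    "slurm-2.2.3.tar.bz2": "1ee747c02f119294380a844a7a6a1d35",
    "slurm-2.3.0-0.pre3.tar.bz2": "b9ff3d736abc4f863a188a809fa41951",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches expected_md5."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, verify("slurm-2.2.3.tar.bz2",
PUBLISHED_MD5["slurm-2.2.3.tar.bz2"]) should return True for an intact
download.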
* Changes in SLURM 2.2.3
========================
-- Update srun, salloc, and sbatch man page description of --distribution
option. Patches from Rod Schultz, Bull.
-- Applied patch from Martin Perry to fix "Incorrect results for task/affinity
block second distribution and cpus-per-task > 1" bug.
-- Avoid setting a job's eligible time while held (priority == 0).
-- Substantial performance improvement to backfill scheduling. Patch from
Bjorn-Helge Mevik, University of Oslo.
-- Make timeout for communications to the slurmctld be based upon the
MessageTimeout configuration parameter rather than always 3 seconds.
Patch from Matthieu Hautreux, CEA.
-- Add new scontrol option of "show aliases" to report every NodeName that is
associated with a given NodeHostName when running multiple slurmd daemons
per compute node (typically used for testing purposes). Patch from
Matthieu Hautreux, CEA.
-- Fix for handling job names with a "'" in the name within MySQL accounting.
Patch from Gerrit Renker, CSCS.
-- Modify the conditions under which salloc execution is delayed until moved
to the foreground. Patch from Gerrit Renker, CSCS.
Job control for interactive salloc sessions: only if ...
a) input is from a terminal (stdin has valid termios attributes),
b) controlling terminal exists (non-negative tpgid),
c) salloc is not run in allocation-only (--no-shell) mode,
d) salloc runs in its own process group (true in interactive
shells that support job control),
e) salloc has been configured at compile-time to support background
execution and is not currently in the background process group.
-- Abort salloc if no controlling terminal and --no-shell option is not used
("setsid salloc ..." is disabled). Patch from Gerrit Renker, CSCS.
-- Fix to gang scheduling logic which could cause jobs to not be suspended
or resumed when appropriate.
-- Applied patch from Martin Perry to fix "Slurmd abort when using task
affinity with plane distribution" bug.
-- Applied patch from Yiannis Georgiou to fix "Problem with cpu binding to
sockets option" behavior. This change causes "--cpu_bind=sockets" to bind
tasks only to the CPUs on each socket allocated to the job rather than all
CPUs on each socket.
-- Advance daily or weekly reservations immediately after termination to avoid
having a job start that runs into the reservation when later advanced.
-- Fix for enabling users to change their own default account, wckey, or QOS.
-- BLUEGENE - If using OVERLAP mode fixed issue with multiple overlapping
blocks in error mode.
-- Fix for sacctmgr to correctly display default accounts.
-- scancel -s SIGKILL will always send the RPC to the slurmctld rather than
the slurmd daemon(s). This ensures that tasks in the process of getting
spawned are killed.
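The "--cpu_bind=sockets" change in the 2.2.3 list above can be pictured as a
set intersection: a task's binding is limited to the CPUs that are both on the
socket and allocated to the job. A minimal Python sketch under that reading
follows. The function name, the CPU numbering, and the data layout are
hypothetical and do not reflect the task/affinity plugin's actual code.

```python
def socket_bind_mask(socket_cpus, job_cpus, allocated_only=True):
    """Return the sorted CPU ids a task bound to one socket may run on.

    socket_cpus: CPU ids that belong to the socket.
    job_cpus: CPU ids allocated to the job on this node.
    allocated_only: True models the behavior described for 2.2.3;
        False models binding to every CPU on the socket.
    """
    if allocated_only:
        return sorted(set(socket_cpus) & set(job_cpus))
    return sorted(set(socket_cpus))


# Hypothetical node: socket 0 holds CPUs 0-3; the job was given CPUs 0, 1, 4.
socket0 = [0, 1, 2, 3]
job = [0, 1, 4]
print(socket_bind_mask(socket0, job))                        # [0, 1]
print(socket_bind_mask(socket0, job, allocated_only=False))  # [0, 1, 2, 3]
```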
* Changes in SLURM 2.2.2
========================
-- Correct logic to set correct job hold state (admin or user) when setting
the job's priority using scontrol's "update jobid=..." rather than its
"hold" or "holdu" commands.
-- Modify squeue to report unset --mincores, --minthreads or --extra-node-info
values as "*" rather than 65534. Patch from Rod Schultz, Bull.
-- Report the StartTime of a job as "Unknown" rather than the year 2106 if its
expected start time was too far in the future for the backfill scheduler
to compute.
-- Prevent a pending job reason field from inappropriately being set to
"Priority".
-- In sched/backfill, for jobs having QOS_FLAG_NO_RESERVE set, don't
consider the job's time limit when attempting to backfill schedule. The job
will just be preempted as needed at any time.
-- Eliminated a bug in sbatch when no valid target clusters are specified.
-- When explicitly sending a signal to a job with the scancel command and that
job is in a pending state, then send the request directly to the slurmctld
daemon and do not attempt to send the request to slurmd daemons, which are
not running the job anyway.
-- In slurmctld, properly set the up_node_bitmap when setting a node's state
to IDLE (in case the previous node state was DOWN).
-- Fix smap to process block midplane names correctly when on a bluegene
system.
-- Fix smap to once again print out the Letter 'ID' for each line of a block/
partition view.
-- Corrected the NOTES section of the scancel man page.
-- Fix for accounting_storage/mysql plugin to correctly query cluster based
transactions.
-- Fix issue when updating database for clusters that were previously deleted
before upgrade to 2.2 database.
-- BLUEGENE - Handle mesh torus check better in dynamic mode.
-- BLUEGENE - Fixed race condition when freeing a block; it would most likely
happen only in emulation.
-- Fix for calculating used QOS limits correctly on a slurmctld reconfig.
-- BLUEGENE - Fix for bad conn-type set when running small blocks in HTC mode.
-- If salloc's --no-shell option is used, then do not attempt to preserve the
terminal's state.
-- Add new SLURM configure time parameter of --disable-salloc-background. If
set, then salloc can only execute in the foreground. If started in the
background, then a message will be printed and the job allocation halted
until brought into the foreground.
NOTE: THIS IS A CHANGE IN DEFAULT SALLOC BEHAVIOR FROM V2.2.1, BUT IS
CONSISTENT WITH V2.1 AND EARLIER.
-- Added the Multi-Cluster Operation web page.
-- Removed remnant code for enforcing max sockets/cores/threads in the
cons_res plugin (see last item in 2.1.0-pre5). This was responsible
for a bug reported by Rod Schultz.
-- BLUEGENE - Set correct env vars for HTC mode on a P system to get correct
block.
-- Correct RunTime reported by "scontrol show job" for pending jobs.
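Two of the 2.2.2 items above (squeue printing "*" instead of 65534, and
StartTime printing "Unknown" instead of the year 2106) concern reserved
"unset" values leaking into output. The Python sketch below shows the general
idea only. It assumes the sentinels are 0xfffe for 16-bit fields and a value
near the top of the unsigned 32-bit range for times; the constant and function
names are illustrative, not SLURM's.

```python
from datetime import datetime, timedelta, timezone

NO_VAL_16 = 0xFFFE      # assumed 16-bit "unset" sentinel (65534)
NO_VAL_32 = 0xFFFFFFFE  # assumed 32-bit "unset" sentinel
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_count(value):
    """Render a 16-bit count, showing "*" when it holds the sentinel."""
    return "*" if value == NO_VAL_16 else str(value)


def format_start_time(seconds):
    """Render a start time in UTC, showing "Unknown" for sentinel values."""
    if seconds >= NO_VAL_32:
        return "Unknown"
    return (EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%S")


print(format_count(65534))                          # *
print(format_count(4))                              # 4
print(format_start_time(NO_VAL_32))                 # Unknown
# Interpreted literally as a timestamp, the sentinel lands in 2106:
print((EPOCH + timedelta(seconds=NO_VAL_32)).year)  # 2106
```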
* Changes in SLURM 2.3.0.pre3
=============================
-- BGQ - Appears to work correctly in emulation mode, no sub blocks just yet.
-- Minor typos fixed.
-- Various bug fixes for Cray systems.
-- Fix bug in which setting a compute node to idle state failed to set the
system's up_node_bitmap.
-- BLUEGENE - code reorder
-- BLUEGENE - Now only one select plugin for all Bluegene systems.
-- Modify srun to set the SLURM_JOB_NAME environment variable when srun is
used to create a new job allocation. Not set when srun is used to create a
job step within an existing job allocation.
-- Modify init.d/slurm script to start multiple slurmd daemons per compute
node if so configured. Patch from Matthieu Hautreux, CEA.
-- Change license data structure counters from uint16_t to uint32_t to support
larger license counts.
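The last item above widens the license counters from uint16_t to uint32_t.
The Python sketch below only illustrates the difference in range between the
two types by emulating unsigned C truncation with a bit mask. The
70000-license figure is a made-up example, and this announcement does not say
how earlier versions treated counts above the 16-bit limit.

```python
UINT16_MAX = 0xFFFF      # 65535
UINT32_MAX = 0xFFFFFFFF  # 4294967295


def store_unsigned(value, max_value):
    """Emulate storing a non-negative int in an unsigned C field (modulo wrap)."""
    return value & max_value


licenses = 70000  # hypothetical site-wide license count
print(store_unsigned(licenses, UINT16_MAX))  # 4464
print(store_unsigned(licenses, UINT32_MAX))  # 70000
```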