SLURM versions 2.3.2 and 2.4.0-pre2 are now available from
http://www.schedmd.com/#repos
Version 2.3.2 includes fixes for bugs found over the past six weeks, as
shown below. Version 2.4.0-pre2 includes new development for version
2.4 with a release scheduled in the second quarter of 2012.
* Changes in SLURM 2.3.2
========================
-- Add configure option of "--without-rpath", which builds SLURM tools without
the rpath option. This works if the Munge and BlueGene libraries are in
the default library search path, and it makes system updates easier.
-- Fixed issue where, if a job ended with ESLURMD_UID_NOT_FOUND or
ESLURMD_GID_NOT_FOUND, SLURM would be overzealous in treating a
missing GID or UID as a fatal error.
-- Backfill scheduling - Add SchedulerParameters configuration parameter of
"bf_res" to control the resolution in the backfill scheduler's data about
when jobs begin and end. Default value is 60 seconds (used to be
1 second).
-- Cray - Remove the "family" specification from the GPU reservation request.
-- Updated set_oomadj.c, replacing the deprecated oom_adj reference with
oom_score_adj.
-- Fix resource allocation bug: generic resource allocation was ignoring the
job's ntasks_per_node and cpus_per_task parameters. Patch from Carles
Fenoy, BSC.
-- Avoid orphan job step if slurmctld is down when a job step completes.
-- Fix Lua link order, patch from Pär Andersson, NSC.
-- Set SLURM_CPUS_PER_TASK=1 when user specifies --cpus-per-task=1.
-- Fix for fatal error managing GRES. Patch by Carles Fenoy, BSC.
-- Fixed race condition when using the DBD in accounting where, if a job
wasn't started at the time the eligible message was sent but started
before the db_index was returned, information like start time
would be lost.
-- Fix issue in accounting where normalized shares could be updated
incorrectly when getting fairshare from the parent.
-- Fixed case where associations are not enforced but QOS support is wanted,
so that the default QOS on the cluster is filled in correctly.
-- Fix in select/cons_res for "fatal: cons_res: sync loop not progressing"
with some configurations and job option combinations.
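The "bf_res" item above describes a coarser time resolution for the backfill scheduler's record of when jobs begin and end. A minimal Python sketch of the general idea follows. It is not SLURM's implementation: the function name, the choice to round up, and the sample end times are all assumptions made for illustration. It only shows how a 60-second resolution collapses nearby times into fewer distinct slots than a 1-second resolution does.

```python
def round_up_to_resolution(t_seconds, bf_res=60):
    """Round a time in seconds up to the next multiple of bf_res.

    Hypothetical helper; rounding up is an assumption, not taken from SLURM.
    """
    if bf_res <= 0:
        raise ValueError("bf_res must be positive")
    return ((t_seconds + bf_res - 1) // bf_res) * bf_res


# Made-up expected job end times, in seconds from now.
end_times = [61, 75, 119, 120, 130, 3599]

# Distinct time slots the scheduler data would need at each resolution.
slots_1s = {round_up_to_resolution(t, 1) for t in end_times}
slots_60s = {round_up_to_resolution(t, 60) for t in end_times}

print(len(slots_1s), "slots at 1 second;", len(slots_60s), "slots at 60 seconds")
```

With these sample values the six end times occupy six slots at 1-second resolution and three at 60 seconds, which suggests why a coarser default can mean less data for the scheduler to track.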
* Changes in SLURM 2.4.0.pre2
=============================
-- CRAY - Add support for GPU memory allocation using SLURM GRES (Generic
RESource) support. Work by Steve Trofinoff, CSCS.
-- Add support for job allocations with multiple job constraint counts. For
example: salloc -C "[rack1*2&rack2*4]" ... will allocate the job 2 nodes
from rack1 and 4 nodes from rack2. Support for only a single constraint
name has been added to job step support.
-- BGQ - Remove old method for marking cnodes down.
-- BGQ - Remove BGP images from view in sview.
-- BGQ - print out failed cnodes in scontrol show nodes.
-- BGQ - Add srun option of "--runjob-opts" to pass options to the runjob
command.
-- FRONTEND - handle step launch failure better.
-- BGQ - Added a mutex to protect the now changing ba_system pointers.
-- BGQ - added new functionality for sub-block allocations - no preemption
for this yet though.
-- Add --name option to squeue to filter output by job name. Patch from Yuri
D'Elia.
-- BGQ - Added linking to the runjob client library, which gives support
to TotalView to use srun instead of runjob.
-- Add numeric range checks to scontrol update options. Patch from Phil
Eckert, LLNL.
-- Add ReconfigFlags configuration option to control actions of "scontrol
reconfig". Patch from Don Albert, Bull.
-- BGQ - handle reboots with multiple jobs running on a block.
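The constraint-count example in the list above, salloc -C "[rack1*2&rack2*4]", can be read as a set of (feature name, node count) pairs. The Python sketch below parses only that one form. The function name is hypothetical, and the parser is an illustration of how to read the expression, not SLURM's parser; the full constraint syntax accepted by salloc may be richer than what is handled here.

```python
import re


def parse_constraint_counts(spec):
    """Parse "[name*count&name*count]" into a list of (name, count) pairs.

    Hypothetical helper covering only the form shown in the release note.
    """
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("expected a bracketed constraint list")
    pairs = []
    for term in spec[1:-1].split("&"):
        match = re.fullmatch(r"(\w+)\*(\d+)", term.strip())
        if match is None:
            raise ValueError(f"bad term: {term!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


requested = parse_constraint_counts("[rack1*2&rack2*4]")
total_nodes = sum(count for _, count in requested)
print(requested, "total nodes:", total_nodes)
```

For the example from the release note this yields 2 nodes for rack1 and 4 for rack2, 6 nodes in total.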