Slurm version 15.08.9 is now available and includes about 40 bug fixes
developed over the past six weeks as listed below.
Slurm version 16.05.0-pre2 is also available and includes new
development for the next major release in May.
Slurm downloads are available from:
http://www.schedmd.com/#repos
* Changes in Slurm 15.08.9
==========================
-- BurstBuffer/cray - Defer job cancellation or time limit while a "pre-run"
operation is in progress to avoid inconsistent state due to multiple calls
to job termination functions.
-- Fix issue with resizing jobs where limits were not tracked correctly.
-- BGQ - Remove redeclaration of job_read_lock.
-- BGQ - Tighter locks around structures when nodes/cables change state.
-- Make it possible to change CPUsPerTask with scontrol.
-- Make it so "scontrol update part qos=" will take away a partition QOS from
a partition.
-- Fix issue where SocketsPerBoard didn't translate to Sockets when CPUS=
was also given.
-- Add note to slurm.conf man page about setting "--cpu_bind=no" as part
of SallocDefaultCommand if a TaskPlugin is in use.
-- Set correct reason when a QOS' MaxTresMins is violated.
-- Ensure that a job is completely launched before trying to suspend it.
-- Remove historical presentations and design notes. Only distribute
maintained doc/html and doc/man directories.
-- Remove duplicate xmalloc() in task/cgroup plugin.
-- Backfill scheduler to validate correct job partition for job submitted to
multiple partitions.
-- Force close on exec on first 256 file descriptors when launching a
slurmstepd to close potential open ones.
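
The close-on-exec idea in the entry above can be illustrated with a minimal
Python sketch. It is not Slurm's C implementation; the function name and the
choice to skip descriptors 0-2 (stdio) are assumptions made for this example.

```python
import fcntl
import os


def set_cloexec_on_low_fds(limit=256, first=3):
    """Mark every open descriptor in [first, limit) close-on-exec.

    Returns the descriptors that were open and got flagged.
    Skipping stdio (first=3) is a choice of this sketch only.
    """
    flagged = []
    for fd in range(first, limit):
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError:
            continue  # descriptor is not open
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
        flagged.append(fd)
    return flagged
```

A descriptor flagged this way stays usable in the current process but is
closed automatically in any program started with exec.
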
-- Step GRES value changed from type "int" to "int64_t" to support larger
values.
-- Fix getting reservations to database when database is down.
-- Fix issue with sbcast not doing a correct fanout.
-- Fix issue where steps weren't always getting the gres/tres involved.
-- Fixed double read lock on getting job's gres/tres.
-- Fix display for RoutePlugin parameter to display the correct value.
-- Fix route/topology plugin to prevent segfault in sbcast when in use.
-- Fix Cray slurmconfgen_smw.py script to use nid as nid, not nic.
-- Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
allocated to a requeued job as non-usable on job termination.
-- burst_buffer/cray plugin: Prevent a requeued job from being restarted
while file stage-out is still in progress. Previous logic could restart the
job and not perform a new stage-in.
-- Fix job array formatting to display arrays with step functions as
[0-100:2] rather than [0,2,4,6,8,...].
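
The compact display described above can be sketched as follows. This is a
simplified illustration, not Slurm's formatting code: the function name is
hypothetical, and it only collapses a list when every task ID follows one
constant step.

```python
def format_task_ids(task_ids):
    """Render task IDs compactly, e.g. [0-100:2] for a constant step."""
    ids = sorted(set(task_ids))
    if not ids:
        return "[]"
    if len(ids) == 1:
        return "[%d]" % ids[0]
    step = ids[1] - ids[0]
    uniform = all(b - a == step for a, b in zip(ids, ids[1:]))
    if uniform and len(ids) > 2:
        if step == 1:
            return "[%d-%d]" % (ids[0], ids[-1])
        return "[%d-%d:%d]" % (ids[0], ids[-1], step)
    # Irregular spacing: fall back to an explicit comma-separated list.
    return "[" + ",".join(str(i) for i in ids) + "]"
```
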
-- FreeBSD - replace Linux-specific set_oom_adj to avoid errors in
slurmd log.
-- Add option for TopologyParam=NoInAddrAnyCtld to make the slurmctld listen
on only one port like TopologyParam=NoInAddrAny does for everything else.
-- Fix burst buffer plugin to prevent corruption of the CPU TRES data when
bb is not set as an AccountingStorageTRES type.
-- Suppress error messages in acct_gather_energy/ipmi plugin after repeated
failures.
-- Change burst buffer use completion email message from
"SLURM Job_id=1360353 Name=tmp Staged Out, StageOut time 00:01:47"
to
"SLURM Job_id=1360353 Name=tmp StageOut/Teardown time 00:01:47"
-- Generate burst buffer use completion email immediately after teardown
completes rather than at job purge time (likely minutes later).
-- Fix issue when adding a new TRES to AccountingStorageTRES for the first
time.
-- Update gang scheduling tables when job manually suspended or resumed.
Prior logic could mess up job suspend/resume sequencing.
-- Update gang scheduling data structures when job changes in size.
-- Associations - prevent hash table corruption if uid initially unset for
a user, which can cause slurmctld to crash if that user is deleted.
-- Avoid possibly aborting srun on SIGSTOP while creating the job step due
to a threading bug.
-- Fix deadlock issue with burst_buffer/cray when a newly created burst
buffer is found.
-- burst_buffer/cray: Set environment variables just before starting job
rather than at job submission time to reflect persistent buffers created or
modified while the job is pending.
-- Fix check of per-user qos limits on the initial run by a user.
-- Fix gang scheduling resource selection bug which could prevent multiple
jobs from being allocated the same resources. Bug was introduced in 15.08.6.
-- Don't print the Rgt value of an association from the cache as it isn't
kept up to date.
-- burst_buffer/cray - If the pre-run operation fails then don't issue
duplicate job cancel/requeue unless the job is still in run state. Prevents
jobs hung in COMPLETING state.
-- task/cgroup - Fix bug in task binding to CPUs.
* Changes in Slurm 16.05.0pre2
==============================
-- Split partition's "Priority" field into "PriorityTier" (used to order
partitions for scheduling and preemption) plus "PriorityJobFactor" (used by
priority/multifactor plugin in calculating job priority, which is used to
order jobs within a partition for scheduling).
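
The two roles described above can be pictured with a toy Python sketch. It is
an illustration under stated assumptions, not the priority/multifactor
algorithm: the record types, the partition_weight value and the simple
additive formula are all hypothetical.

```python
from collections import namedtuple

# Hypothetical records used only for this illustration.
Partition = namedtuple("Partition", "name priority_tier priority_job_factor")
Job = namedtuple("Job", "job_id partition base_priority")


def job_priority(job, partitions, partition_weight=1000):
    """Toy job priority: base value plus a weighted partition factor."""
    part = partitions[job.partition]
    return job.base_priority + partition_weight * part.priority_job_factor


def scheduling_order(jobs, partitions):
    """Order by partition tier first, then by job priority within a tier."""
    return sorted(
        jobs,
        key=lambda j: (
            -partitions[j.partition].priority_tier,
            -job_priority(j, partitions),
        ),
    )
```

In this sketch a job in a higher-tier partition always sorts ahead of jobs in
lower tiers, while PriorityJobFactor only influences the job's own priority
value.
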
-- Revert call to getaddrinfo (introduced in Slurm 16.05.0pre1), which was
failing on some systems, restoring gethostbyaddr.
-- knl_cray.conf - Added AllowMCDRAM, AllowNUMA and AllowUserBoot
configuration options.
-- Add node_features_p_user_update() function to node_features plugin.
-- Don't print Weight=1 lines in 'scontrol write config' (it's the default).
-- Remove PARAMS macro from slurm.h.
-- Remove BEGIN_C_DECLS and END_C_DECLS macros.
-- Check that PowerSave mode is configured for node_features/knl_cray plugin.
It is required to reconfigure and reboot nodes.
-- Update documentation to reflect new cgroup default location change from
/cgroup to /sys/fs/cgroup.
-- If NodeHealthCheckProgram is configured and HealthCheckInterval is
non-zero, then modify slurmd to run it before registering with slurmctld.
-- Fix for tasks being packed onto cores when the requested --cpus-per-task
is greater than the number of threads on a core and --ntasks-per-core is 1.
-- Make it so jobs/steps track ':' named gres/tres. Previously gres/gpu:tesla
would only track gres/gpu; now it will track both gres/gpu and
gres/gpu:tesla as separate gres if configured like
AccountingStorageTRES=gres/gpu,gres/gpu:tesla
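
The accounting behaviour described above can be sketched in Python. This is a
minimal model, not Slurm code: the function name and the dictionary-based
request format are assumptions for illustration.

```python
def tracked_tres(requested_gres, configured_tres):
    """Map a GRES request such as {"gpu:tesla": 2} to tracked TRES counts.

    A typed GRES counts toward both its generic name (gres/gpu) and its
    typed name (gres/gpu:tesla), but only for names that are configured.
    """
    usage = {}
    for gres, count in requested_gres.items():
        base = gres.split(":", 1)[0]
        # A set avoids double counting when the request has no type suffix.
        for name in {"gres/" + base, "gres/" + gres}:
            if name in configured_tres:
                usage[name] = usage.get(name, 0) + count
    return usage
```

With only gres/gpu configured, a gpu:tesla request is recorded under gres/gpu
alone, which matches the earlier behaviour described in the entry.
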
-- Added new job dependency type of "aftercorr" which will start a task of a
job array after the corresponding task of another job array completes.
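
The task-by-task pairing behind "aftercorr" can be sketched as below. This is
a conceptual model only; the function name is hypothetical and the sketch
treats "completes" exactly as the entry words it.

```python
def eligible_tasks(dependent_task_ids, completed_parent_task_ids):
    """Return dependent array tasks whose matching parent task is done.

    Task N of the dependent array waits only for task N of the parent
    array, not for the whole parent array.
    """
    done = set(completed_parent_task_ids)
    return [t for t in dependent_task_ids if t in done]
```

For example, if tasks 0 and 2 of the parent array have completed, tasks 0 and
2 of the dependent array may start while tasks 1 and 3 keep waiting.
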
-- Increase default MaxTasksPerNode configuration parameter from 128 to 512.
-- Enable sbcast data compression logic (compress option previously ignored).
-- Add --compress option to srun command for use with --bcast option.
-- Add TCPTimeout option to slurm[dbd].conf. Decouples MessageTimeout from
TCP connections.
-- Don't call primary controller for every RPC when backup is in control.
-- Add --gres-flags=enforce-binding option to salloc, sbatch and srun commands.
If set, the only CPUs available to the job will be those bound to the
selected GRES (i.e. the CPUs identified in the gres.conf file will be
strictly enforced rather than advisory).
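
The difference between advisory and enforced binding can be sketched in
Python. The function name and data layout are assumptions for illustration
and do not reflect Slurm's internal structures.

```python
def usable_cpus(node_cpus, gres_cpus, selected_gres, enforce_binding):
    """Return the CPU ids a job may use on a node.

    node_cpus:     iterable of all CPU ids on the node
    gres_cpus:     dict mapping a GRES id to the CPU ids bound to it
    selected_gres: GRES ids chosen for the job
    """
    if not enforce_binding:
        # Advisory mode: the binding is a preference, any CPU may be used.
        return sorted(node_cpus)
    bound = set()
    for gres in selected_gres:
        bound.update(gres_cpus[gres])
    return sorted(bound & set(node_cpus))
```
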
-- Change how a node's allocated CPU count is calculated to avoid double
counting CPUs allocated to multiple jobs at the same time.
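
The double-counting problem can be shown with a short Python sketch. It
illustrates the idea only, under the assumption that each job's allocation is
available as a set of CPU ids; the function name is hypothetical.

```python
def allocated_cpu_count(job_cpu_sets):
    """Count each allocated CPU once, even if several jobs share it."""
    allocated = set()
    for cpus in job_cpu_sets:
        allocated.update(cpus)
    return len(allocated)
```

Summing per-job counts for two jobs holding CPUs {0, 1} and {1, 2} would
report 4 allocated CPUs, whereas the union above reports 3.
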
-- Added SchedulerParameters option of "bf_min_prio_reserve". Jobs below
the specified threshold will not have resources reserved for them.
-- Added "sacctmgr show lostjobs" to report any orphaned jobs in the database.
-- When a stepd is about to shut down and send its response to srun, make
the wait to return data only hit after 500 nodes and configurable based on
the TCPTimeout value.
-- Add functionality to reset the lft and rgt values of the association
table with the slurmdbd.
-- Add SchedulerParameters option no_env_cache. If set, no env cache will be
used when launching a job; instead the job will fail and drain the node if
the env isn't loaded normally.