Slurm version 16.05.2 is now available and includes 16 bug fixes
developed over the past week, including fixes for two bugs that can
cause the slurmctld daemon to crash.

Slurm downloads are available from:
http://www.schedmd.com/#repos
* Changes in Slurm 16.05.2
==========================
-- CRAY - Fix issue where the proctrack plugin could hang if the container
id could not be created.
-- Move test for job wait reason value of BurstBufferResources and
BurstBufferStageIn later in the scheduling logic.
-- Document which srun options apply to only job, only step, or job and step
allocations.
-- Use more compatible function to get thread name (Linux >= 2.6.11).
-- Fix order of job then step id when noting cleaning flag being set.
-- Make it so the extern step sends a message with accounting information
back to the slurmctld.
-- Make it so the extern step calls the select_g_step_start|finish functions.
-- Don't print error when extern step is canceled because job is ending.
-- Handle a few error codes when dealing with the extern step to make sure
we have the pids added to the system correctly.
-- Add support for job dependencies with job array expressions. Previous logic
required listing each task of a job array individually.
-- Make sure tres_cnt is set before creating a slurmdb_assoc_usage_t.
-- Prevent backfill scheduler from starting a second "singleton" job if another
one started during a backfill sleep.
-- Fix for invalid array pointer when creating advanced reservation when job
allocations span heterogeneous nodes (differing core or socket counts).
-- Fix hostlist_ranged_string_xmalloc_dims to omit brackets from hostlists
when brackets == 0.
-- Make sure we don't get brackets when making a range of reserved ports
for a step.
-- Change fatal to an error if port ranges aren't correct when reading state
for steps.
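
To illustrate the two bracket-related fixes above, here is a minimal Python
sketch of a ranged string built with and without brackets. It is not Slurm's
C implementation of hostlist_ranged_string_xmalloc_dims: the function name
ranged_string, its parameters, and the sample host and port values are all
hypothetical and chosen only for this example.

```python
def ranged_string(prefix, numbers, brackets=True):
    """Compress integers into a range expression such as 'tux[1-3,7]'.

    Hypothetical helper for illustration only; not part of Slurm.
    """
    nums = sorted(set(numbers))
    if not nums:
        return ""

    # Collapse consecutive values into (start, end) pairs.
    ranges = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:
            prev = n
            continue
        ranges.append((start, prev))
        start = prev = n
    ranges.append((start, prev))

    body = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)

    # Wrap in brackets only when requested and when the body is more
    # than a single value.
    multi = len(ranges) > 1 or ranges[0][0] != ranges[0][1]
    if brackets and multi:
        return f"{prefix}[{body}]"
    return f"{prefix}{body}"


print(ranged_string("tux", [1, 2, 3, 7]))                      # host-style range
print(ranged_string("", [12000, 12001, 12002], brackets=False))  # port-style range
```

In this sketch, disabling brackets yields a plain range such as 12000-12002,
which is the shape one would expect for a step's reserved port range, while
the bracketed form is the familiar hostlist notation.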