We are pleased to announce the availability of Slurm version 15.08.7. It contains 46 relatively minor bug fixes you may find interesting. We recommend upgrading to 15.08.7 at your earliest convenience.

Slurm downloads are available from http://schedmd.com/#repos.

Here is a list of what has changed...

* Changes in Slurm 15.08.7
==========================
 -- sched/backfill: If a job cannot be started within the configured
    backfill_window, set its start time to 0 (unknown) rather than the end
    of the backfill_window.
 -- Remove the 1024-character limit on lines in batch scripts.
 -- burst_buffer/cray: Round up swap size by configured granularity.
 -- select/cray: Log repeated aeld reconnects.
 -- task/affinity: Disable core-level task binding if more CPUs required than
    available cores.
 -- Preemption/gang scheduling: If a job is suspended at slurmctld restart or
    reconfiguration time, then leave it suspended rather than resume+suspend.
 -- Don't use lower weight nodes for job allocation when topology/tree used.
 -- BGQ - If a cable goes into error state, remove the underlying block on
    a dynamic system and mark the block in error on a static/overlap system.
 -- BGQ - Fix regression in 9cc4ae8add7f where blocks would be deleted on
    static/overlap systems when some hardware issue happens when restarting
    the slurmctld.
 -- Log if CLOUD node configured without a resume/suspend program or suspend
    time.
 -- MYSQL - Better locking around g_qos_count which was previously unprotected.
 -- Correct size of buffer used for jobid2str to avoid truncation.
 -- Fix allocation/distribution of tasks across multiple nodes when
    --hint=nomultithread is requested.
 -- If a reservation's nodes value is "all" then track the current nodes in the
    system, even if those nodes change.
 -- Fix formatting if using "tree" option with sreport.
 -- Make it so sreport prints out a line for non-existent TRES instead of
    error message.
 -- Set job's reason to "Priority" when higher priority job in that partition
    (or reservation) cannot start rather than leaving the reason set to
    "Resources".
 -- Fix memory corruption when a new non-generic TRES is added to the
    DBD for the first time.  The corruption is only noticed at shutdown.
 -- burst_buffer/cray - Improve tracking of allocated resources to handle race
    condition when reading state while buffer allocation is in progress.
 -- If a job is submitted only with -c option and numcpus is updated before
    the job starts, update the cpus_per_task appropriately.
 -- Update salloc/sbatch/srun documentation to mention time granularity.
 -- Fixed memory leak when freeing assoc_mgr_info_msg_t.
 -- Prevent possible use of empty reservation core bitmap, causing abort.
 -- Remove unneeded pack32's from qos_rec when qos_rec is NULL.
 -- Make sacctmgr print MaxJobsPerUser when adding/altering a QOS.
 -- Correct dependency formatting to print array task ids if set.
 -- Update sacctmgr help with current QOS options.
 -- Update slurmstepd to initialize authentication before task launch.
 -- burst_buffer/cray: Eliminate need for dedicated nodes.
 -- If no MsgAggregationParams is set don't set the internal string to
    anything.  The slurmd will process things correctly after the fact.
 -- Fix output from api when printing job step not found.
 -- Don't allow user specified reservation names to disrupt the normal
    reservation sequence numbering scheme.
 -- Fix scontrol to be able to accept TRES as an option when creating
    a reservation.
 -- contrib/torque/qstat.pl - return exit code of zero even with no records
    printed for 'qstat -u'.
 -- When a reservation is created or updated, compress user provided node names
    using hostlist functions (e.g. translate user input of "Nodes=tux1,tux2"
    into "Nodes=tux[1-2]").
 -- Change output routines for scontrol show partition/reservation to handle
    unexpectedly large strings.
 -- Add more partition fields to "scontrol write config" output file.
 -- Backfill scheduling fix: If a job can't be started due to a "group" resource
    limit, rather than reserve resources for it when the next job ends, don't
    reserve any resources for it.
 -- Avoid slurmstepd abort if malloc fails during accounting gather operation.
 -- Fix nodes from being overallocated when allocation straddles multiple nodes.
 -- Fix memory leak in slurmctld job array logic.
 -- Prevent decrementing of TRESRunMins when AccountingStorageEnforce=limits is
    not set.
 -- Fix backfill scheduling bug which could postpone the scheduling of jobs due
    to avoidance of nodes in COMPLETING state.
 -- Properly account for memory, CPUs and GRES when slurmctld is reconfigured
    while there is a suspended job. Previous logic would add the CPUs, but not
    memory or GPUs. This would result in underflow/overflow errors in select
    cons_res plugin.
 -- Strip flags from a job state in qstat wrapper before evaluating.
 -- Add missing job states from the qstat wrapper.
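
To illustrate the reservation node-name compression mentioned in the list above (user input of "Nodes=tux1,tux2" becoming "Nodes=tux[1-2]"), here is a minimal Python sketch of the idea. It is not Slurm's hostlist code, which is written in C and handles many more cases; the function names `compress_hostlist` and `_ranges` are hypothetical, and the sketch assumes a plain comma-separated list of names with at most one trailing number each.

```python
import re


def _ranges(nums):
    """Render sorted integers as '1-2,5' style ranges."""
    nums = sorted(nums)
    spans = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:      # still consecutive: extend the current span
            prev = n
            continue
        spans.append((start, prev))
        start = prev = n
    spans.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


def compress_hostlist(names):
    """Collapse 'tux1,tux2' into 'tux[1-2]' (simplified illustration only)."""
    groups = {}  # (prefix, is_numbered) -> set of suffixes, in first-seen order
    for name in names.split(","):
        name = name.strip()
        if not name:
            continue
        # Leading zeros are absorbed into the prefix, so 'tux01' is treated
        # as prefix 'tux0' + number 1 and zero padding is never lost.
        m = re.fullmatch(r"(.*?)(0|[1-9]\d*)", name)
        if m:
            groups.setdefault((m.group(1), True), set()).add(int(m.group(2)))
        else:
            groups.setdefault((name, False), set())  # no numeric suffix
    parts = []
    for (prefix, numbered), nums in groups.items():
        if not numbered:
            parts.append(prefix)
        elif len(nums) == 1:
            parts.append(f"{prefix}{next(iter(nums))}")
        else:
            parts.append(f"{prefix}[{_ranges(nums)}]")
    return ",".join(parts)


print(compress_hostlist("tux1,tux2"))  # tux[1-2]
```

Grouping by prefix rather than scanning adjacent names means the input order of nodes does not matter, at the cost of not reproducing every form the real hostlist functions can emit.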
