Slurm version 14.03.10 includes quite a few relatively minor bug fixes, and will most likely be the last 14.03 release. Thanks to all those who helped make this a very stable release.

We hope to officially tag 14.11.0 before SC14. Version 14.11.0-rc3 includes a few bug fixes discovered in recent testing but is looking very stable. Thanks to everyone participating in the testing! If you can, please test this release so we can attempt to fix as many issues as we can before we tag 14.11.0.

Just a heads up: version 15.08 is already in development, and we will most likely tag a pre1 of it later this month.

Slurm downloads are available from http://www.schedmd.com/#repos.

Here are some snips from the NEWS file on what has changed since the last releases.

* Changes in Slurm 14.03.10
===========================
 -- Fix a few sacctmgr error messages.
 -- Treat non-zero SlurmSchedLogLevel without SlurmSchedLogFile as a fatal
    error.
 -- Correct sched_config.html documentation: SchedulingParameters
    should be SchedulerParameters.
 -- When using gres and cgroup ConstrainDevices set correct access
    permission for the batch step.
 -- Fix minor memory leak in jobcomp/mysql on slurmctld reconfig.
 -- Fix bug that prevented preservation of a job's GRES bitmap on slurmctld
    restart or reconfigure (bug was introduced in 14.03.5 "Clear record of a
    job's gres when requeued" and only applies when GRES mapped to specific
    files).
 -- BGQ: Fix race condition when job fails due to hardware failure and is
    requeued. Previous code could result in slurmctld abort with NULL pointer.
 -- Prevent negative job array index, which could cause slurmctld to crash.
 -- Fix issue with squeue/scontrol showing correct node_cnt when only tasks
    are specified.
 -- Check the status of the database connection before using it.
 -- ALPS - If an allocation requests -n set the BASIL -N option to the
    number of tasks / number of nodes.
 -- ALPS - Don't set the env var APRUN_DEFAULT_MEMORY, it is not needed
    anymore.
 -- Fix potential buffer overflow.
 -- Give better estimates on pending node count if no node count is
    requested.
 -- BLUEGENE - Fix issue where requeuing jobs could cause an assert.

* Changes in Slurm 14.11.0rc3
=============================
 -- Allow envs to override autotools binaries in autogen.sh
 -- Added system services files.
 -- If a job pends with DependencyNeverSatisfied keep it pending even after
    the job which it was depending upon was cleaned.
 -- Let operators (in addition to user root and SlurmUser) see job script for
    other users' jobs.
 -- Perl API modified to return node state of MIXED rather than ALLOCATED if
    only some CPUs allocated.
 -- Double Munge connect retry timeout from 1 to 2 seconds.
 -- sview - Remove unneeded code that was resolved globally in commit
    98e24b0dedc.
 -- Collect and report the accounting of the batch step and its children.
 -- Add configure checks for faccessat and eaccess, and make use of one of
    them if available.
 -- Make configure --enable-developer also set --enable-debug
 -- Introduce a SchedulerParameters variable kill_invalid_depend, if set
    then jobs pending with invalid dependency are going to be terminated.
 -- Move spank_user_task() call in slurmstepd after the task_g_pre_launch()
    so that the task affinity information is available to spank.
 -- Make /etc/init.d/slurm script return value 3 when the daemon is
    not running. This is required by Linux Standard Base Core
    Specification 3.1
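
For anyone whose monitoring scripts check the init script's exit code, here is a rough Python sketch of how the LSB status codes could be interpreted. The code table follows the LSB Core "status" action convention. The helper names (describe_status, slurm_status) are made up for illustration, and the sketch assumes the script is invoked as "/etc/init.d/slurm status".

```python
import subprocess

# Exit codes defined by LSB Core for an init script's "status" action.
LSB_STATUS = {
    0: "running",
    1: "dead, but pid file exists",
    2: "dead, but lock file exists",
    3: "not running",
    4: "status unknown",
}


def describe_status(returncode):
    """Map a status exit code to a short description (hypothetical helper)."""
    return LSB_STATUS.get(returncode, "reserved or application-specific code")


def slurm_status(script="/etc/init.d/slurm"):
    """Run '<script> status' and describe its exit code.

    Assumes the init script accepts a 'status' argument; adjust the path
    for your installation.
    """
    result = subprocess.run(
        [script, "status"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_status(result.returncode)
```

With the change above, a stopped daemon should be reported as "not running" (code 3) rather than being mistaken for some other failure.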
