Slurm versions 14.11.2 and 15.08.0-pre1 are now available. Version 14.11.2 includes quite a few relatively minor bug fixes.

Version 15.08.0 is under active development, and its release is planned for August 2015. Although this is the first pre-release, it already contains quite a bit of new functionality.

Both versions can be downloaded from http://schedmd.com/#repos

Highlights of the two versions are listed below.

* Changes in Slurm 14.11.2
==========================
 -- Fix Centos5 compile errors.
 -- Fix issue with association hash not getting the correct index which
    could result in seg fault.
 -- Fix salloc/sbatch -B segfault.
 -- Avoid huge malloc if GRES configured with "Type" and huge "Count".
 -- Fix jobs from starting in overlapping reservations that won't finish before
    a "maint" reservation begins.
 -- When a node gets drained while in state mixed, display its status as
    draining in sinfo output.
 -- Allow priority/multifactor to work with sched/wiki(2) if all priorities
    have no weight. This allows for association and QOS decay limits to work.
 -- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
 -- Fix scancel to be able to cancel multiple jobs that are space delimited.
 -- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
    a fatal error. This partially reverts logic added in version 14.03.9.
 -- sview - Fix displaying of suspended steps elapsed times.
 -- Increase number of messages that get cached before throwing them away
    when the DBD is down.
 -- Restore GRES functionality with select/linear plugin. It was broken in
    version 14.03.10.
 -- Fix bug with GRES having multiple types that can cause slurmctld abort.
 -- Fix squeue issue with not recognizing "localhost" in --nodelist option.
 -- Make sure the bitstrings for a partition's Allow/DenyQOS are up to date
    when running from cache.
 -- Add smap support for job arrays and larger job ID values.
 -- Fix possible race condition when attempting to use QOS on a system running
    accounting_storage/filetxt.
 -- Fix issue with accounting_storage/filetxt and job arrays not being printed
    correctly.
 -- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
    for error condition rather than errno, which might have a vestigial error
    code.
 -- Improve information recording for jobs deferred due to advanced
    reservation.
 -- Export eio_new_initial_obj to the plugins and initialize kvs_seq on
    mpi/pmi2 setup to support launching.

* Changes in Slurm 15.08.0pre1
==============================
 -- Add sbcast support for file transfer to resources allocated to a job step
    rather than a job allocation.
 -- Change structures with association in them to assoc to save space.
 -- Add support for job dependencies joined with OR operator (e.g.
    "--depend=afterok:123?afternotok:124").
 -- Add "--bb" (burst buffer specification) option to salloc, sbatch, and srun.
 -- Added configuration parameters BurstBufferParameters and BurstBufferType.
 -- Added burst_buffer plugin infrastructure (needs many more functions).
 -- Make it so when the fanout logic comes across a node that is down we
    abandon the tree to avoid worst case scenarios when the entire branch is
    down and we have to try each serially.
 -- Add better error reporting of invalid partitions at submission time.
 -- Move will-run test for multiple clusters from the sbatch code into the API
    so that it can be used with DRMAA.
 -- If a non-exclusive allocation requests --hint=nomultithread on a
    CR_CORE/SOCKET system lay out tasks correctly.
 -- Avoid including unused CPUs in a job's allocation when cores or sockets are
    allocated.
 -- Added new job state of STOPPED indicating processes have been stopped with
    a SIGSTOP (using scancel or sview), but the job retains its allocated CPUs.
    Job state returns to RUNNING when SIGCONT is sent (also using scancel or
    sview).
 -- Added EioTimeout parameter to slurm.conf. It is the number of seconds srun
    waits for slurmstepd to close the TCP/IP connection used to relay data
    between the user application and srun when the user application
    terminates.
 -- Remove slurmctld/dynalloc plugin as the work was never completed, so it is
    not worth the effort of continued support at this time.
 -- Remove DynAllocPort configuration parameter.
 -- Add advance reservation flag of "replace" that causes allocated resources
    to be replaced with idle resources. This maintains a pool of available
    resources that maintains a constant size (to the extent possible).
 -- Added SchedulerParameters option of "bf_busy_nodes". When selecting
    resources for pending jobs to reserve for future execution (i.e. the job
    can not be started immediately), then preferentially select nodes that are
    in use. This will tend to leave currently idle resources available for
    backfilling longer running jobs, but may result in allocations having less
    than optimal network topology. This option is currently only supported by
    the select/cons_res plugin.
 -- Permit "SuspendTime=NONE" as slurm.conf value rather than only a numeric
    value to match "scontrol show config" output.
 -- Add the 'scontrol show cache' command which displays the associations
    in slurmctld.
 -- Test more frequently for node boot completion before starting a job.
    Provides better responsiveness.
 -- Fix PMI2 singleton initialization.
 -- Permit PreemptType=qos and PreemptMode=suspend,gang to be used together.
    A high-priority QOS job will now oversubscribe resources and gang schedule,
    but only if there are insufficient resources for the job to be started
    without preemption. NOTE: With PreemptType=qos, the partition's
    Shared=FORCE:# configuration option will permit one more job per resource
    to be run than specified, but only if started by preemption.
 -- Remove the CR_ALLOCATE_FULL_SOCKET configuration option. It is now the
    default.
 -- Fix a race condition in PMI2 when fencing counters can be out of sync.
 -- Increase the MAX_PACK_MEM_LEN define to avoid PMI2 failure when fencing
    with a large number of ranks.
 -- Add QOS option to a partition. This will allow a partition to have
    all the limits a QOS has. If a limit is set in both QOS, the partition
    QOS will override the job's QOS unless the job's QOS has the
    PartitionQOS flag set.
 -- The task_dist_states variable has been split into "flags" and "base"
    components. Added SLURM_DIST_PACK_NODES and SLURM_DIST_NO_PACK_NODES values
    to give user greater control over task distribution. The srun --dist
    option has been modified to accept a "Pack" and "NoPack" option. These
    options can be used to override the CR_PACK_NODE configuration option.
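
As an illustration of the OR-joined job dependencies listed for 15.08.0pre1, the sketch below builds the dependency expression from the changelog example and prints the resulting sbatch command line instead of submitting anything. The helper join_or and the script name followup.sh are hypothetical and not part of Slurm; the job IDs 123 and 124 are the ones from the changelog entry. It assumes a shell where "?" is a glob character, which is why the argument is quoted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): join dependency terms with "?",
# the OR separator shown in the 15.08.0pre1 changelog entry.
join_or() {
    result=""
    for term in "$@"; do
        if [ -z "$result" ]; then
            result="$term"
        else
            result="${result}?${term}"
        fi
    done
    printf '%s\n' "$result"
}

# Job IDs 123 and 124 are taken from the changelog example.
dep=$(join_or "afterok:123" "afternotok:124")

# Keep the argument quoted: an unquoted "?" may be expanded by the shell.
# followup.sh is a placeholder name; the command is printed, not executed.
echo sbatch "--depend=${dep}" followup.sh
```

On a cluster running a release with this feature, dropping the leading echo would submit followup.sh so that it becomes eligible when job 123 succeeds or job 124 fails.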
