Slurm versions 14.11.2 and 15.08.0-pre1 are now available. Version
14.11.2 includes quite a few relatively minor bug fixes. Version
15.08.0 is under active development, with release planned for August
2015. While this is the first pre-release, it already contains quite a
bit of new functionality.
Both versions can be downloaded from http://schedmd.com/#repos
Highlights of the two versions follow.
* Changes in Slurm 14.11.2
==========================
-- Fix Centos5 compile errors.
-- Fix issue with association hash not getting the correct index, which
   could result in a segfault.
-- Fix salloc/sbatch -B segfault.
-- Avoid huge malloc if GRES configured with "Type" and huge "Count".
-- Prevent jobs from starting in overlapping reservations that will not
   finish before a "maint" reservation begins.
-- When a node is drained while in the mixed state, display its status as
   "draining" in sinfo output.
-- Allow priority/multifactor to work with sched/wiki(2) if all priorities
have no weight. This allows for association and QOS decay limits
to work.
-- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
-- Fix scancel to be able to cancel multiple jobs that are space
delimited.
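For example, the fix restores the ability to pass several space-delimited
job IDs in one invocation (the job IDs below are illustrative):

```shell
scancel 1001 1002 1003
```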
-- Log a Cray MPI job calling exit() without mpi_fini(), but do not treat
   it as a fatal error. This partially reverts logic added in version
   14.03.9.
-- sview - Fix displaying of suspended steps elapsed times.
-- Increase number of messages that get cached before throwing them away
when the DBD is down.
-- Restore GRES functionality with select/linear plugin. It was broken in
version 14.03.10.
-- Fix bug with GRES having multiple types that can cause slurmctld abort.
-- Fix squeue issue with not recognizing "localhost" in --nodelist option.
-- Make sure the bitstrings for a partition's AllowQOS/DenyQOS are up to
   date when running from cache.
-- Add smap support for job arrays and larger job ID values.
-- Fix possible race condition when attempting to use QOS on a system
   running accounting_storage/filetxt.
-- Fix issue with accounting_storage/filetxt and job arrays not being
   printed correctly.
-- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
   for error condition rather than errno, which might have a vestigial
   error code.
-- Improve information recording for jobs deferred due to advanced
reservation.
-- Export eio_new_initial_obj to the plugins and initialize kvs_seq on
   mpi/pmi2 setup to support launching.
* Changes in Slurm 15.08.0pre1
==============================
-- Add sbcast support for file transfer to resources allocated to a job
   step rather than a job allocation.
-- Shorten "association" to "assoc" in structure names to save space.
-- Add support for job dependencies joined with the OR operator (e.g.
"--depend=afterok:123?afternotok:124").
-- Add "--bb" (burst buffer specification) option to salloc, sbatch,
and srun.
-- Added configuration parameters BurstBufferParameters and
BurstBufferType.
-- Added burst_buffer plugin infrastructure (needs many more functions).
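A minimal slurm.conf sketch showing where the new parameter would live;
the plugin name is a placeholder, not a value taken from this announcement,
and the plugin infrastructure is still incomplete per the entry above:

```conf
# slurm.conf (illustrative)
BurstBufferType=burst_buffer/generic
```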
-- Make the fanout logic abandon the tree when it comes across a node
   that is down, avoiding worst-case scenarios in which an entire branch
   is down and each node must be tried serially.
-- Add better error reporting of invalid partitions at submission time.
-- Move the will-run test for multiple clusters from the sbatch code into
   the API so that it can be used with DRMAA.
-- If a non-exclusive allocation requests --hint=nomultithread on a
   CR_CORE/SOCKET system, lay out tasks correctly.
-- Avoid including unused CPUs in a job's allocation when cores or
   sockets are allocated.
-- Added a new job state of STOPPED indicating that processes have been
   stopped with a SIGSTOP (using scancel or sview) while the job retains
   its allocated CPUs. The job state returns to RUNNING when SIGCONT is
   sent (also using scancel or sview).
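The stop/continue cycle described above might look like this (the job ID
is illustrative):

```shell
scancel --signal=STOP 1234   # job enters STOPPED; its CPUs stay allocated
scancel --signal=CONT 1234   # job returns to RUNNING
```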
-- Added EioTimeout parameter to slurm.conf. It is the number of seconds
   srun waits for slurmstepd to close the TCP/IP connection used to relay
   data between the user application and srun when the user application
   terminates.
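An illustrative slurm.conf fragment (the value shown is arbitrary):

```conf
# Wait up to 60 seconds for slurmstepd to close the I/O connection
EioTimeout=60
```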
-- Remove slurmctld/dynalloc plugin as the work was never completed, so
   it is not worth the effort of continued support at this time.
-- Remove DynAllocPort configuration parameter.
-- Add an advance reservation flag of "replace" that causes allocated
   resources to be replaced with idle resources. This maintains a pool of
   available resources of constant size (to the extent possible).
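Creating such a self-replenishing reservation might look like the
following; every name and value here is illustrative:

```shell
scontrol create reservation ReservationName=pool StartTime=now \
    Duration=UNLIMITED NodeCnt=4 Users=root Flags=replace
```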
-- Added SchedulerParameters option of "bf_busy_nodes". When selecting
   resources for pending jobs to reserve for future execution (i.e. the
   job can not be started immediately), preferentially select nodes that
   are in use. This will tend to leave currently idle resources available
   for backfilling longer running jobs, but may result in allocations
   having less than optimal network topology. This option is currently
   only supported by the select/cons_res plugin.
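A slurm.conf sketch enabling the option (assumes the cons_res plugin, per
the note above):

```conf
SelectType=select/cons_res
SchedulerParameters=bf_busy_nodes
```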
-- Permit "SuspendTime=NONE" as slurm.conf value rather than only a
numeric
value to match "scontrol show config" output.
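For example, both of these forms are now accepted in slurm.conf:

```conf
SuspendTime=-1     # numeric form: node suspension disabled
SuspendTime=NONE   # newly accepted form, matching "scontrol show config"
```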
-- Add the 'scontrol show cache' command which displays the associations
in slurmctld.
-- Test more frequently for node boot completion before starting a job.
Provides better responsiveness.
-- Fix PMI2 singleton initialization.
-- Permit PreemptType=qos and PreemptMode=suspend,gang to be used
   together. A high-priority QOS job will now oversubscribe resources and
   gang schedule, but only if there are insufficient resources for the
   job to be started without preemption. NOTE: With PreemptType=qos, the
   partition's Shared=FORCE:# configuration option will permit one more
   job per resource to be run than specified, but only if started by
   preemption.
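An illustrative slurm.conf combination that is now permitted (the full
plugin-style spelling of PreemptType is shown):

```conf
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG
```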
-- Remove the CR_ALLOCATE_FULL_SOCKET configuration option. It is now the
default.
-- Fix a race condition in PMI2 when fencing counters can be out of sync.
-- Increase the MAX_PACK_MEM_LEN define to avoid PMI2 failure when fencing
with large amount of ranks.
-- Add a QOS option to a partition. This allows a partition to have all
   the limits a QOS has. If a limit is set in both QOSes, the partition
   QOS will override the job's QOS unless the job's QOS has the
   PartitionQOS flag set.
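Attaching a QOS to a partition might look like this in slurm.conf; the
partition, node, and QOS names are invented for illustration:

```conf
PartitionName=batch Nodes=tux[1-8] QOS=part_qos
```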
-- The task_dist_states variable has been split into "flags" and "base"
   components. Added SLURM_DIST_PACK_NODES and SLURM_DIST_NO_PACK_NODES
   values to give users greater control over task distribution. The srun
   --dist option has been modified to accept "Pack" and "NoPack" options.
   These options can be used to override the CR_PACK_NODE configuration
   option.
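Illustrative srun invocations; the exact --dist value syntax should be
checked against the srun documentation for this release:

```shell
srun --dist=block,Pack -n 8 ./app     # pack tasks onto as few nodes as possible
srun --dist=block,NoPack -n 8 ./app   # avoid packing, overriding CR_PACK_NODE
```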