Greetings everyone.

We are pleased to announce the release of Slurm 15.08.0! It contains many new features and performance enhancements. Please read the RELEASE_NOTES file to get an idea of the new items that have been added. The on-line Slurm documentation has been updated to reflect this release. Thanks to everyone who helped with this release.

Some notable changes are listed here.

 -- Added TRES (Trackable resources) to track utilization of memory, GRES,
    burst buffer, license, and any other configurable resources in the
    accounting database.
 -- Add configurable billing weight that takes into consideration any TRES when
    calculating a job's resource utilization.
 -- Add configurable prioritization factor that takes into consideration any
    TRES when calculating a job's resource utilization.
 -- Add burst buffer support infrastructure. Currently available plugins
    include burst_buffer/generic (uses administrator supplied programs to
    manage file staging) and burst_buffer/cray (uses Cray APIs to manage
    buffers).
 -- Add power capping support for Cray systems with automatic rebalancing of
    power allocation between nodes.
 -- Modify slurmctld outgoing RPC logic to support more parallel tasks (up to
    85 RPCs and 256 pthreads; the old logic supported up to 21 RPCs and 256
    threads).
 -- Add support for job dependencies joined with OR operator (e.g.
    "--depend=afterok:123?afternotok:124").
 -- Add advance reservation flag of "replace" that causes allocated resources
    to be replaced with idle resources. This maintains a pool of available
    resources of constant size (to the extent possible).
 -- Permit PreemptType=qos and PreemptMode=suspend,gang to be used together.
    A high-priority QOS job will now oversubscribe resources and gang
    schedule, but only if there are insufficient resources for the job to be
    started without preemption. NOTE: With PreemptType=qos, the partition's
    Shared=FORCE:# configuration option will permit one more job per resource
    to be run than specified, but only if started by preemption.
 -- A partition can now have an associated QOS. This will allow a partition
    to have all the limits a QOS has. If a limit is set in both the partition
    QOS and the job's QOS, the partition QOS will override the job's QOS
    unless the job's QOS has the 'OverPartQOS' flag set.
 -- Expanded --cpu-freq parameters to include min-max:governor specifications.
    --cpu-freq now supported on salloc and sbatch.
 -- Add support for optimized job allocations with respect to SGI Hypercube
    topology.
    NOTE: Only supported with select/linear plugin.
    NOTE: The program contribs/sgi/netloc_to_topology can be used to build
    Slurm's topology.conf file.
 -- Add the ability for a compute node to be allocated to multiple jobs, but
    restricted to a single user. Added "--exclusive=user" option to salloc,
    sbatch and srun commands. Added "owner" field to node record, visible using
    the scontrol and sview commands. Added new partition configuration parameter
    "ExclusiveUser=yes|no".
 -- Verify that all plugin version numbers are identical to the component
    attempting to load them. Without this verification, the plugin can
    reference Slurm functions in the caller which differ (e.g. the underlying
    function's arguments could have changed between Slurm versions).
    NOTE: All plugins (except SPANK) must be built against the identical
    version of Slurm in order to be used by any Slurm command or daemon. This
    should eliminate some very difficult to diagnose problems due to use of
    old plugins.
 -- Optimize resource allocation for systems with dragonfly networks.
 -- Added plugin to record job completion information using Elasticsearch.
    Libcurl is required for build. Configure slurm.conf as follows:
    JobCompType=jobcomp/elasticsearch
    JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200
 -- DATABASE SCHEMA HAS CHANGED. WHEN UPDATING, THE MIGRATION PROCESS MAY TAKE
    SOME AMOUNT OF TIME DEPENDING ON HOW LARGE YOUR DATABASE IS. WHILE UPDATING
    NO RECORDS WILL BE LOST, BUT THE SLURMDBD MAY NOT BE RESPONSIVE DURING THE
    UPDATE. IT WILL ALSO NOT BE POSSIBLE TO AUTOMATICALLY REVERT THE DATABASE
    TO THE FORMAT FOR AN EARLIER VERSION OF SLURM. PLAN ACCORDINGLY.
 -- The performance of Profiling with HDF5 is improved. In addition, internal
    structures are changed to make it easier to add new profile types,
    particularly energy sensors. This has introduced an operational issue. See
    OTHER CHANGES in the RELEASE_NOTES file.
 -- MPI/MVAPICH plugin now requires Munge for authentication.
 -- In order to support inter-cluster job dependencies, the MaxJobID
    configuration parameter default value has been reduced from 4,294,901,760
    to 2,147,418,112 and its maximum value is now 2,147,463,647.
    ANY JOBS WITH A JOB ID ABOVE 2,147,463,647 WILL BE PURGED WHEN SLURM IS
    UPGRADED FROM AN OLDER VERSION!
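
For sites that want to check ahead of the upgrade whether any jobs fall
above that purge threshold, here is a minimal Python sketch. It uses only
the numbers quoted in the MaxJobID item above. The function name and the
sample job IDs are hypothetical, and gathering the real job IDs from your
system is left to whatever tooling you already use.

```python
# Values quoted in the MaxJobID item of this announcement.
OLD_DEFAULT_MAX_JOB_ID = 4_294_901_760  # 0xffff0000
NEW_DEFAULT_MAX_JOB_ID = 2_147_418_112  # 0x7fff0000
PURGE_THRESHOLD = 2_147_463_647         # job IDs above this are purged on upgrade


def jobs_at_risk(job_ids):
    """Return, in ascending order, the job IDs above the purge threshold.

    Hypothetical helper for illustration only; it is not part of Slurm.
    """
    return sorted(j for j in job_ids if j > PURGE_THRESHOLD)


if __name__ == "__main__":
    # Hypothetical job IDs; substitute the IDs from your own cluster.
    sample = [1001, 2_147_418_112, 2_147_463_648, 4_000_000_000]
    print(jobs_at_risk(sample))
```

Any job ID the helper returns would be purged by the upgrade, so those jobs
should be allowed to finish (or be resubmitted) before upgrading.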


We have also released one of the last tags of 14.11 in the form of 14.11.9.

Changes are listed here.

 -- Correct "sdiag" backfill cycle time calculation if it yields locks. A
    microsecond value was being treated as a second value resulting in an
    overflow in the calculation.
 -- Fix segfault when updating timelimit on jobarray task.
 -- Fix to job array update logic that can result in a task ID of 4294967294.
 -- Fix of job array update, previous logic could fail to update some tasks
    of a job array for some fields.
 -- CRAY - Fix seg fault if a blade is replaced and slurmctld is restarted.
 -- Fix plane distribution to allocate in blocks rather than cyclically.
 -- squeue - Remove newline from job array ID value printed.
 -- squeue - Enable filtering for job state SPECIAL_EXIT.
 -- Prevent job array task ID being inappropriately set to NO_VAL.
 -- MYSQL - Make it so you don't have to restart the slurmctld
    to gain the correct limit when a parent account is root and you
    remove a subaccount's limit which exists on the parent account.
 -- MYSQL - Close chance of setting the wrong limit on an association
    when removing a limit from an association on multiple clusters
    at the same time.
 -- MYSQL - Fix minor memory leak when modifying an association but no
    change was made.
 -- srun command line of either --mem or --mem-per-cpu will override both the
    SLURM_MEM_PER_CPU and SLURM_MEM_PER_NODE environment variables.
 -- Prevent slurmctld abort on update of advanced reservation that contains no
    nodes.
 -- ALPS - Revert commit 2c95e2d22 which also removes commit 2e2de6a4 allowing
    cray with the SubAllocate option to work as it did with 2.5.
 -- Properly parse CPU frequency data on POWER systems.
 -- Correct sacct.a man pages describing -i option.
 -- Capture salloc/srun information in sdiag statistics.
 -- Fix bug in node selection with topology optimization.
 -- Don't set distribution when srun requests 0 memory.
 -- Read in correct number of nodes from SLURM_HOSTFILE when specifying nodes
    and --distribution=arbitrary.
 -- Fix segfault in Bluegene setups where RebootQOSList is defined in
    bluegene.conf and accounting is not setup.
 -- MYSQL - Update mod_time when updating a start job record or adding one.
 -- MYSQL - Fix issue where, if an association id ever changes while at least
    a portion of a job array is still pending after its initial start in the
    database, another row could be created for the remaining array instead
    of using the already existing row.
 -- Fix scheduling anomaly with job arrays submitted to multiple partitions;
    jobs could be started out of priority order.
 -- If a host has suspended jobs do not reboot it. Reboot only hosts
    with no jobs in any state.
 -- ALPS - Fix issue when using --exclusive flag on srun to do the correct
    thing (-F exclusive) instead of -F share.
 -- Fix various memory leaks in the Perl API.
 -- Fix a bug in the controller which displayed jobs in CF state as RUNNING.
 -- Preserve advanced _core_ reservation when nodes added/removed/resized on
    slurmctld restart. Rebuild core_bitmap as needed.
 -- Fix for non-standard Munge port location for srun/pmi use.
 -- Fix gang scheduling/preemption issue that could cancel job at startup.
 -- Fix a bug in squeue which prevented squeue -tPD from printing array jobs.
 -- Sort job arrays in job queue according to array_task_id when priorities are
    equal.
 -- Fix segfault in sreport when there was no response from the dbd.
 -- ALPS - Fix compile to not link against -ljob and -lexpat with every lib
    or binary.
 -- Fix testing for CR_Memory when CR_Memory and CR_ONE_TASK_PER_CORE are used
    with select/linear.
 -- MYSQL - Fix minor memory leak if a connection ever goes away whilst using
    it.
 -- ALPS - Make it so srun --hint=nomultithread works correctly.
 -- Prevent job array task ID from being reported as NO_VAL if last task in the
    array gets requeued.
 -- Fix some potential deadlock issues when state files don't exist in the
    association manager.
 -- Correct RebootProgram logic when executed outside of a maintenance
    reservation.
 -- Requeue job if possible when slurmstepd aborts.

Both versions can be downloaded from the normal spot http://schedmd.com/#repos.

--
Danny Auble
President, SchedMD LLC
Commercial Slurm Development and Support
===============================================================
Slurm User Group Meeting, 15-16 September 2015, Washington D.C.
http://slurm.schedmd.com/slurm_ug_agenda.html
