[slurm-dev] Slurm versions 2.5.4 and 2.6.0-pre2 are now available

Moe Jette Fri, 08 Mar 2013 12:37:07 -0800

Slurm version 2.5.4 is now available with the bug fixes listed below,
and version 2.6.0-pre2 with the enhancements listed below.


The latest versions of Slurm are available from www.schedmd.com/#repos.

* Changes in Slurm 2.5.4
========================
  -- Fix bug in PrologSlurmctld use that would block job steps until node
     responds.
  -- CRAY - If a partition has MinNodes=0 and a batch job doesn't request nodes,
     set the allocation to 1 instead of 0, which would prevent the allocation
     from happening.
  -- Better debug when the database is down and using the --cluster option in
     the user commands.
  -- When asking for job states with sacct, default to 'now' instead of
     midnight of the current day.
  -- Fix for handling a test-only job or immediate job that fails while being
     built.
  -- Comment out all of the logic in the job_submit/defaults plugin. The logic
     is only an example and not meant for actual use.
  -- Eliminate configuration file 4096 character line limitation.
  -- More robust logic for tree message forward
  -- BGQ - When cnodes fail in a timeout fashion correctly look up parent
     midplane.
  -- Correct sinfo "%c" (node's CPU count) output value for Bluegene systems.
  -- Backfill - Responsiveness improvements for systems with large numbers of
     jobs (>5000) and using the SchedulerParameters option bf_max_job_user.
  -- slurmstepd: ensure that IO redirection openings from/to files correctly
     handle interruption
  -- BGQ - Able to handle when midplanes go into Hardware::SoftwareFailure
  -- GRES - Correct tracking of specific resources used after slurmctld
     restart. Counts would previously go negative as jobs terminate and
     decrement from a base value of zero.
  -- Fix for priority/multifactor2 plugin to not assert when configured with
     --enable-debug.
  -- Select/cons_res - If the job request specified --ntasks-per-socket and the
     allocation is by cores, then pack the tasks onto the sockets up to the
     specified value.
  -- BGQ - If a cnode goes into an 'error' state and the block containing the
     cnode does not have a job running on it do not resume the block.
  -- BGQ - Handle blocks that don't free themselves in a reasonable time
     better.
  -- BGQ - Fix for signaling steps when allocation ends before step.
  -- Fix for backfill scheduling logic with job preemption; starts more jobs.
  -- xcgroup - remove bugs with EINTR management in write calls
  -- jobacct_gather - fix total values to not always == the max values.
  -- Fix for handling node registration messages from older versions without
     energy data.
  -- BGQ - Allow user to request full dimensional mesh.
  -- sdiag command - Correction to jobs started value reported.
  -- Prevent slurmctld assert when invalid change to reservation with running
     jobs is made.
  -- BGQ - If signal is NODE_FAIL allow forward even if job is completing
     and timeout in the runjob_mux trying to send in this situation.
  -- BGQ - More robust checking for correct node, task, and ntasks-per-node
     options in srun, and push that logic to salloc and sbatch.
  -- GRES topology bug in core selection logic fixed.
  -- Fix init.d script to handle status queries and not return 1 on success.

* Changes in Slurm 2.6.0pre2
============================
  -- Do not purge inactive interactive jobs that lack a port to ping (added
     for MR+ operation).
  -- Advanced reservations with hostname and core counts now support asymmetric
     reservations (e.g. a specific, different core count for each node).
  -- Added slurmctld/dynalloc plugin for MapReduce+ support.
  -- Added "DynAllocPort" configuration parameter.
  -- Added partition parameter of SelectTypeParameters to override system-wide
     value.
  -- Added cr_type to partition_info data structure.
  -- Added allocated memory to node information available (within the existing
     select_nodeinfo field of the node_info_t data structure). Added Allocated
     Memory to node information displayed by sview and scontrol commands.
  -- Make sched/backfill the default scheduling plugin rather than
     sched/builtin (FIFO).
  -- Added support for a job having different priorities in different
     partitions.
  -- Added new SchedulerParameters configuration parameter of "bf_continue"
     which permits the backfill scheduler to continue considering jobs for
     backfill scheduling after yielding locks even if new jobs have been
     submitted. This can result in lower priority jobs being backfill
     scheduled instead of newly arrived higher priority jobs, but will permit
     more queued jobs to be considered for backfill scheduling.
  -- Added support to purge reservation records from accounting.
  -- Cray - Add support for Basil 1.3
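
As a rough illustration of how some of the scheduling options mentioned above
might appear in slurm.conf, here is a minimal sketch. The node names, partition
names, hardware counts and the bf_max_job_user value are placeholders invented
for this example, and the partition-level SelectTypeParameters value shown is
an assumption; consult the slurm.conf man page for your version to see which
values are accepted at the partition level.

```
# Hypothetical slurm.conf excerpt (names and values are placeholders)

# Backfill scheduling; bf_continue is the new 2.6.0-pre2 option described above.
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_max_job_user=20

# System-wide consumable resource settings
SelectType=select/cons_res
SelectTypeParameters=CR_Core

NodeName=node[01-04] Sockets=2 CoresPerSocket=8

# Default partition uses the system-wide SelectTypeParameters value
PartitionName=batch Nodes=node[01-04] Default=YES

# This partition overrides the system-wide value (assumed: CR_Socket)
PartitionName=whole Nodes=node[01-04] SelectTypeParameters=CR_Socket
```

Since sched/backfill becomes the default in 2.6.0-pre2, the SchedulerType line
is shown only to make the example explicit.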
