We are pleased to announce the availability of Slurm version 2.5.7 with the bug fixes listed below, plus version 2.6.0-rc1 (release candidate 1) with the bug fixes and enhancements listed below. We plan to release version 2.6.0 after more testing. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 2.6.
A great way to find out about Slurm development is to attend the Slurm User Group Meeting, September 18 - 19 in Oakland, California, USA:
http://www.schedmd.com/slurmdocs/slurm_ug_agenda.html

The Slurm distributions are available from:
http://www.schedmd.com/#repos

* Changes in Slurm 2.5.7
========================
 -- Fix for linking to the select/cray plugin to not give warning about undefined variable.
 -- Add missing symbols to the xlator.h
 -- Avoid placing pending jobs in AdminHold state due to backfill scheduler interactions with advanced reservation.
 -- Accounting - make average by task not cpu.
 -- CRAY - Change logging of transient ALPS errors from error() to debug().
 -- POE - Correct logic to support poe option "-euidevice sn_all" and "-euidevice sn_single".
 -- Accounting - Fix minor initialization error.
 -- POE - Correct logic to support srun network instances count with POE.
 -- POE - With the srun --launch-cmd option, report proper task count when the --cpus-per-task option is used without the --ntasks option.
 -- POE - Fix logic binding tasks to CPUs.
 -- sview - Fix race condition where new information could have slipped past the node tab and we didn't notice.
 -- Accounting - Fix an invalid memory read when slurmctld sends data about start job to slurmdbd.
 -- If a prolog or epilog failure occurs, drain the node rather than setting it down and killing all of its jobs.
 -- Priority/multifactor - Avoid underflow in half-life calculation.
 -- POE - pack missing variable to allow fanout (more than 32 nodes)
 -- Prevent clearing reason field for pending jobs. This bug was introduced in v2.5.5 (see "Reject job at submit time ...").
 -- BGQ - Fix issue with preemption on sub-block jobs where a job would kill all preemptable jobs on the midplane instead of just the ones it needed to.
 -- switch/nrt - Validate dynamic window allocation size.
 -- BGQ - When --geo is requested do not impose the default conn_types.
 -- CRAY - Support CLE 4.2.0
 -- RebootNode logic - Defers (rather than forgets) reboot request with job running on the node within a reservation.
 -- switch/nrt - Correct network_id use logic. Correct support for user sn_all and sn_single options.
 -- sched/backfill - Modify logic to reduce overhead under heavy load.
 -- Fix job step allocation with --exclusive and --hostlist option.
 -- Select/cons_res - Fix bug resulting in error of "cons_res: sync loop not progressing, holding job #"
 -- checkpoint/blcr - Reset max_nodes from zero to NO_VAL on job restart.
 -- launch/poe - Fix for hostlist file support with repeated host names.
 -- priority/multifactor2 - Prevent possible divide by zero.
 -- srun - Don't check for executable if --test-only flag is used.
 -- energy - On a single node only use the last task for gathering energy, since we don't currently track energy usage per task (only per step). Otherwise we get double the energy consumption value.

* Changes in Slurm 2.6.0rc1
===========================
 -- Added helper script for launching symmetric and MIC-only MPI tasks within SLURM (in contribs/mic/mpirun-mic).
 -- Change maximum delay for state save from 2 secs to 5 secs. Make timeout configurable at build time by defining SAVE_MAX_WAIT.
 -- Modify slurmctld data structure locking to interleave read and write locks rather than always favor write locks over read locks.
 -- Added sacct format option of "ALL" to print all fields.
 -- Deprecate the SchedulerParameters value of "interval"; use "bf_interval" instead, as documented.
 -- Add acct_gather_profile/hdf5 to profile jobs with hdf5
 -- Added MaxCPUsPerNode partition configuration parameter. This can be especially useful to schedule systems with GPUs.
 -- Permit "scontrol reboot_node" for nodes in MAINT reservation.
 -- Added "PriorityFlags" value of "SMALL_RELATIVE_TO_TIME". If set, the job's size component will be based upon not the job size alone, but the job's size divided by its time limit.
 -- Added sbatch option "--ignore-pbs" to ignore "#PBS" options in the batch script.
 -- Rename slurm_step_ctx_params_t field from "mem_per_cpu" to "pn_min_memory". Job step now accepts memory specification in either per-cpu or per-node basis.
 -- Add ability to specify host repetition count in the srun hostfile (e.g. "host1*2" is equivalent to "host1,host1").
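
To illustrate the hostfile repetition count in the last item, here is a minimal Python sketch of how such entries expand. This is not Slurm's parser: the function name expand_hostfile_entries is hypothetical, and the sketch assumes only the "name*count" form shown in the example ("host1*2" is equivalent to "host1,host1").

```python
def expand_hostfile_entries(entries):
    """Expand "name*count" entries into a flat list of host names.

    Illustrative only; the real srun hostfile parser may accept
    additional syntax that this sketch does not handle.
    """
    hosts = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank lines
        # Split on the first "*"; sep is empty when no count is given.
        name, sep, count = entry.partition("*")
        repeat = int(count) if sep else 1
        if repeat < 1:
            raise ValueError(f"invalid repetition count in {entry!r}")
        hosts.extend([name] * repeat)
    return hosts


if __name__ == "__main__":
    # "host1*2" expands to host1 listed twice, followed by host2 once.
    print(",".join(expand_hostfile_entries(["host1*2", "host2"])))
```

An entry without a count is treated as a single occurrence, so existing hostfiles keep their meaning under this reading.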
