Slurm version 2.5.2 is now available with the bug fixes described below. We have also made available a pre-release of version 2.6 (still under development). Notable features in v2.6 include support for job arrays and accounting for a job's energy consumption using IPMI. The job array documentation is available here: http://www.schedmd.com/slurmdocs/job_array.html
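Job arrays are requested by passing an index expression to the sbatch -a/--array option (see "man sbatch"). To illustrate how such an expression maps onto individual array tasks, here is a minimal Python sketch. It assumes the expression is made of comma-separated index values and inclusive "low-high" ranges, such as "0-3,7". The expand_array_spec helper is hypothetical and is not Slurm's own parser. It does not model any additional syntax sbatch may accept, or any limits Slurm places on array size.

```python
def expand_array_spec(spec: str) -> list[int]:
    """Expand an index expression such as "0-3,7" into sorted task IDs.

    Hypothetical, simplified sketch: only comma-separated single values and
    inclusive "lo-hi" ranges are handled; this is not Slurm's parser.
    """
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty element in array spec")
        if "-" in part:
            # Inclusive range: "0-3" yields 0, 1, 2 and 3.
            lo_str, hi_str = part.split("-", 1)
            lo, hi = int(lo_str), int(hi_str)
            if lo > hi:
                raise ValueError(f"range {part!r} has start greater than end")
            ids.update(range(lo, hi + 1))
        else:
            ids.add(int(part))
    # Duplicates collapse because each index names exactly one array task.
    return sorted(ids)


if __name__ == "__main__":
    # Under these assumptions, one array task would exist per printed index.
    for task_id in expand_array_spec("0-3,7"):
        print(task_id)
```

Collecting indices in a set means overlapping pieces such as "1-3,2" still yield each task ID only once.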
The latest versions of Slurm are available from: http://www.schedmd.com/#repos

* Changes in SLURM 2.5.2
========================
 -- Fix advanced reservation recovery logic when upgrading from version 2.4.
 -- BLUEGENE - Fix for QOS/Association node limits.
 -- Add missing "safe" flag when printing the AccountStorageEnforce option.
 -- Fix logic to optimize GRES topology with respect to allocated CPUs.
 -- Add job_submit/all_partitions plugin to set a job's default partition to ALL available partitions in the cluster.
 -- Modify switch/nrt logic to permit build without libnrt.so library.
 -- Handle srun task launch failure without duplicate error messages or abort.
 -- Fix bug in QOS limits enforcement when slurmctld restarts and user not yet added to the QOS list.
 -- Fix issue where sjstat and sjobexitmod were installed in 2 different RPMs.
 -- Fix for job request of multiple partitions in which some partitions lack nodes with required features.
 -- Permit a job to use a QOS its user does not have access to if an administrator manually set the job's QOS (previously the job would be rejected).
 -- Make more variables available to job_submit/lua plugin: slurm.MEM_PER_CPU, slurm.NO_VAL, etc.
 -- Fix topology/tree logic when nodes defined in slurm.conf get re-ordered.
 -- In select/cons_res, correct logic to allocate whole sockets to jobs. Work by Magnus Jonsson, Umea University.
 -- In select/cons_res, correct logic when job removed from only some nodes.
 -- Avoid an apparent kernel bug in 2.6.32 which appears to be fixed in at least 3.5.0. This avoids a stack overflow when running jobs on more than 120k nodes.
 -- BLUEGENE - If we made a block that isn't runnable because of an overlapping block, destroy it correctly.
 -- Switch/nrt - Dynamically load libnrt.so from within the plugin as needed. This eliminates the need for libnrt.so on the head node.
 -- BLUEGENE - Fix in reservation logic that could cause abort.

* Changes in SLURM 2.6.0-pre1
=============================
 -- Add "state" field to job step information reported by scontrol.
 -- Notify srun to retry step creation upon completion of other job steps rather than polling. This results in much faster throughput for job step execution with the --exclusive option.
 -- Added "ResvProlog" and "ResvEpilog" configuration parameters to execute a program at the beginning and end of each reservation.
 -- Added "slurm_load_job_user" function. This is a variation of "slurm_load_jobs", but accepts a user ID argument, potentially resulting in substantial performance improvement for "squeue --user=ID".
 -- Added "slurm_load_node_single" function. This is a variation of "slurm_load_nodes", but accepts a node name argument, potentially resulting in substantial performance improvement for "sinfo --nodes=NAME".
 -- Added "HealthCheckNodeState" configuration parameter to identify node states on which HealthCheckProgram should be executed.
 -- Remove sacct --dump and --formatted-dump options, which were deprecated in 2.5.
 -- Added support for job arrays (phase 1 of effort). See "man sbatch" option -a/--array for details.
 -- Add new AccountStorageEnforce options of 'nojobs' and 'nosteps' which will allow the use of accounting features like associations, qos and limits but not keep track of jobs or steps in accounting.
 -- Cray - Add new cray.conf parameter of "AlpsEngine" to specify the communication protocol to be used for ALPS/BASIL.
 -- select/cons_res plugin: Correction to CPU allocation count logic for cores without hyperthreading.
 -- Added new SelectTypeParameter value of "CR_ALLOCATE_FULL_SOCKET".
 -- Added PriorityFlags value of "TICKET_BASED" and merged priority/multifactor2 plugin into priority/multifactor plugin.
 -- Add "KeepAliveTime" configuration parameter controlling how long sockets used for srun/slurmstepd communications are kept alive after disconnect.
 -- Added SLURM_SUBMIT_HOST to salloc, sbatch and srun job environment.
 -- Added SLURM_ARRAY_TASK_ID to the environment of job array tasks.
 -- Added squeue --array/-r option to optimize output for job arrays.
 -- Added "SlurmctldPlugstack" configuration parameter for generic stack of slurmctld daemon plugins.
 -- Removed contribs/arrayrun tool. Use native support for job arrays.
 -- Modify default installation locations for RPMs to match "make install":
    _prefix /usr/local
    _slurm_sysconfdir %{_prefix}/etc/slurm
    _mandir %{_prefix}/share/man
    _infodir %{_prefix}/share/info
 -- Add acct_gather_energy/ipmi plugin, which uses FreeIPMI to gather energy consumption data.
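Each task of a job array sees its own index in the SLURM_ARRAY_TASK_ID environment variable added in 2.6.0-pre1, so one common pattern is to let that index choose the task's share of the work. The Python sketch below assumes a hypothetical naming scheme of one input file per index (input_<N>.dat). The input_for_task helper and the file template are illustrative only and are not part of Slurm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def input_for_task(env: Optional[Mapping[str, str]] = None,
                   template: str = "input_{task_id}.dat") -> Path:
    """Return the input file this job array task should process.

    Reads SLURM_ARRAY_TASK_ID, which Slurm sets for each job array task.
    The default file-name template is a hypothetical convention.
    """
    env = os.environ if env is None else env
    raw = env.get("SLURM_ARRAY_TASK_ID")
    if raw is None:
        raise RuntimeError("SLURM_ARRAY_TASK_ID is not set; not a job array task")
    # int() rejects a malformed value instead of silently building a bad path.
    return Path(template.format(task_id=int(raw)))


if __name__ == "__main__":
    try:
        print(f"Processing {input_for_task()}")
    except RuntimeError as err:
        # Outside a job array the variable is absent; report instead of crashing.
        print(err)
```

Passing the environment as an argument keeps the helper easy to exercise outside of Slurm; inside a real job array task it falls back to os.environ.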
