SLURM version 2.3.0-rc2 is now available. "rc2" indicates this is a release candidate for version 2.3.0. We do not recommend its use on production computers yet, but it can be considered close to the final release of version 2.3.0 for testing purposes. Our intent is to perform testing and debugging of this code over the coming weeks and release version 2.3.0 when stable. We do not intend to perform major development work in version 2.3, but will target future development work for version 2.4. Major changes to the code since the last pre-release of version 2.3 are noted below. A summary of the major changes in version 2.3 from version 2.2 can be found in the file "RELEASE_NOTES" included with the distributed files.
The code is available here: http://www.schedmd.com/#repos

* Changes in SLURM 2.3.0.rc2
============================
 -- With sched/wiki or sched/wiki2 (Maui or Moab scheduler), ensure that a
    requeued job's priority is reset to zero.
 -- BLUEGENE - Fix to run steps correctly on a BGL/P emulated system.
 -- Fixed issue where, if there was a network problem between the slurmctld
    and the DBD in which both remained up but were disconnected, the
    slurmctld would re-register with the DBD.
 -- Fixed issue so that if the DBD connection from the slurmctld goes away
    because of a POLLERR, the dbd_fail callback is called.
 -- BLUEGENE - Fix to smap command-line mode display.
 -- Change in GRES behavior for job steps: a job step's default generic
    resource allocation will be set to that of the job. If a job step's
    --gres value is set to "none" then none of the generic resources
    allocated to the job will be allocated to the job step.
 -- Add srun environment variable SLURM_STEP_GRES to set the default --gres
    value for a job step.
 -- Require the SchedulerTimeSlice configuration parameter to be at least 5
    seconds to avoid thrashing the slurmd daemon.
 -- Cray - Fix to make node state in accounting consistent with the state
    set by ALPS.
 -- Cray - A node DOWN to ALPS will be marked DOWN to SLURM only after
    reaching SlurmdTimeout. In the interim, the node state will be
    NO_RESPOND. This makes SLURM's handling of the node DOWN state more
    consistent with ALPS. This change affects only Cray systems.
 -- Cray - Fix to work with 4.0.* instead of just 4.0.0.
 -- Cray - Modify the srun/aprun wrapper to map --exclusive to -F exclusive
    and --share to -F share. Note this does not consider the partition's
    Shared configuration, so it is an imperfect mapping of options.
 -- BLUEGENE - Added a notice in the printed configuration indicating
    whether or not you are running emulated.
 -- BLUEGENE - Fix job step scalability issue with large task counts.
 -- BGQ - Improved c-node selection when asked for a sub-block job that
    cannot fit into the available shape.
 -- BLUEGENE - Modify "scontrol show step" to show the I/O nodes (BGL and
    BGP) or c-nodes (BGQ) allocated to each step. Change field name from
    "Nodes=" to "BP_List=".
 -- Code cleanup on step requests to get the correct select_jobinfo.
 -- Fixed memory leak when rolling up accounting with down clusters.
 -- BGQ - Fix issue so that if the first job step uses the entire block and
    the next parallel step is run on a sub-block, SLURM won't oversubscribe
    c-nodes.
 -- Treat a duplicate switch name in topology.conf as a fatal error. Patch
    from Rod Schultz, Bull.
 -- Minor update to documentation describing the AllowGroups option for a
    partition in slurm.conf.
 -- Fix problem with _job_create() when not using QOSs, making
    _job_create() consistent with similar logic in select_nodes().
 -- Fleshed out GrpCPURunMins support in a QOS.
 -- Fix squeue -t "CONFIGURING" to actually work.
 -- CRAY - Add cray.conf parameter SyncTimeout, the maximum time to defer
    job scheduling if SLURM node or job state is out of synchronization
    with ALPS.
 -- If salloc was run as interactive, with job control, reset the
    foreground process group of the terminal to the process group of the
    parent pid before exiting. Patch from Don Albert, Bull.
 -- BGQ - Set up the corner of a sub-block correctly based on a relative
    position in the block instead of an absolute one.
 -- BGQ - Make sure the recently added select_jobinfo of a step launch
    request isn't sent to the slurmd, where environment variables would be
    overwritten incorrectly.
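
The new job step GRES behavior and the SLURM_STEP_GRES environment variable
described above can be illustrated with a session like the following. This is
a hypothetical sketch: it assumes a cluster with a GRES named "gpu" configured
in slurm.conf and gres.conf, and "my_gpu_app"/"my_cpu_app" stand in for real
commands.

    # Request a job allocation with 2 GPUs per node (assumes a "gpu" GRES).
    $ salloc --gres=gpu:2
    # By default, a job step's GRES allocation is that of the job,
    # so this step sees both GPUs.
    $ srun my_gpu_app
    # With --gres=none, the step is allocated none of the job's
    # generic resources.
    $ srun --gres=none my_cpu_app
    # SLURM_STEP_GRES changes the default --gres for subsequent steps.
    $ export SLURM_STEP_GRES=none
    $ srun my_cpu_app

Setting SLURM_STEP_GRES=none in this way can be useful when most steps of a
script do not need the job's generic resources.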
