SLURM versions 2.3.5 and 2.4.0-rc1 are now available from http://www.schedmd.com/#repos. A description of the changes is appended.
This will most likely be the last 2.3 release unless a 2.3.6 is really
warranted. Development for 2.4 has been halted, and only bug fixes will be
applied from now on. We plan to release an rc2 in a couple of weeks and a
2.4.0-1 a couple of weeks after that. Please test 2.4 and report any bugs to
us through http://bugs.schedmd.com or through the slurm-dev list.

Future development will be in 2.5, to be released later this year (planned
for October). We will release a 2.5.0-pre1 shortly.

* Changes in SLURM 2.3.5
========================
 -- Improve support for overlapping advanced reservations. Patch from
    Bill Brophy, Bull.
 -- Modify Makefiles for support of Debian hardening flags. Patch from
    Simon Ruderich.
 -- CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
    node that is DOWN in ALPS as DOWN in SLURM).
 -- Fixed the setting of SLURM_SUBMIT_DIR for jobs submitted by Moab
    (BZ#1467). Patch by Don Lipari, LLNL.
 -- Correction to init.d/slurmdbd exit code for status option. Patch by
    Bill Brophy, Bull.
 -- When the optional max_time is not specified for --switches=count, the
    site max (SchedulerParameters=max_switch_wait=seconds) is used for the
    job. Based on patch from Rod Schultz.
 -- Fix bug in select/cons_res plugin when used with topology/tree and a
    node range count in job allocation request.
 -- Fixed moab_2_slurmdb.pl script to correctly work for end records.
 -- Add support for new SchedulerParameters of max_depend_depth defining
    the maximum number of jobs to test for circular dependencies (i.e. job
    A waits for job B to start and job B waits for job A to start). Default
    value is 10 jobs.
 -- Fix potential race condition if MinJobAge is very low (i.e. 1) and
    using slurmdbd accounting and running large amounts of jobs (>50 sec).
    Job information could be corrupted before it had a chance to reach the
    DBD.
 -- Fix state restore of job limit set from admin value for min_cpus.
 -- Fix clearing of limit values if an admin removes the limit for max cpus
    and time limit where it was previously set by an admin.
 -- Fix issue where log message is more than 256 chars and then has a
    format.
 -- Fix sched/wiki2 to support job account name, gres, partition name,
    wckey, or working directory that contains "#" (a job record separator).
    Also fix for wckey or working directory that contains a double quote
    '\"'.
 -- CRAY - fix for handling memory requests from user for an allocation.
 -- Add support for switches parameter to the job_submit/lua plugin. Work
    by Par Andersson, NSC.
 -- Fix to job preemption logic to preempt multiple jobs at the same time.
 -- Fix minor issue where uid and gid were switched in sview for submitting
    batch jobs.
 -- Fix possible illegal memory reference in slurmctld for job step with
    relative option. Work by Matthieu Hautreux (CEA).
 -- Reset priority of system held jobs when dependency is satisfied. Work
    by Don Lipari, LLNL.

* Changes in SLURM 2.4.0.rc1
============================
 -- Improve task binding logic by making fuller use of HWLOC library,
    especially with respect to Opteron 6000 series processors. Work
    contributed by Komoto Masahiro.
 -- Add new configuration parameter PriorityFlags, based upon work by
    Carles Fenoy (Barcelona Supercomputing Center).
 -- Modify the step completion RPC between slurmd and slurmstepd in order
    to eliminate a possible deadlock. Based on work by Matthieu Hautreux,
    CEA.
 -- Change the owner of slurmctld and slurmdbd log files to the appropriate
    user. Without this change the files will be created by and owned by the
    user starting the daemons (likely user root).
 -- Reorganize the slurmstepd logic in order to better support NFS and
    Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux,
    CEA.
 -- Fix bug in allocating GRES that are associated with specific CPUs.
    In some cases the code allocated first available GRES to job instead of
    allocating GRES accessible to the specific CPUs allocated to the job.
 -- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit} and job
    epilog/prolog: slurm_spank_job_{prolog,epilog}
 -- spank: Add spank_option_getopt() function to api
 -- Change resolution of switch wait time from minutes to seconds.
 -- Added CrpCPUMins to the output of sshare -l for those using hard limit
    accounting. Work contributed by Mark Nelson.
 -- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
    additional resources for newly launched tasks. Contributed by Hongjia
    Cao, NUDT.
 -- BGQ - fixed issue where if a user asked for a specific node count and
    more tasks than possible without overcommit, the request would be
    allowed on more nodes than requested.
 -- Add support for new SchedulerParameters of bf_max_job_user, maximum
    number of jobs to attempt backfilling per user. Work by Bjørn-Helge
    Mevik, University of Oslo.
 -- BLUEGENE - fixed issue where MaxNodes limit on a partition only limited
    larger than midplane jobs.
 -- Added cpu_run_min to the output of sshare --long. Work contributed by
    Mark Nelson.
 -- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
 -- Add sinfo output format option of "%R" for partition name without "*"
    appended for default partition.
 -- Cray - Add support for zero compute node resource allocation to run
    batch script on front-end node with no ALPS reservation. Useful for
    pre- or post-processing.
 -- Support for cyclic distribution of cpus in task/cgroup plugin from
    Martin Perry, Bull.
 -- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge
    Mevik, University of Oslo.
 -- Various performance improvements for up to 500% higher throughput
    depending upon configuration. Work supported by the Oak Ridge National
    Laboratory Extreme Scale Systems Center.
 -- Added jobacct_gather/cgroup plugin.
    It is not advised to use this in production as it isn't currently
    complete and doesn't provide an equivalent substitution for
    jobacct_gather/linux yet. Work by Martin Perry, Bull.
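
For readers unfamiliar with the max_depend_depth option listed under the 2.3.5 changes, the following is a minimal Python sketch of a bounded circular-dependency test. It is an illustration only and not the slurmctld implementation: the function name, the dictionary layout of depends_on, and the choice to give up quietly once the limit is reached are all assumptions made for this example.

```python
from collections import deque


def has_circular_dependency(job_id, depends_on, max_depend_depth=10):
    """Return True if job_id ends up waiting on itself.

    depends_on is a hypothetical mapping of job id -> iterable of job ids
    that the job waits for. At most max_depend_depth jobs are tested; once
    that many have been examined the search stops and returns False
    (an assumption of this sketch, not documented SLURM behavior).
    """
    queue = deque(depends_on.get(job_id, ()))
    seen = set()
    tested = 0
    while queue and tested < max_depend_depth:
        current = queue.popleft()
        if current == job_id:
            # The chain of dependencies leads back to the starting job.
            return True
        if current in seen:
            continue
        seen.add(current)
        tested += 1
        queue.extend(depends_on.get(current, ()))
    return False


if __name__ == "__main__":
    # Job A waits for job B and job B waits for job A.
    print(has_circular_dependency("A", {"A": ["B"], "B": ["A"]}))
```

In this sketch a cycle longer than the limit goes undetected, which is the trade-off a bounded test makes: the cost of each check stays fixed regardless of how long the dependency chains are.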
