Hooray for memory accounting! Does this mean it will be possible to include memory usage in the Fairshare calculation too?
Chris

On Thu, May 21, 2015 at 3:40 PM, Danny Auble <[email protected]> wrote:
>
> Slurm version 14.11.7 is now available with quite a few bug fixes as
> listed below.
>
> A development tag for 15.08 (pre5) has also been made. It represents the
> current state of Slurm development for the release planned in August 2015
> and is intended for development and test purposes only. One notable
> enhancement included is the idea of Trackable Resources (TRES) for
> accounting for cpu, memory, energy, GRES, licenses, etc.
>
> Both are available for download at
> http://slurm.schedmd.com/download.html
>
> Notable changes for these versions are these...
>
> * Changes in Slurm 14.11.7
> ==========================
> -- Initialize some variables used with the srun --no-alloc option that may
>    cause random failures.
> -- Add SchedulerParameters option of sched_min_interval that controls the
>    minimum time interval between any job scheduling action. The default
>    value is zero (disabled).
> -- Change default SchedulerParameters=max_sched_time from 4 seconds to 2.
> -- Refactor scancel so that all pending jobs are cancelled before starting
>    cancellation of running jobs. Otherwise they happen in parallel and the
>    pending jobs can be scheduled on resources as the running jobs are
>    being cancelled.
> -- ALPS - Add new cray.conf variable NoAPIDSignalOnKill. When set to yes
>    this will make it so the slurmctld will not signal the apid's in a
>    batch job. Instead it relies on the rpc coming from the slurmctld to
>    kill the job to end things correctly.
> -- ALPS - Have the slurmstepd running a batch job wait for an ALPS release
>    before ending the job.
> -- Initialize variables in consumable resource plugin to prevent core
>    dump.
> -- Fix scancel bug which could return an error on attempt to signal a job
>    step.
> -- In slurmctld communication agent, make the thread timeout be the
>    configured value of MessageTimeout rather than 30 seconds.
> -- sshare -U/--Users only flag was used uninitialized.
> -- Cray systems, add "plugstack.conf.template" sample SPANK configuration
>    file.
> -- BLUEGENE - Set DB2NOEXITLIST when starting the slurmctld daemon to
>    avoid random crashing in db2 when the slurmctld is exiting.
> -- Make full node reservations display correctly the core count instead of
>    cpu count.
> -- Preserve original errno on execve() failure in task plugin.
> -- Add SLURM_JOB_NAME env variable to an salloc's environment.
> -- Overwrite SLURM_JOB_NAME in an srun when it gets an allocation.
> -- Make sure each job has a wckey if that is something that is tracked.
> -- Make sure old step data is cleared when job is requeued.
> -- Load libtinfo as needed when building ncurses tools.
> -- Fix small memory leak in backup controller.
> -- Fix segfault when backup controller takes control for second time.
> -- Cray - Fix backup controller running native Slurm.
> -- Provide prototypes for init_setproctitle()/fini_setproctitle on NetBSD.
> -- Add configuration test to find out the full path to su command.
> -- preempt/job_prio plugin: Fix for possible infinite loop when
>    identifying preemptable jobs.
> -- preempt/job_prio plugin: Implement the concept of Warm-up Time here.
>    Use the QoS GraceTime as the amount of time to wait before preempting.
>    Basically, skip preemption if your time is not up.
> -- Make srun wait KillWait time when a task is cancelled.
> -- switch/cray: Revert logic added to 14.11.6 that set
>    "PMI_CRAY_NO_SMP_ENV=1" if CR_PACK_NODES is configured.
> -- Prevent users from setting job's partition to an invalid partition.
>
> * Changes in Slurm 15.08.0pre5
> ==============================
> -- Add jobcomp/elasticsearch plugin. Libcurl is required for build.
>    Configure the server as follows:
>    "JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200".
> -- Scancel logic largely re-written to better support job arrays.
> -- Added a slurm.conf parameter PrologEpilogTimeout to control how long
>    prolog/epilog can run.
> -- Added TRES (Trackable resources) to track Mem, GRES, license, etc
>    utilization.
> -- Add re-entrant versions of glibc time functions (e.g. localtime) to
>    Slurm in order to eliminate rare deadlock of slurmstepd fork and exec
>    calls.
> -- Constrain kernel memory (if available) in cgroups.
> -- Add PrologFlags option of "Contain" to create a proctrack container at
>    job resource allocation time.
> -- Disable the OOM Killer in slurmd and slurmstepd's memory cgroup when
>    using MemSpecLimit.
