Most likely
http://bugs.schedmd.com/show_bug.cgi?id=858
On 05/21/15 13:59, Chris Read wrote:
Re: [slurm-dev] Slurm versions 14.11.7 and 15.08.0-0pre5 are now available
Hooray for memory accounting!
Does this mean it will be possible to include memory usage in the
Fairshare calculation too?
Chris
On Thu, May 21, 2015 at 3:40 PM, Danny Auble <[email protected]> wrote:
Slurm version 14.11.7 is now available with quite a few bug fixes as
listed below.
A development tag for 15.08 (pre5) has also been made. It
represents the current state of Slurm development for the release
planned in August 2015 and is intended for development and test
purposes only. One notable enhancement is Trackable Resources (TRES),
which extends accounting to CPU, memory, energy, GRES, licenses, etc.
Both are available for download at
http://slurm.schedmd.com/download.html
Notable changes in these versions are listed below.
* Changes in Slurm 14.11.7
==========================
-- Initialize some variables used with the srun --no-alloc option that may
   cause random failures.
-- Add SchedulerParameters option of sched_min_interval that controls the
   minimum time interval between any job scheduling action. The default value
   is zero (disabled).
-- Change default SchedulerParameters=max_sched_time from 4 seconds to 2.
-- Refactor scancel so that all pending jobs are cancelled before starting
   cancellation of running jobs. Otherwise they happen in parallel and the
   pending jobs can be scheduled on resources as the running jobs are being
   cancelled.
-- ALPS - Add new cray.conf variable NoAPIDSignalOnKill. When set to yes this
   will make it so the slurmctld will not signal the apid's in a batch job.
   Instead it relies on the rpc coming from the slurmctld to kill the job to
   end things correctly.
-- ALPS - Have the slurmstepd running a batch job wait for an ALPS release
   before ending the job.
-- Initialize variables in consumable resource plugin to prevent core dump.
-- Fix scancel bug which could return an error on attempt to signal a job step.
-- In slurmctld communication agent, make the thread timeout be the configured
   value of MessageTimeout rather than 30 seconds.
-- sshare -U/--Users only flag was used uninitialized.
-- Cray systems, add "plugstack.conf.template" sample SPANK configuration file.
-- BLUEGENE - Set DB2NOEXITLIST when starting the slurmctld daemon to avoid
   random crashing in db2 when the slurmctld is exiting.
-- Make full-node reservations correctly display the core count instead of
   the CPU count.
-- Preserve original errno on execve() failure in task plugin.
-- Add SLURM_JOB_NAME env variable to an salloc's environment.
-- Overwrite SLURM_JOB_NAME in an srun when it gets an allocation.
-- Make sure each job has a wckey if that is something that is tracked.
-- Make sure old step data is cleared when job is requeued.
-- Load libtinfo as needed when building ncurses tools.
-- Fix small memory leak in backup controller.
-- Fix segfault when backup controller takes control for second time.
-- Cray - Fix backup controller running native Slurm.
-- Provide prototypes for init_setproctitle()/fini_setproctitle() on NetBSD.
-- Add configuration test to find out the full path to su command.
-- preempt/job_prio plugin: Fix for possible infinite loop when identifying
   preemptable jobs.
-- preempt/job_prio plugin: Implement the concept of Warm-up Time here. Use
   the QoS GraceTime as the amount of time to wait before preempting.
   Basically, skip preemption if your time is not up.
-- Make srun wait KillWait time when a task is cancelled.
-- switch/cray: Revert logic added to 14.11.6 that set "PMI_CRAY_NO_SMP_ENV=1"
   if CR_PACK_NODES is configured.
-- Prevent users from setting job's partition to an invalid partition.
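
The scancel refactoring listed above (cancel every pending job before
touching running ones) could be sketched roughly as follows. This is only
an illustration of the ordering idea, not Slurm's actual C implementation;
the Job class, the state strings, and cancellation_order() are hypothetical
names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    state: str  # simplified, hypothetical states: "PENDING" or "RUNNING"


def cancellation_order(jobs):
    """Return jobs in the order they should be cancelled: pending first.

    Cancelling pending jobs first means none of them can be started on
    resources that free up while the running jobs are being cancelled.
    """
    pending = [j for j in jobs if j.state == "PENDING"]
    running = [j for j in jobs if j.state == "RUNNING"]
    return pending + running


jobs = [
    Job(101, "RUNNING"),
    Job(102, "PENDING"),
    Job(103, "RUNNING"),
    Job(104, "PENDING"),
]
order = [j.job_id for j in cancellation_order(jobs)]
print(order)  # [102, 104, 101, 103]
```

If both groups were cancelled in parallel instead, job 102 or 104 might be
scheduled onto the nodes released by 101 or 103 before its own cancellation
was processed.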
* Changes in Slurm 15.08.0pre5
==============================
-- Add jobcomp/elasticsearch plugin. Libcurl is required for build. Configure
   the server as follows: "JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200".
-- Scancel logic largely rewritten to better support job arrays.
-- Added a slurm.conf parameter PrologEpilogTimeout to control how long
   prolog/epilog can run.
-- Added TRES (Trackable Resources) to track utilization of memory, GRES,
   licenses, etc.
-- Add re-entrant versions of glibc time functions (e.g. localtime) to Slurm
   in order to eliminate rare deadlock of slurmstepd fork and exec calls.
-- Constrain kernel memory (if available) in cgroups.
-- Add PrologFlags option of "Contain" to create a proctrack container at
   job resource allocation time.
-- Disable the OOM Killer in slurmd and slurmstepd's memory cgroup when using
   MemSpecLimit.
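
As a rough illustration of how a few of the 15.08 settings listed above
might appear together in slurm.conf, here is a small helper that renders
the relevant lines. The function name render_slurm_conf() and the timeout
value of 300 are placeholders chosen for this example, and the timeout is
assumed to be in seconds; the JobCompType line is what selects the
jobcomp/elasticsearch plugin. Check the slurm.conf man page for your
version before using any of this.

```python
def render_slurm_conf(es_server: str, prolog_epilog_timeout: int) -> str:
    """Render example slurm.conf lines for settings named in the changelog.

    es_server and prolog_epilog_timeout are placeholders supplied by the
    caller; nothing here is read from or written to a real Slurm setup.
    """
    lines = [
        "JobCompType=jobcomp/elasticsearch",
        f"JobCompLoc=http://{es_server}:9200",
        # Assumed to be a number of seconds; 300 below is only an example.
        f"PrologEpilogTimeout={prolog_epilog_timeout}",
        "PrologFlags=Contain",
    ]
    return "\n".join(lines)


conf = render_slurm_conf("YOUR_ELASTICSEARCH_SERVER", 300)
print(conf)
```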