Most likely

http://bugs.schedmd.com/show_bug.cgi?id=858


On 05/21/15 13:59, Chris Read wrote:
Re: [slurm-dev] Slurm versions 14.11.7 and 15.08.0-0pre5 are now available
Hooray for memory accounting!

Does this mean it will be possible to include memory usage in the Fairshare calculation too?

Chris

On Thu, May 21, 2015 at 3:40 PM, Danny Auble wrote:


    Slurm version 14.11.7 is now available with quite a few bug fixes as
    listed below.

    A development tag for 15.08 (pre5) has also been made.  It
    represents the current state of Slurm development for the release
    planned in August 2015 and is intended for development and test
    purposes only.  One notable enhancement included is the idea of
    Trackable Resources (TRES) for accounting for cpu, memory, energy,
    GRES, licenses, etc.

    Both are available for download at
    http://slurm.schedmd.com/download.html

    Notable changes in these versions are listed below.

    * Changes in Slurm 14.11.7
    ==========================
     -- Initialize some variables used with the srun --no-alloc option that may
        cause random failures.
     -- Add SchedulerParameters option of sched_min_interval that controls the
        minimum time interval between any job scheduling action. The default value
        is zero (disabled).
     -- Change default SchedulerParameters=max_sched_time from 4 seconds to 2.
     -- Refactor scancel so that all pending jobs are cancelled before starting
        cancellation of running jobs. Otherwise they happen in parallel and the
        pending jobs can be scheduled on resources as the running jobs are being
        cancelled.
     -- ALPS - Add new cray.conf variable NoAPIDSignalOnKill. When set to yes this
        will make it so the slurmctld will not signal the apid's in a batch job.
        Instead it relies on the rpc coming from the slurmctld to kill the job to
        end things correctly.
     -- ALPS - Have the slurmstepd running a batch job wait for an ALPS release
        before ending the job.
     -- Initialize variables in consumable resource plugin to prevent core dump.
     -- Fix scancel bug which could return an error on attempt to signal a job step.
     -- In slurmctld communication agent, make the thread timeout be the configured
        value of MessageTimeout rather than 30 seconds.
     -- sshare -U/--Users only flag was used uninitialized.
     -- Cray systems, add "plugstack.conf.template" sample SPANK configuration file.
     -- BLUEGENE - Set DB2NOEXITLIST when starting the slurmctld daemon to avoid
        random crashing in db2 when the slurmctld is exiting.
     -- Make full node reservations display correctly the core count instead of
        cpu count.
     -- Preserve original errno on execve() failure in task plugin.
     -- Add SLURM_JOB_NAME env variable to an salloc's environment.
     -- Overwrite SLURM_JOB_NAME in an srun when it gets an allocation.
     -- Make sure each job has a wckey if that is something that is tracked.
     -- Make sure old step data is cleared when job is requeued.
     -- Load libtinfo as needed when building ncurses tools.
     -- Fix small memory leak in backup controller.
     -- Fix segfault when backup controller takes control for second time.
     -- Cray - Fix backup controller running native Slurm.
     -- Provide prototypes for init_setproctitle()/fini_setproctitle on NetBSD.
     -- Add configuration test to find out the full path to su command.
     -- preempt/job_prio plugin: Fix for possible infinite loop when identifying
        preemptable jobs.
     -- preempt/job_prio plugin: Implement the concept of Warm-up Time here. Use
        the QoS GraceTime as the amount of time to wait before preempting.
        Basically, skip preemption if your time is not up.
     -- Make srun wait KillWait time when a task is cancelled.
     -- switch/cray: Revert logic added to 14.11.6 that set "PMI_CRAY_NO_SMP_ENV=1"
        if CR_PACK_NODES is configured.
     -- Prevent users from setting job's partition to an invalid partition.

    * Changes in Slurm 15.08.0pre5
    ==============================
     -- Add jobcomp/elasticsearch plugin. Libcurl is required for build. Configure
        the server as follows: "JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200".
     -- Scancel logic largely rewritten to better support job arrays.
     -- Added a slurm.conf parameter PrologEpilogTimeout to control how long
        prolog/epilog can run.
     -- Added TRES (Trackable resources) to track Mem, GRES, license, etc
        utilization.
     -- Add re-entrant versions of glibc time functions (e.g. localtime) to Slurm
        in order to eliminate rare deadlock of slurmstepd fork and exec calls.
     -- Constrain kernel memory (if available) in cgroups.
     -- Add PrologFlags option of "Contain" to create a proctrack container at
        job resource allocation time.
     -- Disable the OOM Killer in slurmd and slurmstepd's memory cgroup when using
        MemSpecLimit.
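
For anyone wanting to try the configuration options named in the two change lists above, here is a rough slurm.conf sketch. The parameter names come from the announcement; every value is a placeholder chosen for illustration, not a recommendation. Two things are assumptions not stated in the announcement: that sched_min_interval is given in microseconds and PrologEpilogTimeout in seconds (please check the slurm.conf man page for your version), and that the elasticsearch plugin is selected with JobCompType.

```
# slurm.conf fragment: illustrative values only

# 14.11.7: sched_min_interval is new (default 0 = disabled);
# max_sched_time now defaults to 2 seconds instead of 4.
# Assumption: sched_min_interval is in microseconds.
SchedulerParameters=sched_min_interval=1000000,max_sched_time=2

# 15.08.0pre5: job completion records sent to Elasticsearch.
# Assumption: the plugin is enabled via JobCompType.
JobCompType=jobcomp/elasticsearch
JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200

# 15.08.0pre5: cap how long prolog/epilog may run.
# Assumption: the value is in seconds.
PrologEpilogTimeout=300

# 15.08.0pre5: create the proctrack container at allocation time.
PrologFlags=Contain
```

Since 15.08.0pre5 is tagged for development and test purposes only, the second half of this fragment belongs on a test cluster.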


