[slurm-dev] Slurm version 15.08.8 and 16.05.0-pre1 now available

jette Thu, 18 Feb 2016 15:43:33 -0800


Slurm version 15.08.8 is now available and includes about 30 bug fixes
developed over the past four weeks.

Slurm version 16.05.0-pre1 is also available and includes new development for
the next major release in May.

Slurm downloads are available from
http://www.schedmd.com/#repos

* Changes in Slurm 15.08.8
==========================

 -- Backfill scheduling properly synchronized with Cray Node HealthCheck.
    Prior logic could result in highest priority job getting improperly
    postponed.
 -- Make it so daemons also support TopologyParam=NoInAddrAny.

 -- If scancel is operating on large number of jobs and RPC responses from
    slurmctld daemon are slow then introduce a delay in sending the cancel job
    requests from scancel in order to reduce load on slurmctld.
 -- Remove redundant logic when updating a job's task count.

 -- MySQL - Fix querying jobs with reservations when the id's have rolled.

 -- Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
 -- Launch batch job requesting --reboot after the boot completes.

 -- Move debug messages like "not the right user" from association manager
    to debug3 when trying to find the correct association.
 -- Fix incorrect logic when querying assoc_mgr information.

 -- Move debug messages to debug3 notifying a gres_bit_alloc was NULL for
    gres types without a file.
 -- Sanity Check Patch to setup variables for RAPL if in a race for it.
 -- GRES - Fix minor typecast issues.

 -- burst_buffer/cray - Increase size of intermediate variable used to store
    buffer byte size read from DW instance from 32 to 64-bits to avoid overflow
    and reporting invalid buffer sizes.

 -- Allow an existing reservation with running jobs to be modified without
    Flags=IGNORE_JOBS.

 -- srun - don't attempt to execve() a directory with a name matching the
    requested command.

 -- Do not automatically relocate an advanced reservation for individual cores
    that spans multiple nodes when nodes in that reservation go down (e.g.
    a 1 core reservation on node "tux1" will be moved if node "tux1" goes
    down, but a reservation containing 2 cores on node "tux1" and 3 cores on
    "tux2" will not be moved if node "tux1" goes down). Advanced reservations
    for whole nodes will be moved by default for down nodes.

 -- Avoid possible double free of memory (and likely abort) for slurmctld in
    background mode.

 -- contribs/cray/csm/slurmconfgen_smw.py - avoid including repurposed compute
    nodes in configs.

 -- Support AuthInfo in slurmdbd.conf that is different from the value in
    slurm.conf.
 -- Fix build on FreeBSD 10.

 -- Fix hdf5 build on ppc64 by using correct fprintf formatting for types.

 -- Fix cosmetic printing of NO_VALs in scontrol show assoc_mgr.
 -- Fix perl api for newer perl versions.

 -- Fix for jobs requesting cpus-per-task (eg. -c3) that exceed the number of
    cpus on a core.
 -- Remove unneeded perl files from the .spec file.
 -- Flesh out filters for scontrol show assoc_mgr.

 -- Add function to remove assoc_mgr_info_request_t members without freeing
    structure.
 -- Fix build on some non-glibc systems by updating includes.

 -- Add new PowerParameters options of get_timeout and set_timeout. The default
    set_timeout was increased from 5 seconds to 30 seconds. Also re-read current
    power caps periodically or after any failed "set" operation.
 -- Fix slurmdbd segfault when listing users with blank user condition.
 -- Save the ClusterName to a file in SaveStateLocation, and use that to
    verify the state directory belongs to the given cluster at startup to avoid
    corruption from multiple clusters attempting to share a state directory.
 -- MYSQL - Fix issue when rerolling monthly data to work off correct time
    period. This would only hit you if you rerolled a 15.08 prior to this
    commit.

 -- If FastSchedule=0 is used make sure TRES are set up correctly in accounting.

 -- Fix sreport's truncation of columns with large TRES and not using
    a parsing option.
 -- Make sure count of boards are restored when slurmctld has option -R.

 -- When determining if a job can fit into a TRES time limit after resources
    have been selected set the time limit appropriately if the job didn't
    request one.
 -- Fix inadequate locks when updating a partition's TRES.
 -- Add new assoc_limit_continue flag to SchedulerParameters.

 -- Avoid race in acct_gather_energy_cray if energy requested before available.
 -- MYSQL - Avoid having multiple default accounts when a user is added to
    a new account and making it a default all at once.
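
The burst_buffer/cray fix above widens an intermediate variable from 32 to
64 bits. As a rough illustration of why that matters (this is not the Slurm
code, which is written in C; the function name and the 5 GiB value are
made up for the example), the sketch below shows how a byte count above
4 GiB wraps around when squeezed into 32 bits:

```python
def truncate_to_unsigned(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a fixed-width unsigned integer would."""
    return value & ((1 << bits) - 1)


# Hypothetical buffer size of 5 GiB, expressed in bytes.
buffer_bytes = 5 * 1024**3

# A 32-bit intermediate silently wraps around and reports a wrong size.
as_32_bit = truncate_to_unsigned(buffer_bytes, 32)

# A 64-bit intermediate holds the value unchanged.
as_64_bit = truncate_to_unsigned(buffer_bytes, 64)

print(as_32_bit)  # 1073741824 (1 GiB, which is wrong)
print(as_64_bit)  # 5368709120 (5 GiB, as expected)
```

Any size of 4 GiB or more is affected in the same way, which matches the
"invalid buffer sizes" symptom described in the entry.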

* Changes in Slurm 16.05.0pre1
===============================

 -- Add sbatch "--wait" option that waits for job completion before exiting.
    Exit code will match that of spawned job.

 -- Modify advanced reservation save/restore logic for core reservations to
    support configuration changes (changes in configured nodes or cores counts).
 -- Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to be
    either short or long hostname.

 -- Job output and error files can now contain "%" character by specifying
    a file name with two consecutive "%" characters. For example,
    "sbatch -o slurm.%%.%j" for job ID 123 will generate an output file named
    "slurm.%.123".
 -- Pass user name in Prolog RPC from controller to slurmd when using
    PrologFlags=Alloc. Allows SLURM_JOB_USER env variable to be set when using
    Native Slurm on a Cray.
 -- Add "NumTasks" to job information visible to Slurm commands.

 -- Add mail wrapper script "smail" that will include job statistics in email
    notification messages.

 -- Remove vestigial "SICP" job option (inter-cluster job option). Completely
    different logic will be forthcoming.

 -- Fix case where the primary and backup dbds would both be performing rollup.
 -- Add an ack reply from slurmd to slurmstepd when job setup is done and the
    job is ready to be executed.

 -- Removed support for authd. authd has not been developed or supported for
    several years.

 -- Introduce a new parameter requeue_setup_env_fail in SchedulerParameters.
    A job that fails to set up the environment will be requeued and the node
    drained.

 -- Add ValidateTimeout and OtherTimeout to "scontrol show burst" output.

 -- Increase default sbcast buffer size from 512KB to 8MB.
 -- Enable the hdf5 profiling of the batch step.
 -- Eliminate redundant environment and script files for job arrays.
 -- Stop searching sbatch scripts for #PBS directives after 100 lines of
    non-comments. Stop parsing #PBS or #SLURM directives after 1024 characters
    into a line. Required for decent performance with huge scripts.
 -- Add debug flag for timing Cray portions of the code.
 -- Remove all *.la files from RPMs.

 -- Add Multi-Category Security (MCS) infrastructure to permit nodes to be bound
    to specific users or groups.

 -- Install the pmi2 unix sockets in slurmd spool directory instead of /tmp.
 -- Implement the getaddrinfo and getnameinfo instead of gethostbyaddr and
    gethostbyname.
 -- Finished PMIx implementation.
 -- Implemented the --without=package option for configure.

 -- Fix sshare to show each individual cluster with -M,--clusters option.
 -- Added --deadline option to salloc, sbatch and srun. Jobs which can not be
    completed by the user specified deadline will be terminated with a state of
    "Deadline" or "DL".

 -- Implemented and documented PMIX protocol which is used to bootstrap an
    MPI job. PMIX is an alternative to PMI and PMI2.
 -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
    "/sys/fs/cgroup" to match current standard.
 -- Add #BSUB options to sbatch to read in from the batch script.
 -- HDF: Change group name of node from nodename to nodeid.

 -- The partition-specific SelectTypeParameters parameter can now be used to
    change the memory allocation tracking specification in the global
    SelectTypeParameters configuration parameter. Supported partition-specific
    values are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory. If the
    global SelectTypeParameters value includes memory allocation management and
    the partition-specific value does not, then memory allocation management for
    that partition will NOT be supported (i.e. memory can be over-allocated).
    Likewise the global SelectTypeParameters might not include memory management
    while the partition-specific value does.

 -- Burst buffer/cray - Add support for multiple buffer pools including support
    for different resource granularity by pool.

 -- Burst buffer advanced reservation units treated as bytes (per documentation)
    rather than GB.

 -- Add an "scontrol top <jobid>" command to re-order the priorities of a user's
    pending jobs. May be disabled with the "disable_user_top" option in the
    SchedulerParameters configuration parameter.
 -- Modify sview to display negative job nice values.
 -- Increase job's nice value field from 16 to 32 bits.
 -- Remove deprecated job_submit/cnode plugin.

 -- Enhance slurm.conf option EnforcePartLimit to include options like "ANY" and
    "ALL". "Any" is equivalent to "Yes" and "All" will check all partitions
    a job is submitted to and if any partition limit is violated the job will
    be rejected even if it could possibly run on another partition.
 -- Add "features_act" field (currently active features) to the node
    information. Output of scontrol, sinfo, and sview changed accordingly.
    The field previously displayed as "Features" is now "AvailableFeatures"
    while the new field is displayed as "ActiveFeatures".

 -- Remove Sun Constellation, IBM Federation Switches (replaced by NRT switch
    plugin) and long-defunct Quadrics Elan support.
 -- Add -M<clusters> option to sreport.
 -- Rework group caching to work better in environments with
    enumeration disabled. Removed CacheGroups config directive, group
    membership lists are now always cached, controlled by
    GroupUpdateTime parameter. GroupUpdateForce parameter default
    value changed to 1.
 -- Add reservation flag of "purge_comp" which will purge an advanced
    reservation once it has no more active (pending, suspended or running) jobs.
 -- Add new configuration parameter "KNLPlugins" and plugin infrastructure.

 -- Add optional job "features" to node reboot RPC.

 -- Add slurmd "-b" option to report node rebooted at daemon start time. Used
    for testing purposes.
 -- contribs/cray: Add framework for powering nodes up and down.
 -- For job constraint, convert comma separator to "&".
 -- Add Max*PerAccount options for QOS.
 -- Protect slurm_mutex_* calls with abort() on failure.
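
The 16.05.0pre1 entry about "%" characters in job output and error file
names can be pictured with a small sketch. This is only an illustration of
the expansion rule as described in that entry, not Slurm's implementation
(which is in C and supports more replacement symbols than shown here); the
function name is hypothetical, and only "%%" and "%j" are handled:

```python
def expand_output_name(pattern: str, job_id: int) -> str:
    """Expand "%%" to a literal "%" and "%j" to the job ID (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "%":
                # Two consecutive "%" characters yield one literal "%".
                out.append("%")
                i += 2
                continue
            if nxt == "j":
                # "%j" is replaced by the job ID.
                out.append(str(job_id))
                i += 2
                continue
        # Anything else is copied through unchanged in this sketch.
        out.append(ch)
        i += 1
    return "".join(out)


print(expand_output_name("slurm.%%.%j", 123))  # slurm.%.123
```

The printed name matches the example given in the changelog entry for job
ID 123.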
