[slurm-dev] Re: Slurm version 15.08.8 and 16.05.0-pre1 now available

Wojciech Turek Fri, 19 Feb 2016 02:08:19 -0800

I see that the Slurm generic burst buffer plugin has been removed in
version 16. Does this mean that there will be no more development on it?


On 18 February 2016 at 23:45, <[email protected]> wrote:

>
> Slurm version 15.08.8 is now available and includes about 30 bug fixes
> developed over the past four weeks.
>
> Slurm version 16.05.0-pre1 is also available and includes new development
> for the next major release in May.
>
> Slurm downloads are available from
> http://www.schedmd.com/#repos
>
> * Changes in Slurm 15.08.8
> ==========================
>  -- Backfill scheduling properly synchronized with Cray Node Health Check.
>     Prior logic could result in highest priority job getting improperly
>     postponed.
>  -- Make it so daemons also support TopologyParam=NoInAddrAny.
>  -- If scancel is operating on a large number of jobs and RPC responses from
>     the slurmctld daemon are slow, then introduce a delay in sending the
>     cancel job requests from scancel in order to reduce load on slurmctld.
>  -- Remove redundant logic when updating a job's task count.
>  -- MySQL - Fix querying jobs with reservations when the IDs have rolled over.
>  -- Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
>  -- Launch batch job requesting --reboot after the boot completes.
>  -- Move debug messages like "not the right user" from association manager
>     to debug3 when trying to find the correct association.
>  -- Fix incorrect logic when querying assoc_mgr information.
>  -- Move debug messages to debug3 notifying a gres_bit_alloc was NULL for
>     gres types without a file.
>  -- Sanity check patch to set up variables for RAPL if in a race for it.
>  -- GRES - Fix minor typecast issues.
>  -- burst_buffer/cray - Increase size of intermediate variable used to store
>     buffer byte size read from DW instance from 32 to 64 bits to avoid
>     overflow and the reporting of invalid buffer sizes.
>  -- Allow an existing reservation with running jobs to be modified without
>     Flags=IGNORE_JOBS.
>  -- srun - don't attempt to execve() a directory with a name matching the
>     requested command
>  -- Do not automatically relocate an advanced reservation for individual
>     cores that spans multiple nodes when nodes in that reservation go down
>     (e.g. a 1 core reservation on node "tux1" will be moved if node "tux1"
>     goes down, but a reservation containing 2 cores on node "tux1" and 3
>     cores on "tux2" will not be moved if node "tux1" goes down). Advanced
>     reservations for whole nodes will be moved by default for down nodes.
>  -- Avoid possible double free of memory (and likely abort) for slurmctld in
>     background mode.
>  -- contribs/cray/csm/slurmconfgen_smw.py - avoid including repurposed
>     compute nodes in configs.
>  -- Support AuthInfo in slurmdbd.conf that is different from the value in
>     slurm.conf.
>  -- Fix build on FreeBSD 10.
>  -- Fix hdf5 build on ppc64 by using correct fprintf formatting for types.
>  -- Fix cosmetic printing of NO_VALs in scontrol show assoc_mgr.
>  -- Fix perl api for newer perl versions.
>  -- Fix for jobs requesting cpus-per-task (e.g. -c3) that exceed the number
>     of cpus on a core.
>  -- Remove unneeded perl files from the .spec file.
>  -- Flesh out filters for scontrol show assoc_mgr.
>  -- Add function to remove assoc_mgr_info_request_t members without freeing
>     structure.
>  -- Fix build on some non-glibc systems by updating includes.
>  -- Add new PowerParameters options of get_timeout and set_timeout. The
>     default set_timeout was increased from 5 seconds to 30 seconds. Also
>     re-read current power caps periodically or after any failed "set"
>     operation.
>  -- Fix slurmdbd segfault when listing users with blank user condition.
>  -- Save the ClusterName to a file in SaveStateLocation, and use that to
>     verify the state directory belongs to the given cluster at startup to
>     avoid corruption from multiple clusters attempting to share a state
>     directory.
>  -- MYSQL - Fix issue when rerolling monthly data to work off correct time
>     period.  This would only hit you if you rerolled a 15.08 prior to this
>     commit.
>  -- If FastSchedule=0 is used make sure TRES are set up correctly in
>     accounting.
>  -- Fix sreport's truncation of columns with large TRES and not using
>     a parsing option.
>  -- Make sure the count of boards is restored when slurmctld has option -R.
>  -- When determining whether a job can fit into a TRES time limit after
>     resources have been selected, set the time limit appropriately if the
>     job didn't request one.
>  -- Fix inadequate locks when updating a partition's TRES.
>  -- Add new assoc_limit_continue flag to SchedulerParameters.
>  -- Avoid race in acct_gather_energy_cray if energy requested before
>     available.
>  -- MYSQL - Avoid having multiple default accounts when a user is added to
>     a new account and making it a default all at once.
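The burst_buffer/cray overflow fix in the 15.08.8 list is easiest to see with a
little arithmetic. Below is a minimal Python sketch, assuming the intermediate
variable is an unsigned C integer; it is an illustration only, not Slurm code,
and the helper names as_uint32/as_uint64 are hypothetical. It uses ctypes to
mimic storing a byte count in a 32-bit versus a 64-bit variable:

```python
import ctypes

# Illustration only (not Slurm code): mimic assigning a byte count to a
# fixed-width unsigned C variable. ctypes performs no overflow checking,
# so any bits above the variable's width are silently dropped.


def as_uint32(size_bytes: int) -> int:
    """Value seen after storing size_bytes in a 32-bit unsigned variable."""
    return ctypes.c_uint32(size_bytes).value


def as_uint64(size_bytes: int) -> int:
    """Value seen after storing size_bytes in a 64-bit unsigned variable."""
    return ctypes.c_uint64(size_bytes).value


if __name__ == "__main__":
    buffer_size = 5_000_000_000  # hypothetical 5 GB buffer, above 2**32 - 1
    print(as_uint32(buffer_size))  # wrapped around: an invalid buffer size
    print(as_uint64(buffer_size))  # stored correctly
```

Any buffer of 2**32 bytes (4 GiB) or more wraps around in 32 bits, so a
64-bit variable is needed for realistic burst buffer sizes.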
>
> * Changes in Slurm 16.05.0pre1
> ===============================
>  -- Add sbatch "--wait" option that waits for job completion before exiting.
>     Exit code will match that of spawned job.
>  -- Modify advanced reservation save/restore logic for core reservations to
>     support configuration changes (changes in configured node or core
>     counts).
>  -- Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to be
>     either short or long hostname.
>  -- Job output and error files can now contain the "%" character by
>     specifying a file name with two consecutive "%" characters. For example,
>     "sbatch -o slurm.%%.%j" for job ID 123 will generate an output file
>     named "slurm.%.123".
>  -- Pass user name in Prolog RPC from controller to slurmd when using
>     PrologFlags=Alloc. Allows SLURM_JOB_USER env variable to be set when
>     using Native Slurm on a Cray.
>  -- Add "NumTasks" to job information visible to Slurm commands.
>  -- Add mail wrapper script "smail" that will include job statistics in
>     email notification messages.
>  -- Remove vestigial "SICP" job option (inter-cluster job option).
>     Completely different logic will be forthcoming.
>  -- Fix case where the primary and backup dbds would both be performing
>     rollup.
>  -- Add an ack reply from slurmd to slurmstepd when job setup is done and
>     the job is ready to be executed.
>  -- Removed support for authd. authd has not been developed or supported
>     for several years.
>  -- Introduce a new parameter requeue_setup_env_fail in SchedulerParameters.
>     A job that fails to set up the environment will be requeued and the node
>     drained.
>  -- Add ValidateTimeout and OtherTimeout to "scontrol show burst" output.
>  -- Increase default sbcast buffer size from 512KB to 8MB.
>  -- Enable the hdf5 profiling of the batch step.
>  -- Eliminate redundant environment and script files for job arrays.
>  -- Stop searching sbatch scripts for #PBS directives after 100 lines of
>     non-comments. Stop parsing #PBS or #SLURM directives after 1024
>     characters into a line. Required for decent performance with huge
>     scripts.
>  -- Add debug flag for timing Cray portions of the code.
>  -- Remove all *.la files from RPMs.
>  -- Add Multi-Category Security (MCS) infrastructure to permit nodes to be
>     bound to specific users or groups.
>  -- Install the pmi2 unix sockets in slurmd spool directory instead of /tmp.
>  -- Use getaddrinfo and getnameinfo instead of gethostbyaddr and
>     gethostbyname.
>  -- Finished PMIx implementation.
>  -- Implemented the --without=package option for configure.
>  -- Fix sshare to show each individual cluster with -M,--clusters option.
>  -- Added --deadline option to salloc, sbatch and srun. Jobs which cannot
>     be completed by the user-specified deadline will be terminated with a
>     state of "Deadline" or "DL".
>  -- Implemented and documented PMIX protocol which is used to bootstrap an
>     MPI job. PMIX is an alternative to PMI and PMI2.
>  -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
>     "/sys/fs/cgroup" to match current standard.
>  -- Allow sbatch to read #BSUB options from the batch script.
>  -- HDF: Change group name of node from nodename to nodeid.
>  -- The partition-specific SelectTypeParameters parameter can now be used to
>     change the memory allocation tracking specification in the global
>     SelectTypeParameters configuration parameter. Supported
>     partition-specific values are CR_Core, CR_Core_Memory, CR_Socket and
>     CR_Socket_Memory. If the global SelectTypeParameters value includes
>     memory allocation management and the partition-specific value does not,
>     then memory allocation management for that partition will NOT be
>     supported (i.e. memory can be over-allocated). Likewise the global
>     SelectTypeParameters might not include memory management while the
>     partition-specific value does.
>  -- Burst buffer/cray - Add support for multiple buffer pools including
>     support for different resource granularity by pool.
>  -- Burst buffer advanced reservation units treated as bytes (per
>     documentation) rather than GB.
>  -- Add an "scontrol top <jobid>" command to re-order the priorities of a
>     user's pending jobs. May be disabled with the "disable_user_top" option
>     in the SchedulerParameters configuration parameter.
>  -- Modify sview to display negative job nice values.
>  -- Increase job's nice value field from 16 to 32 bits.
>  -- Remove deprecated job_submit/cnode plugin.
>  -- Enhance slurm.conf option EnforcePartLimit to include options like "ANY"
>     and "ALL". "Any" is equivalent to "Yes", and "All" will check all
>     partitions a job is submitted to; if any partition limit is violated,
>     the job will be rejected even if it could possibly run on another
>     partition.
>  -- Add "features_act" field (currently active features) to the node
>     information. Output of scontrol, sinfo, and sview changed accordingly.
>     The field previously displayed as "Features" is now "AvailableFeatures"
>     while the new field is displayed as "ActiveFeatures".
>  -- Remove Sun Constellation, IBM Federation Switches (replaced by NRT
>     switch plugin) and long-defunct Quadrics Elan support.
>  -- Add -M<clusters> option to sreport.
>  -- Rework group caching to work better in environments with
>     enumeration disabled. Removed CacheGroups config directive, group
>     membership lists are now always cached, controlled by
>     GroupUpdateTime parameter. GroupUpdateForce parameter default
>     value changed to 1.
>  -- Add reservation flag of "purge_comp" which will purge an advanced
>     reservation once it has no more active (pending, suspended or running)
>     jobs.
>  -- Add new configuration parameter "KNLPlugins" and plugin infrastructure.
>  -- Add optional job "features" to node reboot RPC.
>  -- Add slurmd "-b" option to report node rebooted at daemon start time.
>     Used for testing purposes.
>  -- contribs/cray: Add framework for powering nodes up and down.
>  -- For job constraint, convert comma separator to "&".
>  -- Add Max*PerAccount options for QOS.
>  -- Protect slurm_mutex_* calls with abort() on failure.
>
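The new "%%" escape for job output and error file names can be sketched as a
single left-to-right scan over the pattern. This is a minimal, hypothetical
Python sketch, not Slurm's implementation: expand_output_pattern is an
illustrative name, and only "%%" and "%j" are handled. Slurm supports further
patterns, which this sketch simply copies through unchanged:

```python
def expand_output_pattern(pattern: str, job_id: int) -> str:
    """Hypothetical sketch: expand "%%" and "%j" in an output file name.

    "%%" becomes a literal "%" and "%j" becomes the job ID; any other
    "%" sequence is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "%":
                out.append("%")  # escaped percent sign
                i += 2
                continue
            if nxt == "j":
                out.append(str(job_id))  # job ID substitution
                i += 2
                continue
        out.append(ch)  # ordinary character (or unhandled "%")
        i += 1
    return "".join(out)


if __name__ == "__main__":
    # Mirrors the changelog example: job ID 123 with -o slurm.%%.%j
    print(expand_output_pattern("slurm.%%.%j", 123))
```

Scanning left to right matters: a naive replacement of "%j" before "%%" would
turn "%%j" into "%" followed by the job ID instead of the literal "%j".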
