I see that the Slurm burst buffer generic plugin has been removed in version 16. Does this mean that there will be no more development on it?
On 18 February 2016 at 23:45, <[email protected]> wrote:
>
> Slurm version 15.08.8 is now available and includes about 30 bug fixes
> developed over the past four weeks.
>
> Slurm version 16.05.0-pre1 is also available and includes new development
> for the next major release in May.
>
> Slurm downloads are available from
> http://www.schedmd.com/#repos
>
> * Changes in Slurm 15.08.8
> ==========================
> -- Backfill scheduling properly synchronized with Cray Node Health Check.
>    Prior logic could result in highest priority job getting improperly
>    postponed.
> -- Make it so daemons also support TopologyParam=NoInAddrAny.
> -- If scancel is operating on large number of jobs and RPC responses from
>    slurmctld daemon are slow then introduce a delay in sending the cancel
>    job requests from scancel in order to reduce load on slurmctld.
> -- Remove redundant logic when updating a job's task count.
> -- MySQL - Fix querying jobs with reservations when the id's have rolled.
> -- Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
> -- Launch batch job requesting --reboot after the boot completes.
> -- Move debug messages like "not the right user" from association manager
>    to debug3 when trying to find the correct association.
> -- Fix incorrect logic when querying assoc_mgr information.
> -- Move debug messages to debug3 notifying a gres_bit_alloc was NULL for
>    gres types without a file.
> -- Sanity Check Patch to setup variables for RAPL if in a race for it.
> -- GRES - Fix minor typecast issues.
> -- burst_buffer/cray - Increase size of intermediate variable used to
>    store buffer byte size read from DW instance from 32 to 64-bits to
>    avoid overflow and reporting invalid buffer sizes.
> -- Allow an existing reservation with running jobs to be modified without
>    Flags=IGNORE_JOBS.
> -- srun - don't attempt to execve() a directory with a name matching the
>    requested command
> -- Do not automatically relocate an advanced reservation for individual
>    cores that spans multiple nodes when nodes in that reservation go down
>    (e.g. a 1 core reservation on node "tux1" will be moved if node "tux1"
>    goes down, but a reservation containing 2 cores on node "tux1" and 3
>    cores on "tux2" will not be moved if node "tux1" goes down). Advanced
>    reservations for whole nodes will be moved by default for down nodes.
> -- Avoid possible double free of memory (and likely abort) for slurmctld
>    in background mode.
> -- contribs/cray/csm/slurmconfgen_smw.py - avoid including repurposed
>    compute nodes in configs.
> -- Support AuthInfo in slurmdbd.conf that is different from the value in
>    slurm.conf.
> -- Fix build on FreeBSD 10.
> -- Fix hdf5 build on ppc64 by using correct fprintf formatting for types.
> -- Fix cosmetic printing of NO_VALs in scontrol show assoc_mgr.
> -- Fix perl api for newer perl versions.
> -- Fix for jobs requesting cpus-per-task (eg. -c3) that exceed the number
>    of cpus on a core.
> -- Remove unneeded perl files from the .spec file.
> -- Flesh out filters for scontrol show assoc_mgr.
> -- Add function to remove assoc_mgr_info_request_t members without freeing
>    structure.
> -- Fix build on some non-glibc systems by updating includes.
> -- Add new PowerParameters options of get_timeout and set_timeout. The
>    default set_timeout was increased from 5 seconds to 30 seconds. Also
>    re-read current power caps periodically or after any failed "set"
>    operation.
> -- Fix slurmdbd segfault when listing users with blank user condition.
> -- Save the ClusterName to a file in SaveStateLocation, and use that to
>    verify the state directory belongs to the given cluster at startup to
>    avoid corruption from multiple clusters attempting to share a state
>    directory.
> -- MYSQL - Fix issue when rerolling monthly data to work off correct time
>    period. This would only hit you if you rerolled a 15.08 prior to this
>    commit.
> -- If FastSchedule=0 is used make sure TRES are set up correctly in
>    accounting.
> -- Fix sreport's truncation of columns with large TRES and not using
>    a parsing option.
> -- Make sure count of boards are restored when slurmctld has option -R.
> -- When determining if a job can fit into a TRES time limit after
>    resources have been selected set the time limit appropriately if the
>    job didn't request one.
> -- Fix inadequate locks when updating a partition's TRES.
> -- Add new assoc_limit_continue flag to SchedulerParameters.
> -- Avoid race in acct_gather_energy_cray if energy requested before
>    available.
> -- MYSQL - Avoid having multiple default accounts when a user is added to
>    a new account and making it a default all at once.
>
> * Changes in Slurm 16.05.0pre1
> ===============================
> -- Add sbatch "--wait" option that waits for job completion before
>    exiting. Exit code will match that of spawned job.
> -- Modify advanced reservation save/restore logic for core reservations to
>    support configuration changes (changes in configured nodes or cores
>    counts).
> -- Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to be
>    either short or long hostname.
> -- Job output and error files can now contain "%" character by specifying
>    a file name with two consecutive "%" characters. For example,
>    "sbatch -o slurm.%%.%j" for job ID 123 will generate an output file
>    named "slurm.%.123".
> -- Pass user name in Prolog RPC from controller to slurmd when using
>    PrologFlags=Alloc. Allows SLURM_JOB_USER env variable to be set when
>    using Native Slurm on a Cray.
> -- Add "NumTasks" to job information visible to Slurm commands.
> -- Add mail wrapper script "smail" that will include job statistics in
>    email notification messages.
> -- Remove vestigial "SICP" job option (inter-cluster job option).
>    Completely different logic will be forthcoming.
> -- Fix case where the primary and backup dbds would both be performing
>    rollup.
> -- Add an ack reply from slurmd to slurmstepd when job setup is done and
>    the job is ready to be executed.
> -- Removed support for authd. authd has not been developed or supported
>    for several years.
> -- Introduce a new parameter requeue_setup_env_fail in
>    SchedulerParameters. A job that fails to setup the environment will be
>    requeued and the node drained.
> -- Add ValidateTimeout and OtherTimeout to "scontrol show burst" output.
> -- Increase default sbcast buffer size from 512KB to 8MB.
> -- Enable the hdf5 profiling of the batch step.
> -- Eliminate redundant environment and script files for job arrays.
> -- Stop searching sbatch scripts for #PBS directives after 100 lines of
>    non-comments. Stop parsing #PBS or #SLURM directives after 1024
>    characters into a line. Required for decent performance with huge
>    scripts.
> -- Add debug flag for timing Cray portions of the code.
> -- Remove all *.la files from RPMs.
> -- Add Multi-Category Security (MCS) infrastructure to permit nodes to be
>    bound to specific users or groups.
> -- Install the pmi2 unix sockets in slurmd spool directory instead of
>    /tmp.
> -- Implement the getaddrinfo and getnameinfo instead of gethostbyaddr and
>    gethostbyname.
> -- Finished PMIx implementation.
> -- Implemented the --without=package option for configure.
> -- Fix sshare to show each individual cluster with -M,--clusters option.
> -- Added --deadline option to salloc, sbatch and srun. Jobs which can not
>    be completed by the user specified deadline will be terminated with a
>    state of "Deadline" or "DL".
> -- Implemented and documented PMIX protocol which is used to bootstrap an
>    MPI job. PMIX is an alternative to PMI and PMI2.
> -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
>    "/sys/fs/cgroup" to match current standard.
> -- Add #BSUB options to sbatch to read in from the batch script.
> -- HDF: Change group name of node from nodename to nodeid.
> -- The partition-specific SelectTypeParameters parameter can now be used
>    to change the memory allocation tracking specification in the global
>    SelectTypeParameters configuration parameter. Supported
>    partition-specific values are CR_Core, CR_Core_Memory, CR_Socket and
>    CR_Socket_Memory. If the global SelectTypeParameters value includes
>    memory allocation management and the partition-specific value does
>    not, then memory allocation management for that partition will NOT be
>    supported (i.e. memory can be over-allocated). Likewise the global
>    SelectTypeParameters might not include memory management while the
>    partition-specific value does.
> -- Burst buffer/cray - Add support for multiple buffer pools including
>    support for different resource granularity by pool.
> -- Burst buffer advanced reservation units treated as bytes (per
>    documentation) rather than GB.
> -- Add an "scontrol top <jobid>" command to re-order the priorities of a
>    user's pending jobs. May be disabled with the "disable_user_top" option
>    in the SchedulerParameters configuration parameter.
> -- Modify sview to display negative job nice values.
> -- Increase job's nice value field from 16 to 32 bits.
> -- Remove deprecated job_submit/cnode plugin.
> -- Enhance slurm.conf option EnforcePartLimit to include options like
>    "ANY" and "ALL". "Any" is equivalent to "Yes" and "All" will check all
>    partitions a job is submitted to and if any partition limit is violated
>    the job will be rejected even if it could possibly run on another
>    partition.
> -- Add "features_act" field (currently active features) to the node
>    information. Output of scontrol, sinfo, and sview changed accordingly.
>    The field previously displayed as "Features" is now "AvailableFeatures"
>    while the new field is displayed as "ActiveFeatures".
> -- Remove Sun Constellation, IBM Federation Switches (replaced by NRT
>    switch plugin) and long-defunct Quadrics Elan support.
> -- Add -M<clusters> option to sreport.
> -- Rework group caching to work better in environments with
>    enumeration disabled. Removed CacheGroups config directive, group
>    membership lists are now always cached, controlled by
>    GroupUpdateTime parameter. GroupUpdateForce parameter default
>    value changed to 1.
> -- Add reservation flag of "purge_comp" which will purge an advanced
>    reservation once it has no more active (pending, suspended or running)
>    jobs.
> -- Add new configuration parameter "KNLPlugins" and plugin infrastructure.
> -- Add optional job "features" to node reboot RPC.
> -- Add slurmd "-b" option to report node rebooted at daemon start time.
>    Used for testing purposes.
> -- contribs/cray: Add framework for powering nodes up and down.
> -- For job constraint, convert comma separator to "&".
> -- Add Max*PerAccount options for QOS.
> -- Protect slurm_mutex_* calls with abort() on failure.
>
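As a side note on the "%%" file name change in the 16.05.0pre1 list above, here is a rough Python sketch of how such an escape might behave. It is only an illustration and not Slurm's actual code: the function name expand_output_name is made up, and it handles just "%%" and "%j", leaving every other "%" sequence untouched.

```python
import re


def expand_output_name(pattern: str, job_id: int) -> str:
    """Expand a simplified subset of sbatch-style file name patterns.

    Hypothetical helper for illustration only:
      %%  -> a literal "%"
      %j  -> the job ID
    Any other "%x" sequence is returned unchanged.
    """
    def replace(match: re.Match) -> str:
        code = match.group(1)
        if code == "%":
            return "%"
        if code == "j":
            return str(job_id)
        return match.group(0)  # unknown code: leave as written

    # Scanning left to right means "%%" is consumed as one unit, so the
    # second "%" of an escaped pair can never start a new "%j" pattern.
    return re.sub(r"%(.)", replace, pattern)


if __name__ == "__main__":
    print(expand_output_name("slurm.%%.%j", 123))
```

With the example from the release notes, the pattern "slurm.%%.%j" and job ID 123 give "slurm.%.123".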
