On 02/19/2016 01:44 AM, [email protected] wrote:

Slurm version 15.08.8 is now available and includes about 30 bug fixes
developed over the past four weeks.

Slurm version 16.05.0-pre1 is also available and includes new development
for the next major release in May.

Slurm downloads are available from
http://www.schedmd.com/#repos

* Changes in Slurm 15.08.8
==========================
 -- Backfill scheduling properly synchronized with Cray Node Health Check.
    Prior logic could result in highest priority job getting improperly
    postponed.
 -- Make it so daemons also support TopologyParam=NoInAddrAny.
 -- If scancel is operating on a large number of jobs and RPC responses from
    the slurmctld daemon are slow, then introduce a delay in sending the
    cancel job requests from scancel in order to reduce load on slurmctld.
 -- Remove redundant logic when updating a job's task count.
 -- MySQL - Fix querying jobs with reservations when the IDs have rolled.
 -- Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
 -- Launch batch job requesting --reboot after the boot completes.
 -- Move debug messages like "not the right user" from association manager
    to debug3 when trying to find the correct association.
 -- Fix incorrect logic when querying assoc_mgr information.
 -- Move debug messages to debug3 notifying a gres_bit_alloc was NULL for
    gres types without a file.
 -- Sanity check patch to set up variables for RAPL if in a race for it.
 -- GRES - Fix minor typecast issues.
 -- burst_buffer/cray - Increase size of intermediate variable used to store
    buffer byte size read from DW instance from 32 to 64 bits to avoid
    overflow and reporting invalid buffer sizes.
 -- Allow an existing reservation with running jobs to be modified without
    Flags=IGNORE_JOBS.
 -- srun - Don't attempt to execve() a directory with a name matching the
    requested command.
 -- Do not automatically relocate an advanced reservation for individual
    cores that spans multiple nodes when nodes in that reservation go down
    (e.g. a 1 core reservation on node "tux1" will be moved if node "tux1"
    goes down, but a reservation containing 2 cores on node "tux1" and 3
    cores on "tux2" will not be moved if node "tux1" goes down). Advanced
    reservations for whole nodes will be moved by default for down nodes.
 -- Avoid possible double free of memory (and likely abort) for slurmctld in
    background mode.
 -- contribs/cray/csm/slurmconfgen_smw.py - avoid including repurposed
    compute nodes in configs.
 -- Support AuthInfo in slurmdbd.conf that is different from the value in
    slurm.conf.
 -- Fix build on FreeBSD 10.
 -- Fix hdf5 build on ppc64 by using correct fprintf formatting for types.
 -- Fix cosmetic printing of NO_VALs in scontrol show assoc_mgr.
 -- Fix perl api for newer perl versions.
 -- Fix for jobs requesting cpus-per-task (e.g. -c3) that exceed the number
    of cpus on a core.
 -- Remove unneeded perl files from the .spec file.
 -- Flesh out filters for scontrol show assoc_mgr.
 -- Add function to remove assoc_mgr_info_request_t members without freeing
    the structure.
 -- Fix build on some non-glibc systems by updating includes.
 -- Add new PowerParameters options of get_timeout and set_timeout. The
    default set_timeout was increased from 5 seconds to 30 seconds. Also
    re-read current power caps periodically or after any failed "set"
    operation.
 -- Fix slurmdbd segfault when listing users with blank user condition.
 -- Save the ClusterName to a file in SaveStateLocation, and use that to
    verify the state directory belongs to the given cluster at startup to
    avoid corruption from multiple clusters attempting to share a state
    directory.
 -- MYSQL - Fix issue when rerolling monthly data to work off correct time
    period. This would only hit you if you rerolled a 15.08 prior to this
    commit.
 -- If FastSchedule=0 is used make sure TRES are set up correctly in
    accounting.

Does that mean TRES will behave like memory, i.e., update from slurmd?

 -- Fix sreport's truncation of columns with large TRES and not using a
    parsing option.
 -- Make sure the count of boards is restored when slurmctld has option -R.
 -- When determining if a job can fit into a TRES time limit after resources
    have been selected, set the time limit appropriately if the job didn't
    request one.
 -- Fix inadequate locks when updating a partition's TRES.
 -- Add new assoc_limit_continue flag to SchedulerParameters.

What does this new flag do? I don't see it in the online slurm.conf
documentation yet.

 -- Avoid race in acct_gather_energy_cray if energy is requested before it is
    available.
 -- MYSQL - Avoid having multiple default accounts when a user is added to a
    new account and making it a default all at once.
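
As a rough illustration of the burst_buffer/cray overflow fix above, the Python sketch below shows what happens when a byte count larger than 4 GiB is kept in a 32-bit variable. It assumes the variable is unsigned (the announcement does not say), uses ctypes only to mimic C integer truncation, and the 5 GiB size is an arbitrary example value, not data from a DataWarp instance.

```python
import ctypes

GIB = 2**30


def store_in_uint32(nbytes: int) -> int:
    """Mimic storing a byte count in an unsigned 32-bit C variable.

    ctypes does no overflow checking, so the value wraps modulo 2**32.
    """
    return ctypes.c_uint32(nbytes).value


def store_in_uint64(nbytes: int) -> int:
    """Mimic storing the same byte count in an unsigned 64-bit C variable."""
    return ctypes.c_uint64(nbytes).value


# Arbitrary example buffer size (assumption), larger than 2**32 bytes.
size = 5 * GIB
print(size, store_in_uint32(size), store_in_uint64(size))
```

A 5 GiB buffer stored in 32 bits keeps only its low 32 bits and reads back as 1 GiB (5368709120 mod 2**32 = 1073741824), which is the kind of invalid buffer size the entry mentions; the 64-bit variable preserves the value.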

* Changes in Slurm 16.05.0pre1
==============================
 -- Add sbatch "--wait" option that waits for job completion before exiting.
    Exit code will match that of spawned job.
 -- Modify advanced reservation save/restore logic for core reservations to
    support configuration changes (changes in configured nodes or core
    counts).
 -- Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to be
    either a short or long hostname.
 -- Job output and error files can now contain a "%" character by specifying
    a file name with two consecutive "%" characters. For example,
    sbatch -o "slurm.%%.%j" for job ID 123 will generate an output file
    named "slurm.%.123".
 -- Pass user name in Prolog RPC from controller to slurmd when using
    PrologFlags=Alloc. Allows SLURM_JOB_USER env variable to be set when
    using Native Slurm on a Cray.
 -- Add "NumTasks" to job information visible to Slurm commands.
 -- Add mail wrapper script "smail" that will include job statistics in email
    notification messages.
 -- Remove vestigial "SICP" job option (inter-cluster job option). Completely
    different logic will be forthcoming.
 -- Fix case where the primary and backup dbds would both be performing
    rollup.
 -- Add an ack reply from slurmd to slurmstepd when job setup is done and the
    job is ready to be executed.
 -- Removed support for authd. authd has not been developed or supported for
    several years.
 -- Introduce a new parameter requeue_setup_env_fail in SchedulerParameters.
    A job that fails to set up the environment will be requeued and the node
    drained.
 -- Add ValidateTimeout and OtherTimeout to "scontrol show burst" output.
 -- Increase default sbcast buffer size from 512KB to 8MB.
 -- Enable the hdf5 profiling of the batch step.
 -- Eliminate redundant environment and script files for job arrays.
 -- Stop searching sbatch scripts for #PBS directives after 100 lines of
    non-comments. Stop parsing #PBS or #SLURM directives after 1024
    characters into a line. Required for decent performance with huge
    scripts.
 -- Add debug flag for timing Cray portions of the code.
 -- Remove all *.la files from RPMs.
 -- Add Multi-Category Security (MCS) infrastructure to permit nodes to be
    bound to specific users or groups.
 -- Install the pmi2 unix sockets in slurmd spool directory instead of /tmp.
 -- Use getaddrinfo and getnameinfo instead of gethostbyaddr and
    gethostbyname.
 -- Finished PMIx implementation.
 -- Implemented the --without=package option for configure.
 -- Fix sshare to show each individual cluster with -M,--clusters option.
 -- Added --deadline option to salloc, sbatch and srun. Jobs which cannot be
    completed by the user-specified deadline will be terminated with a state
    of "Deadline" or "DL".
 -- Implemented and documented PMIX protocol which is used to bootstrap an
    MPI job. PMIX is an alternative to PMI and PMI2.
 -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
    "/sys/fs/cgroup" to match current standard.
 -- Add #BSUB options to sbatch to read in from the batch script.
 -- HDF: Change group name of node from nodename to nodeid.
 -- The partition-specific SelectTypeParameters parameter can now be used to
    change the memory allocation tracking specification in the global
    SelectTypeParameters configuration parameter. Supported
    partition-specific values are CR_Core, CR_Core_Memory, CR_Socket and
    CR_Socket_Memory. If the global SelectTypeParameters value includes
    memory allocation management and the partition-specific value does not,
    then memory allocation management for that partition will NOT be
    supported (i.e. memory can be over-allocated). Likewise the global
    SelectTypeParameters might not include memory management while the
    partition-specific value does.
 -- Burst buffer/cray - Add support for multiple buffer pools including
    support for different resource granularity by pool.
 -- Burst buffer advanced reservation units treated as bytes (per
    documentation) rather than GB.
 -- Add an "scontrol top <jobid>" command to re-order the priorities of a
    user's pending jobs. May be disabled with the "disable_user_top" option
    in the SchedulerParameters configuration parameter.
 -- Modify sview to display negative job nice values.
 -- Increase job's nice value field from 16 to 32 bits.
 -- Remove deprecated job_submit/cnode plugin.
 -- Enhance slurm.conf option EnforcePartLimit to include options like "ANY"
    and "ALL". "Any" is equivalent to "Yes" and "All" will check all
    partitions a job is submitted to and if any partition limit is violated
    the job will be rejected even if it could possibly run on another
    partition.
 -- Add "features_act" field (currently active features) to the node
    information. Output of scontrol, sinfo, and sview changed accordingly.
    The field previously displayed as "Features" is now "AvailableFeatures"
    while the new field is displayed as "ActiveFeatures".
 -- Remove Sun Constellation, IBM Federation Switches (replaced by NRT switch
    plugin) and long-defunct Quadrics Elan support.
 -- Add -M<clusters> option to sreport.
 -- Rework group caching to work better in environments with enumeration
    disabled. Removed CacheGroups config directive; group membership lists
    are now always cached, controlled by the GroupUpdateTime parameter.
    GroupUpdateForce parameter default value changed to 1.
 -- Add reservation flag of "purge_comp" which will purge an advanced
    reservation once it has no more active (pending, suspended or running)
    jobs.
 -- Add new configuration parameter "KNLPlugins" and plugin infrastructure.
 -- Add optional job "features" to node reboot RPC.
 -- Add slurmd "-b" option to report node rebooted at daemon start time. Used
    for testing purposes.
 -- contribs/cray: Add framework for powering nodes up and down.
 -- For job constraint, convert comma separator to "&".
 -- Add Max*PerAccount options for QOS.
 -- Protect slurm_mutex_* calls with abort() on failure.
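
The EnforcePartLimit entry above distinguishes "ANY" from "ALL" for jobs submitted to several partitions. A simplified Python model of that submit-time decision follows. It is not Slurm's implementation: the Partition fields (max_nodes, max_time_min) and the helper functions are hypothetical, and it assumes "ANY" (and "Yes", which the entry calls equivalent) accepts a job when at least one requested partition's limits are satisfied, while "ALL" rejects it if any requested partition's limits are violated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    # Hypothetical subset of partition limits, for illustration only.
    name: str
    max_nodes: int
    max_time_min: int


def job_fits(nodes: int, time_min: int, part: Partition) -> bool:
    """True if the job stays within this partition's (hypothetical) limits."""
    return nodes <= part.max_nodes and time_min <= part.max_time_min


def accept_at_submit(nodes: int, time_min: int,
                     parts: List[Partition], mode: str) -> bool:
    """Simplified submit-time check for a job requesting several partitions.

    "ALL": every requested partition's limits must be satisfied.
    "ANY"/"YES": assumed to accept if at least one partition's limits are satisfied.
    """
    fits = [job_fits(nodes, time_min, p) for p in parts]
    mode = mode.upper()
    if mode == "ALL":
        return all(fits)
    if mode in ("ANY", "YES"):
        return any(fits)
    raise ValueError(f"unsupported mode in this sketch: {mode}")


parts = [Partition("short", max_nodes=4, max_time_min=60),
         Partition("long", max_nodes=4, max_time_min=1440)]

# A 2-node, 120-minute job exceeds the "short" time limit but fits "long".
print(accept_at_submit(2, 120, parts, "ANY"))  # True
print(accept_at_submit(2, 120, parts, "ALL"))  # False
```

Under this model, the 2-node, 120-minute job passes with "ANY" because it fits "long", but is rejected with "ALL" because it exceeds the "short" time limit.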
