[slurm-dev] Re: Slurm versions 15.08.0 and 14.11.9 have been released!

E V Tue, 01 Sep 2015 13:07:37 -0700

C++-style comments in slurm.h break the slurm-drmaa configure step. With the
trivial patch below, it builds. Now I need to see if it works...


--- slurm-15.08.0/slurm/slurm.h     2015-09-01 14:31:23.589431892 -0400
+++ include/slurm/slurm.h     2015-09-01 15:55:38.125485196 -0400
@@ -2971,7 +2971,7 @@
        unsigned char ip_dst[16];
        uint32_t port_src;
        uint32_t port_dst;
-       int32_t af;     // NOTE: un/packed as uint32_t
+       int32_t af;     /* NOTE: un/packed as uint32_t */
 } network_callerid_msg_t;

 /*****************************************************************************\
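
A minimal sketch of why the one-line change matters, assuming the slurm-drmaa
configure check compiles slurm.h in a strict pre-C99 mode (the exact flags are
not confirmed here). A compiler in strict C89 mode typically rejects "//"
comments, while block comments are accepted everywhere. The type name
callerid_sketch_t and the sketch_* helpers are hypothetical, made up for
illustration; they are not Slurm's API. Only the fields visible in the diff
are reproduced.

```c
/* Sketch only: a header fragment that uses block comments exclusively,
 * so it should also build with something like
 *     cc -std=c89 -pedantic-errors -c sketch.c
 * All names below are hypothetical, not taken from Slurm. */
#include <assert.h>
#include <stdint.h>

typedef struct {
	unsigned char ip_dst[16];
	uint32_t port_src;
	uint32_t port_dst;
	int32_t af;	/* NOTE: un/packed as uint32_t */
} callerid_sketch_t;

/* Hypothetical helper: put the signed address family on the wire as
 * an unsigned 32-bit value, as the NOTE in the header describes. */
static uint32_t sketch_pack_af(int32_t af)
{
	return (uint32_t) af;
}

/* Hypothetical helper: recover the signed value from its wire form.
 * Safe for the small non-negative values address families use. */
static int32_t sketch_unpack_af(uint32_t wire)
{
	return (int32_t) wire;
}

/* Returns 1 if a value survives the pack/unpack round trip. */
static int sketch_roundtrip_ok(int32_t af)
{
	return sketch_unpack_af(sketch_pack_af(af)) == af;
}

/* Small self-check that can be called from a test program. */
static void sketch_selfcheck(void)
{
	assert(sketch_roundtrip_ok(2));
}
```

Changing the block comment on the af field back to a "//" comment and
rebuilding in the same strict mode is a quick way to check whether a given
toolchain rejects it.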

On Mon, Aug 31, 2015 at 8:26 PM, Danny Auble <[email protected]> wrote:
>
> Greetings everyone.
>
> We are pleased to announce the release of 15.08.0!  It contains many new
> features and performance enhancements.  Please read the RELEASE_NOTES file
> to get an idea of the new items that have been added.  The on-line Slurm
> documentation has been updated to reflect this release.  Thanks to everyone
> that helped in this release.
>
> Some notable changes are listed here.
>
>  -- Added TRES (Trackable resources) to track utilization of memory, GRES,
>     burst buffer, license, and any other configurable resources in the
>     accounting database.
>  -- Add configurable billing weight that takes into consideration any TRES
>     when calculating a job's resource utilization.
>  -- Add configurable prioritization factor that takes into consideration any
>     TRES when calculating a job's resource utilization.
>  -- Add burst buffer support infrastructure. Currently available plugins
>     include burst_buffer/generic (uses administrator supplied programs to
>     manage file staging) and burst_buffer/cray (uses Cray APIs to manage
>     buffers).
>  -- Add power capping support for Cray systems with automatic rebalancing of
>     power allocation between nodes.
>  -- Modify slurmctld outgoing RPC logic to support more parallel tasks (up
>     to 85 RPCs and 256 pthreads; the old logic supported up to 21 RPCs and
>     256 threads).
>  -- Add support for job dependencies joined with OR operator (e.g.
>     "--depend=afterok:123?afternotok:124").
>  -- Add advance reservation flag of "replace" that causes allocated
>     resources to be replaced with idle resources. This maintains a pool of
>     available resources of constant size (to the extent possible).
>  -- Permit PreemptType=qos and PreemptMode=suspend,gang to be used together.
>     A high-priority QOS job will now oversubscribe resources and gang
>     schedule, but only if there are insufficient resources for the job to be
>     started without preemption. NOTE: With PreemptType=qos, the partition's
>     Shared=FORCE:# configuration option will permit one more job per
>     resource to be run than specified, but only if started by preemption.
>  -- A partition can now have an associated QOS.  This will allow a partition
>     to have all the limits a QOS has.  If a limit is set in both QOSs,
>     the partition QOS will override the job's QOS unless the job's QOS has
>     the 'OverPartQOS' flag set.
>  -- Expanded --cpu-freq parameters to include min-max:governor
>     specifications. --cpu-freq now supported on salloc and sbatch.
>  -- Add support for optimized job allocations with respect to SGI Hypercube
>     topology.
>     NOTE: Only supported with select/linear plugin.
>     NOTE: The program contribs/sgi/netloc_to_topology can be used to build
>     Slurm's topology.conf file.
>  -- Add the ability for a compute node to be allocated to multiple jobs, but
>     restricted to a single user. Added "--exclusive=user" option to salloc,
>     the scontrol and sview commands. Added new partition configuration
>     parameter "ExclusiveUser=yes|no".
>  -- Verify that all plugin version numbers are identical to the component
>     attempting to load them. Without this verification, the plugin can
>     reference Slurm functions in the caller which differ (e.g. the underlying
>     function's arguments could have changed between Slurm versions).
>     NOTE: All plugins (except SPANK) must be built against the identical
>     version of Slurm in order to be used by any Slurm command or daemon.
>     This should eliminate some very difficult to diagnose problems due to
>     use of old plugins.
>  -- Optimize resource allocation for systems with dragonfly networks.
>  -- Added plugin to record job completion information using Elasticsearch.
>     Libcurl is required for build. Configure slurm.conf as follows
>     JobCompType=jobcomp/elasticsearch
>     JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200
>  -- DATABASE SCHEMA HAS CHANGED.  WHEN UPDATING, THE MIGRATION PROCESS MAY
>     TAKE SOME AMOUNT OF TIME DEPENDING ON HOW LARGE YOUR DATABASE IS. WHILE
>     UPDATING NO RECORDS WILL BE LOST, BUT THE SLURMDBD MAY NOT BE RESPONSIVE
>     DURING THE UPDATE. IT WILL ALSO NOT BE POSSIBLE TO AUTOMATICALLY REVERT
>     THE DATABASE TO THE FORMAT FOR AN EARLIER VERSION OF SLURM. PLAN
>     ACCORDINGLY.
>  -- The performance of Profiling with HDF5 is improved. In addition,
>     internal structures are changed to make it easier to add new profile
>     types, particularly energy sensors. This has introduced an operational
>     issue. See OTHER CHANGES.
>  -- MPI/MVAPICH plugin now requires Munge for authentication.
>  -- In order to support inter-cluster job dependencies, the MaxJobID
>     configuration parameter default value has been reduced from
>     4,294,901,760 to 2,147,418,112 and its maximum value is now
>     2,147,463,647.
>     ANY JOBS WITH A JOB ID ABOVE 2,147,463,647 WILL BE PURGED WHEN SLURM IS
>     UPGRADED FROM AN OLDER VERSION!
>
>
> We have also released one of the last tags of 14.11 in the form of 14.11.9.
>
> Changes are listed here
>
>  -- Correct "sdiag" backfill cycle time calculation if it yields locks. A
>     microsecond value was being treated as a second value resulting in an
>     overflow in the calculation.
>  -- Fix segfault when updating timelimit on jobarray task.
>  -- Fix to job array update logic that can result in a task ID of
>     4294967294.
>  -- Fix of job array update, previous logic could fail to update some tasks
>     of a job array for some fields.
>  -- CRAY - Fix seg fault if a blade is replaced and slurmctld is restarted.
>  -- Fix plane distribution to allocate in blocks rather than cyclically.
>  -- squeue - Remove newline from job array ID value printed.
>  -- squeue - Enable filtering for job state SPECIAL_EXIT.
>  -- Prevent job array task ID being inappropriately set to NO_VAL.
>  -- MYSQL - Make it so you don't have to restart the slurmctld
>     to gain the correct limit when a parent account is root and you
>     remove a subaccount's limit which exists on the parent account.
>  -- MYSQL - Close chance of setting the wrong limit on an association
>     when removing a limit from an association on multiple clusters
>     at the same time.
>  -- MYSQL - Fix minor memory leak when modifying an association but no
>     change was made.
>  -- srun command line of either --mem or --mem-per-cpu will override both
>     the SLURM_MEM_PER_CPU and SLURM_MEM_PER_NODE environment variables.
>  -- Prevent slurmctld abort on update of advanced reservation that contains
>     no nodes.
>  -- ALPS - Revert commit 2c95e2d22 which also removes commit 2e2de6a4
>     allowing cray with the SubAllocate option to work as it did with 2.5.
>  -- Properly parse CPU frequency data on POWER systems.
>  -- Correct sacct.a man pages describing -i option.
>  -- Capture salloc/srun information in sdiag statistics.
>  -- Fix bug in node selection with topology optimization.
>  -- Don't set distribution when srun requests 0 memory.
>  -- Read in correct number of nodes from SLURM_HOSTFILE when specifying
>     nodes and --distribution=arbitrary.
>  -- Fix segfault in Bluegene setups where RebootQOSList is defined in
>     bluegene.conf and accounting is not setup.
>  -- MYSQL - Update mod_time when updating a start job record or adding one.
>  -- MYSQL - Fix issue where if an association id ever changes while at least
>     a portion of a job array is pending after its initial start in the
>     database, it could create another row for the remaining array instead
>     of using the already existing row.
>  -- Fix scheduling anomaly with job arrays submitted to multiple partitions,
>     jobs could be started out of priority order.
>  -- If a host has suspended jobs do not reboot it. Reboot only hosts
>     with no jobs in any state.
>  -- ALPS - Fix issue when using --exclusive flag on srun to do the correct
>     thing (-F exclusive) instead of -F share.
>  -- Fix various memory leaks in the Perl API.
>  -- Fix a bug in the controller which displayed jobs in CF state as RUNNING.
>  -- Preserve advanced _core_ reservation when nodes added/removed/resized on
>     slurmctld restart. Rebuild core_bitmap as needed.
>  -- Fix for non-standard Munge port location for srun/pmi use.
>  -- Fix gang scheduling/preemption issue that could cancel job at startup.
>  -- Fix a bug in squeue which prevented squeue -tPD from printing array jobs.
>  -- Sort job arrays in job queue according to array_task_id when priorities
>     are equal.
>  -- Fix segfault in sreport when there was no response from the dbd.
>  -- ALPS - Fix compile to not link against -ljob and -lexpat with every lib
>     or binary.
>  -- Fix testing for CR_Memory when CR_Memory and CR_ONE_TASK_PER_CORE are
>     used with select/linear.
>  -- MySQL - Fix minor memory leak if a connection ever goes away whilst
>     using it.
>  -- ALPS - Make it so srun --hint=nomultithread works correctly.
>  -- Prevent job array task ID from being reported as NO_VAL if last task in
>     the array gets requeued.
>  -- Fix some potential deadlock issues when state files don't exist in the
>     association manager.
>  -- Correct RebootProgram logic when executed outside of a maintenance
>     reservation.
>  -- Requeue job if possible when slurmstepd aborts.
>
> Both versions can be downloaded from the normal spot
> http://schedmd.com/#repos.
>
> --
> Danny Auble
> President, SchedMD LLC
> Commercial Slurm Development and Support
> ===============================================================
> Slurm User Group Meeting, 15-16 September 2015, Washington D.C.
> http://slurm.schedmd.com/slurm_ug_agenda.html
