A note for anyone considering the update: after the update, we found
that slurmctld hung from time to time (we had to restart it). The
latest version from the git branch resolved this problem. I guess it
was fixed by this commit:

https://github.com/SchedMD/slurm/commit/0c5e35089c9ea775d0ff66fbbbc774cba5009468

We run "scontrol show config" regularly to check whether the primary
and backup daemons are online.
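
A check like this can be wrapped in a timeout, so that a hung slurmctld
is reported instead of blocking the monitoring job. The following is only
a minimal sketch, not what we actually run: the function name
command_responds and the 10 second timeout are assumptions made for
illustration.

```python
import subprocess


def command_responds(cmd, timeout_s=10.0):
    """Return True if cmd exits with status 0 within timeout_s seconds.

    Hypothetical helper: a timeout, a missing binary, or a non-zero
    exit status are all treated as "not responding".
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # only the exit status matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,          # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False
    except OSError:
        # e.g. the command is not installed on this host
        return False
    return result.returncode == 0
```

Calling command_responds(["scontrol", "show", "config"]) from cron or a
monitoring system then gives a simple yes/no answer. It cannot tell a
hung daemon from a stopped one, so a failure still needs a manual look.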

2017-03-01 6:14 GMT+01:00 Nicholas C Santucci <[email protected]>:
> which also means starting in 17.02
>
> in "RPMs INSTALLED" section of
> https://slurm.schedmd.com/quickstart_admin.html should be revised as follows
>
> slurm-sjobexit
> slurm-sjstat
>
> should be changed to
>
> slurm-sjobexit (only prior to 17.02)
> slurm-sjstat (only prior to 17.02)
>
> On Mon, Feb 27, 2017 at 9:56 AM, <[email protected]> wrote:
>>
>> Thanks for the patch. Committed here:
>>
>> https://github.com/SchedMD/slurm/commit/95cf960afcdb77cae644b7d0709ede123896626d
>>
>> ----- Message from Daniel Letai <[email protected]> ---------
>>     Date: Mon, 27 Feb 2017 05:07:08 -0800
>>     From: Daniel Letai <[email protected]>
>> Reply-To: slurm-dev <[email protected]>
>> Subject: [slurm-dev] Re: Slurm version 17.02.0 is now available
>>       To: slurm-dev <[email protected]>
>>
>> $ git diff
>> diff --git a/slurm.spec b/slurm.spec
>> index 941b360..6bb3014 100644
>> --- a/slurm.spec
>> +++ b/slurm.spec
>> @@ -346,6 +346,7 @@ Includes the Slurm proctrack/lua and job_submit/lua plugin
>>  Summary: Perl tool to print Slurm job state information
>>  Group: Development/System
>>  Requires: slurm
>> +Obsoletes: slurm-sjobexit slurm-sjstat slurm-seff
>>  %description contribs
>>  seff is a mail program used directly by the Slurm daemons. On completion of a
>>  job, wait for it's accounting information to be available and include that
>>
>>
>> On 02/27/2017 01:35 PM, dani wrote:
>>
>> Seems like no obsoletes was set on slurm-contribs, so yum complains of
>> conflicts with slurm-sjobs and friends.
>>
>>
>> On 24/02/2017 01:41, Danny Auble wrote:
>>
>>
>> After 9 months of development we are pleased to announce the availability
>> of Slurm version 17.02.0.
>>
>> A brief description of what is contained in this release and other notes
>> about it is contained below.  For a fuller description please consult the
>> RELEASE_NOTES file available in the source.
>>
>> Thanks to all involved!
>>
>> Slurm downloads are available from https://schedmd.com/downloads.php.
>>
>> RELEASE NOTES FOR SLURM VERSION 17.02
>> 23 February 2017
>>
>> IMPORTANT NOTES:
>> THE MAXJOBID IS NOW 67,108,863. ANY PRE-EXISTING JOBS WILL CONTINUE TO RUN BUT
>> NEW JOB IDS WILL BE WITHIN THE NEW MAXJOBID RANGE. Adjust your configured
>> MaxJobID value as needed to eliminate any confusion.
>>
>> If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
>> The 17.02 slurmdbd will work with Slurm daemons of version 15.08 and above.
>> You will not need to update all clusters at the same time, but it is very
>> important to update slurmdbd first and have it running before updating
>> any other clusters making use of it.  No real harm will come from updating
>> your systems before the slurmdbd, but they will not talk to each other
>> until you do.  Also at least the first time running the slurmdbd you need to
>> make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
>> You can accomplish this by adding the line
>>
>> innodb_buffer_pool_size=64M
>>
>> under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
>> buffer pool size must be smaller than the size of the MySQL tmpdir. This is
>> needed when converting large tables over to the new database schema.
>>
>> Slurm can be upgraded from version 15.08 or 16.05 to version 17.02 without loss
>> of jobs or other state information. Upgrading directly from an earlier version
>> of Slurm will result in loss of state information.
>>
>> If using SPANK plugins that use the Slurm APIs, they should be recompiled when
>> upgrading Slurm to a new major release.
>>
>> NOTE: systemd services files are installed automatically, but not enabled.
>>       You will need to manually enable them on the appropriate systems:
>>       - Controller: systemctl enable slurmctld
>>       - Database: systemctl enable slurmdbd
>>       - Compute Nodes: systemctl enable slurmd
>>
>> NOTE: If you are not using Munge, but are using the "service" scripts to
>>       start Slurm daemons, then you will need to remove this check from the
>>       etc/slurm*service scripts.
>>
>> NOTE: If you are upgrading with any jobs from 14.03 or earlier
>>       (i.e. quick upgrade from 14.03 -> 15.08 -> 17.02) you will need
>>       to wait until after those jobs are gone before you upgrade to 17.02.
>>
>> HIGHLIGHTS
>> ==========
>>  -- Added infrastructure for managing workload across a federation of clusters.
>>     (partial functionality in version 17.02, fully operational in May 2017)
>>  -- In order to support federated jobs, the MaxJobID configuration parameter
>>     default value has been reduced from 2,147,418,112 to 67,043,328 and its
>>     maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
>>     have a job ID above the new range will continue to run and new jobs will get
>>     job IDs in the new range.
>>  -- Added "MailDomain" configuration parameter to qualify email addresses.
>>  -- Automatically clean up task/cgroup cpuset and devices cgroups after steps
>>     are completed.
>>  -- Added burst buffer support for job arrays. Added new SchedulerParameters
>>     configuration parameter of bb_array_stage_cnt=# to indicate how many pending
>>     tasks of a job array should be made available for burst buffer resource
>>     allocation.
>>  -- Added new sacctmgr commands: "shutdown" (shutdown the server), "list stats"
>>     (get server statistics), "clear stats" (clear server statistics).
>>  -- The database index for jobs is now 64 bits.  If you happen to be close to
>>     4 billion jobs in your database you will want to update your slurmctld at
>>     the same time as your slurmdbd to prevent roll over of this variable, as
>>     it is 32 bit in previous versions of Slurm.
>>  -- All memory values (in MB) are now 64 bit. Previously, nodes with more
>> than
>>     of memory would not schedule or enforce memory limits correctly.
>>  -- Removed AIX, BlueGene/L and BlueGene/P support.
>>  -- Removed sched/wiki and sched/wiki2 plugins and associated code.
>>  -- Added PrologFlags=Serial to disable concurrent execution of prolog/epilog
>>     scripts.
>>
>>
>> ----- End message from Daniel Letai <[email protected]> ----
