[slurm-dev] Slurm 17.02.7 and PMIx

2017-10-04 Thread Christopher Samuel

Hi folks,

Just wondering if anyone here has had any success getting Slurm to
compile with PMIx support?

I'm trying 17.02.7 and I find that with PMIx I get either:

PMIX v1.2.2: Slurm complains and tells me it wants v2.

PMIX v2.0.1: Slurm can't find it because the header files are not
where it is looking for them, and when I do a symlink hack to make
PMIX detection work it then fails to compile, with:

/bin/sh ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. 
-I../../../.. -I../../../../slurm  -I../../../.. -I../../../../src/common 
-I/usr/include -I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2   -g -O0 
-pthread -Wall -g -O0 -fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo 
-MD -MP -MF .deps/mpi_pmix_v2_la-pmixp_client.Tpo -c -o 
mpi_pmix_v2_la-pmixp_client.lo `test -f 'pmixp_client.c' || echo 
'./'`pmixp_client.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../slurm 
-I../../../.. -I../../../../src/common -I/usr/include 
-I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2 -g -O0 -pthread -Wall -g -O0 
-fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo -MD -MP -MF 
.deps/mpi_pmix_v2_la-pmixp_client.Tpo -c pmixp_client.c  -fPIC -DPIC -o 
.libs/mpi_pmix_v2_la-pmixp_client.o
pmixp_client.c: In function ‘_set_procdatas’:
pmixp_client.c:468:24: error: request for member ‘size’ in something not a 
structure or union
   kvp->value.data.array.size = count;
^
pmixp_client.c:482:24: error: request for member ‘array’ in something not a 
structure or union
   kvp->value.data.array.array = (pmix_info_t *)info;
^
make[4]: *** [mpi_pmix_v2_la-pmixp_client.lo] Error 1


So I'm guessing that I'm missing something, but the documentation
for PMIX in Slurm seems pretty much non-existent. :-(
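
For reference, I'm pointing configure at PMIx roughly along these lines
(the install prefix is just our local convention):

    ./configure --prefix=/usr/local/slurm/17.02.7 \
                --with-pmix=/usr/local/pmix/latest

so presumably it's the header layout under that --with-pmix path that the
detection logic trips over.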

Anyone had any luck with this?

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Christopher Samuel

On 04/10/17 20:51, Gennaro Oliva wrote:

> If you are talking about Slurm I would backup the configuration files
> also.

Not directly Slurm related, but don't forget to install and configure
etckeeper first.

It puts your /etc/ directory under git version control and will do
commits of changes before and after any package upgrade/install/removal
so you have a good history of changes made.

I'm assuming that the Slurm config files in the Debian package are under
/etc, so that will be helpful to you for this.
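
Getting it going is only a couple of commands, roughly as follows (on
Debian the package postinst should initialise the repository for you
anyway, if I remember right):

    apt-get install etckeeper
    etckeeper init                          # only if the package didn't do it
    etckeeper commit "Initial import of /etc"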

> Anyway there have been a lot of major changes in SLURM and in Debian since
> 2013 (Wheezy release date), so be prepared that it will be no picnic.

The Debian package name also changed from slurm-llnl to slurm-wlm at
some point, so skipping the intermediate release may mean that
transition doesn't happen cleanly.

To be honest I would never use a distro's packages for Slurm; I'd always
install it centrally (NFS exported to the compute nodes) to keep things
simple.  That way you decouple your Slurm version from the OS and can
keep it up to date (or keep it on a known working version).

All the best!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Setting up Environment Modules package

2017-10-04 Thread Christopher Samuel

On 05/10/17 03:11, Mike Cammilleri wrote:

> 2. Install Environment Modules packages in a location visible to the
> entire cluster (NFS or similar), including the compute nodes, and the
> user then includes their 'module load' commands in their actual slurm
> submit scripts since the command would be available on the compute
> nodes - loading software (either local or from network locations
> depending on what they're loading) visible to the nodes

This is what we do: the management node for the cluster exports its
/usr/local read-only to the rest of the cluster.

We also have in our taskprolog.sh:

echo export BASH_ENV=/etc/profile.d/module.sh

to try and ensure that bash shells have modules set up, just in case. :-)
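
For anyone wiring this up from scratch, it's roughly the following (the
script path is just where we happen to keep ours):

    # in slurm.conf
    TaskProlog=/usr/local/slurm/etc/taskprolog.sh

    # /usr/local/slurm/etc/taskprolog.sh
    #!/bin/bash
    # anything printed to stdout as "export NAME=value" gets picked up
    # by slurmstepd and injected into the task's environment
    echo export BASH_ENV=/etc/profile.d/module.sh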

-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Tasks distribution

2017-10-04 Thread Jeffrey Frey

I didn't realize prior to this that the "--distribution" flag to "sbatch" only 
affects how an "srun" within the batch script will make CPU allocations.  Prior 
to that happening, SLURM must allocate CPUs to the batch job, and _that_ 
distribution is dictated by how you have the "select/cons_res" plugin 
configured:

> SelectType=select/cons_res
> SelectTypeParameters=CR_Core

The default behavior is to spread the allocation across the available nodes -- 
thus, 4/4/3/3/3.  If you'd rather "pack" allocations onto the nodes, enable the 
CR_PACK_NODES option:

> SelectType=select/cons_res
> SelectTypeParameters=CR_Core,CR_Pack_Nodes

This will produce the 4/4/4/4/1 allocation pattern.  AFAIK there's no way to 
alter which CPU allocation pattern gets used on a per-job basis.


Once the job has been assigned nodes and CPUs on those nodes, the 
"--distribution" option you provide informs "srun" how to distribute the tasks 
it starts.  Since you're not using "srun" to start the MPI program, Open MPI
itself knows nothing beyond seeing

SLURM_NODELIST=n[009-013]
SLURM_TASKS_PER_NODE=4(x2),3(x3)

in the environment, which produces the host list

n009:4
n010:4
n011:3
n012:3
n013:3

for which the --map-by and --rank-by options to "mpirun" will affect the 
distribution.
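
For example, something along the lines of (program name invented, and the
exact option syntax varies a bit between Open MPI versions):

    mpirun --map-by node --rank-by node ./my_mpi_prog

should place ranks round-robin across n009-n013 rather than filling each
host's slots in order.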

> On Oct 3, 2017, at 8:26 PM, Christopher Samuel  wrote:
> 
> 
> On 02/10/17 20:51, Sysadmin CAOS wrote:
> 
>> I'm executing my MPI program with "mpirun"... Could this be the
>> problem? Do I need to execute with "srun"?
> 
> I suspect so, try it and see..
> 
> -- 
> Christopher Samuel        Senior Systems Administrator
> Melbourne Bioinformatics - The University of Melbourne
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::





[slurm-dev] Re: Setting up Environment Modules package

2017-10-04 Thread Andy Riebs


We've had good luck putting the modules on an NFS-mounted file system.
Along with that, I'd suggest creating /etc/profile.d/zmodule.sh that contains


    module use /modules

then symlink /etc/profile.d/zmodule.csh to it, and set this up on all 
login and compute nodes.
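
In concrete terms that's something like this (assuming the shared
modulefiles live under /modules):

    # /etc/profile.d/zmodule.sh
    module use /modules

    # make the csh variant point at the same file
    ln -s /etc/profile.d/zmodule.sh /etc/profile.d/zmodule.csh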


Andy

On 10/04/2017 12:10 PM, Mike Cammilleri wrote:

Hi Everyone,

I'm in search of a best practice for setting up Environment Modules for our 
Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet). 
We're a small group and had no explicit need for this in the beginning, but as 
we are growing larger with more users we clearly need something like this.

I see there are a couple ways to implement Environment Modules and I'm 
wondering which would be the cleanest, most sensible way. I'll list my ideas 
below:

1. Install Environment Modules package and relevant modulefiles on the slurm 
head/submit/login node, perhaps in the default /usr/local/ location. The 
modulefiles would define paths to various software packages that exist 
in a location visible/readable to the compute nodes (NFS or similar). The user 
then loads the modules manually at the command line on the submit/login node 
and not in the slurm submit script - but specify #SBATCH --export=ALL and 
import the environment before submitting the sbatch job.

2. Install Environment Modules packages in a location visible to the entire 
cluster (NFS or similar), including the compute nodes, and the user then 
includes their 'module load' commands in their actual slurm submit scripts 
since the command would be available on the compute nodes - loading software 
(either local or from network locations depending on what they're loading) 
visible to the nodes

3. Another variation would be to use a configuration manager like bcfg2 to make 
sure Environment Modules and necessary modulefiles and all configurations are 
present on all compute/submit nodes. Seems like there's potential for a mess 
there, though.

Is there a preferred approach? I see in the archives some folks have strange 
behavior when a user uses --export=ALL, so it would seem to me that the cleaner 
approach is to have the 'module load' command available on all compute nodes 
and have users do this in their submit scripts. If this is the case, I'll need 
to configure Environment Modules and relevant modulefiles to live in special 
places when I build Environment Modules (./configure --prefix=/mounted-fs 
--modulefilesdir=/mounted-fs, etc.).

We've been testing with modules-tcl-1.923

Thanks for any advice,
mike


[slurm-dev] Setting up Environment Modules package

2017-10-04 Thread Mike Cammilleri

Hi Everyone,

I'm in search of a best practice for setting up Environment Modules for our 
Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet). 
We're a small group and had no explicit need for this in the beginning, but as 
we are growing larger with more users we clearly need something like this.

I see there are a couple ways to implement Environment Modules and I'm 
wondering which would be the cleanest, most sensible way. I'll list my ideas 
below:

1. Install Environment Modules package and relevant modulefiles on the slurm 
head/submit/login node, perhaps in the default /usr/local/ location. The 
modulefiles modules would define paths to various software packages that exist 
in a location visible/readable to the compute nodes (NFS or similar). The user 
then loads the modules manually at the command line on the submit/login node 
and not in the slurm submit script - but specify #SBATCH --export=ALL and 
import the environment before submitting the sbatch job.

2. Install Environment Modules packages in a location visible to the entire 
cluster (NFS or similar), including the compute nodes, and the user then 
includes their 'module load' commands in their actual slurm submit scripts 
since the command would be available on the compute nodes - loading software 
(either local or from network locations depending on what they're loading) 
visible to the nodes

3. Another variation would be to use a configuration manager like bcfg2 to make 
sure Environment Modules and necessary modulefiles and all configurations are 
present on all compute/submit nodes. Seems like there's potential for a mess 
there, though.

Is there a preferred approach? I see in the archives some folks have strange 
behavior when a user uses --export=ALL, so it would seem to me that the cleaner 
approach is to have the 'module load' command available on all compute nodes 
and have users do this in their submit scripts. If this is the case, I'll need 
to configure Environment Modules and relevant modulefiles to live in special 
places when I build Environment Modules (./configure --prefix=/mounted-fs 
--modulefilesdir=/mounted-fs, etc.).
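
For concreteness, the kind of submit script I have in mind for option 2
would look roughly like this (module and program names made up):

    #!/bin/bash
    #SBATCH --ntasks=4
    #SBATCH --time=01:00:00
    # 'module' works here because Environment Modules lives on the
    # shared filesystem mounted on the compute nodes
    module load gcc/7.2.0
    srun ./my_program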

We've been testing with modules-tcl-1.923

Thanks for any advice,
mike


[slurm-dev] Re: Is PriorityUsageResetPeriod really required for hard limits?

2017-10-04 Thread Jacob Chappell
Thanks for the replies and clarifications. It is actually our desired usage
policy that they never be able to run jobs once their allocation is
exhausted. They must submit a proposal, at which point we increase their
allocation, but we never want to reset their usage. It's good to know that
the reset period is not actually required, despite what the Slurm
documentation suggests, because we have a very real use case. I'm assuming the usage is
stored as a 64-bit integer, so hopefully we don't end up overflowing in the
future.

__
*Jacob D. Chappell*
*Research Computing Associate*
Research Computing | Research Computing Infrastructure
Information Technology Services | University of Kentucky
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
jacob.chapp...@uky.edu

Visit us: www.uky.edu/ITS
How are we doing? Send Feedback to itsabout...@uky.edu
ITS . . . it’s about technology. ITS . . . it’s about innovation.  ITS . .
. it’s about you!

On Wed, Oct 4, 2017 at 10:19 AM, Thomas M. Payerle  wrote:

>
> On Tue, 3 Oct 2017, Christopher Samuel wrote:
>
>>
>> On 29/09/17 06:34, Jacob Chappell wrote:
>>
>> Hi all. The slurm.conf documentation says that if decayed usage is
>>> disabled, then PriorityUsageResetPeriod must be set to some value. Is
>>> this really true? What is the technical reason for this requirement if
>>> so? Can we set this period to sometime far into the future to have
>>> effectively an infinite period (no reset)?
>>>
>>
>> Basically this is because once a user exceeds something like their
>> maximum CPU run time limit then they will never be able to run jobs
>> again unless you either decay or reset usage.
>>
>> --
>> Christopher Samuel        Senior Systems Administrator
>> Melbourne Bioinformatics - The University of Melbourne
>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>
>>
> To answer your question, it is not required.  Although if you do not
> have it set, you will, as Christopher pointed out, have to do something
> to reset usage if you do not want people to lose the ability to run jobs
> forever.
>
> We have a couple of different "types" of allocations with different
> reset periods, so a global PriorityUsageResetPeriod does not work for
> us.  Instead we have cron jobs that run at the appropriate times
> and do something like sacctmgr update account name=XXX rawusage=0
> to do our resets.  But PriorityUsageResetPeriod is set to none.
>
>
> Tom Payerle
> DIT-ACIGS/Mid-Atlantic Crossroads   paye...@umd.edu
> 5825 University Research Court  (301) 405-6135
> University of Maryland
> College Park, MD 20740-3831
>


[slurm-dev] Re: Is PriorityUsageResetPeriod really required for hard limits?

2017-10-04 Thread Thomas M. Payerle


On Tue, 3 Oct 2017, Christopher Samuel wrote:


On 29/09/17 06:34, Jacob Chappell wrote:


Hi all. The slurm.conf documentation says that if decayed usage is
disabled, then PriorityUsageResetPeriod must be set to some value. Is
this really true? What is the technical reason for this requirement if
so? Can we set this period to sometime far into the future to have
effectively an infinite period (no reset)?


Basically this is because once a user exceeds something like their
maximum CPU run time limit then they will never be able to run jobs
again unless you either decay or reset usage.

--
Christopher Samuel        Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545



To answer your question, it is not required.  Although if you do not
have it set, you will, as Christopher pointed out, have to do something
to reset usage if you do not want people to lose the ability to run jobs
forever.

We have a couple of different "types" of allocations with different
reset periods, so a global PriorityUsageResetPeriod does not work for
us.  Instead we have cron jobs that run at the appropriate times
and do something like 
sacctmgr update account name=XXX rawusage=0

to do our resets.  But PriorityUsageResetPeriod is set to none.
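
For the record, the cron entries are nothing fancier than something like
the following (account name invented; -i just skips sacctmgr's interactive
confirmation):

    # reset the example "physics" account's usage on the 1st of each month
    0 0 1 * * /usr/bin/sacctmgr -i modify account name=physics set rawusage=0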


Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads   paye...@umd.edu
5825 University Research Court  (301) 405-6135
University of Maryland
College Park, MD 20740-3831


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Gennaro Oliva

Ciao Elisabetta,

On Wed, Oct 04, 2017 at 01:38:21AM -0600, Elisabetta Falivene wrote:
> Just some other questions. How would you do the upgrade in the safest way?
> Letting aptitude do its job?

I prefer apt, but it's a matter of taste.

> Would you go to Debian 9?

In my opinion it is preferable to do it in two steps: from wheezy to jessie
and from jessie to stretch. Upgrading from releases older than jessie is not
supported. If you don't have much software compiled from source under
/usr/local, /opt or in the users' home directories, it is better to
leave your /home partition unchanged and do a fresh install of your
system than to make the long jump. This will also have the benefit of
putting the system under your total control.

> And the nodes must be
> upgraded in the same way one by one?

Yes, I would use the same method for the front-end and the compute
nodes, unless you have automatic centralized installation and
configuration systems in place. In that case I would reinstall the nodes
from scratch after the front-end has been upgraded.

> Let's think about the worst case: upgrading nukes Slurm. I don't really know
> this machine's configuration well. Would you back up something else besides
> the database before upgrading?

If you are talking about Slurm I would backup the configuration files
also.

Regarding Debian system updates, you can find some information here, for
the first upgrade:

https://www.debian.org/releases/jessie/amd64/release-notes/ch-upgrading.en.html

and here for the second upgrade:

https://www.debian.org/releases/stretch/amd64/release-notes/ch-upgrading.html
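
In practice each hop boils down to something like the following
(simplified; the release notes cover the caveats and the recommended
minimal-upgrade-first step):

    # first hop: wheezy -> jessie (then repeat for jessie -> stretch)
    sed -i 's/wheezy/jessie/g' /etc/apt/sources.list
    apt-get update
    apt-get upgrade
    apt-get dist-upgrade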

Anyway there have been a lot of major changes in SLURM and in Debian since
2013 (Wheezy release date), so be prepared that it will be no picnic.

Best regards
-- 
Gennaro Oliva


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Loris Bennett

Hi Elisabetta,

Elisabetta Falivene  writes:

> Upgrading Slurm 
>
> Thank you all for useful advices!
>
> So the 'jump' shouldn't be a problem if there are no running jobs
> (which is my case as you guessed). Surely I'll report how it went
> doing it. I would like to do some test on a virtual machine, but
> really can't imagine how to replicate the exact situation of a 7Tb
> cluster locally...
>
> Just some other questions. How would you do the upgrade in the safest
> way? Letting aptitude do its job? Would you go to Debian 9? And must the
> nodes be upgraded in the same way, one by one?

If no jobs are running, I would just let aptitude get on with it.

If there are no other reasons not to, I would upgrade to Debian 9.  In
this case, your version of Slurm will be 16.05 and thus not too old.

> Let's think about the worst case: upgrading nukes Slurm. I don't really
> know this machine's configuration well. Would you back up something
> else besides the database before upgrading?

The only other thing I back up is the statesave directory, but this is only
interesting if you are upgrading while jobs are running.  In your case,
only the database is worth backing up, and even then, that's only really
interesting if you need the old data for statistical purposes, or you
need to maintain, say, fairshare information across the upgrade.
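
Dumping the accounting database is just the usual thing, e.g. something
like this, assuming slurmdbd's default database name and suitable MySQL
credentials:

    mysqldump slurm_acct_db > slurm_acct_db-$(date +%F).sql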

In bocca al lupo!

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Elisabetta Falivene
Awesome! Thank you Ole!


2017-10-04 9:59 GMT+02:00 Ole Holm Nielsen :

>
> On 10/04/2017 09:38 AM, Elisabetta Falivene wrote:
>
>> Ps: if you know some good source of information about how to set up a
>> cluster and slurm beside official doc, I would be grateful if you could
>> share. It is difficult to find good material
>>
>
> I agree about the lack of availability of HowTo guides.  That's why I
> wrote a Slurm HowTo Wiki while installing Slurm, but it is focused on
> CentOS 7: https://wiki.fysik.dtu.dk/niflheim/SLURM
>
> However, the Slurm setup in itself should not depend on the Linux
> distribution, so perhaps you can learn something useful from the Wiki
> anyway.
>
> /Ole
>


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Christopher Samuel

On 04/10/17 17:12, Loris Bennett wrote:

> Ole's pages on Slurm are indeed very useful (Thanks, Ole!).  I just
> thought I'd point out that the limitation of only upgrading by 2 major
> versions is for the case where you are upgrading a production system and
> don't want to lose any running jobs. 

The on-disk format for spooled jobs may also change between releases,
so you probably want to keep that in mind as well.

-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Loris Bennett

Hi Elisabetta,

Ole Holm Nielsen  writes:

> On 10/03/2017 03:29 PM, Elisabetta Falivene wrote:
>> I've been asked to upgrade our slurm installation. I have a slurm 2.3.4 on a
>> Debian 7.0 wheezy cluster (1 master + 8 nodes). I've not installed it so I'm 
>> a
>> bit confused about how to do this and how to proceed without destroying
>> anything.
>>
>> I was thinking to upgrade at least to Jessie (Debian 8) but what about Slurm?
>> I've read carefully the upgrading section
>> (https://slurm.schedmd.com/quickstart_admin.html) of the doc, reading that the
>> upgrade must be done incrementally and not jumping from 2.3.4 to 17, for
>> example.
>
> Yes, you may jump max 2 versions per upgrade.
> Quoting https://slurm.schedmd.com/quickstart_admin.html#upgrade
>
>> Slurm daemons will support RPCs and state files from the two previous minor
>> releases (e.g. a version 16.05.x SlurmDBD will support slurmctld daemons and
>> commands with a version of 16.05.x, 15.08.x or 14.11.x). 
>
>
>> Still, it is not clear to me precisely how to do this. How would you proceed if
>> asked to upgrade a cluster you know nothing about? What would you
>> check? What version of o.s. and slurm would you choose? What would you backup?
>> And how would you proceed?
>>
>> Any info is gold! Thank you
>
> My 2 cents of information:
>
> My Slurm Wiki explains how to upgrade Slurm on CentOS 7:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
>
> Probably the general method is the same for Debian.

Ole's pages on Slurm are indeed very useful (Thanks, Ole!).  I just
thought I'd point out that the limitation of only upgrading by 2 major
versions is for the case where you are upgrading a production system and
don't want to lose any running jobs.  If you are upgrading the whole
operating system, you are probably planning a downtime anyway and so
there won't be any such jobs.  In this case, there shouldn't in theory
be a problem - although I must admit that I wouldn't be that surprised
if converting the database from 2.3.4 to, say, 17.02.7 didn't go 100%
smoothly.  However, Debian users who just rely on Debian packages are
always going to face this problem of large version jumps between Debian
releases, and so it would be useful for the community to know how well
this works.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de