[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Jeffrey Frey
FWIW, we have a home-grown project going here at UD called "VALET" to set up 
users' environments on our HPC clusters.  It's implemented in Python and uses 
only standard packages present on a RHEL-based installation (and probably 
Ubuntu et al., too).  The main reasons we wrote it:


- Automatically detect standard paths under a base prefix (e.g. prefix = /usr, 
make sure /usr/bin is on PATH, /usr/lib and /usr/lib64 on LD_LIBRARY_PATH, 
-I/usr/include is in CPPFLAGS, etc.)

- Automatically handle dependencies (e.g. openmpi/1.10.2-intel2015 requires the 
intel/2015 package)

- Create environment "snapshots" before changes are made (thus, a package can 
source a script to make env changes and _still_ have them removed from the 
environment on a rollback)


This happened when Tcl modules was in its prime, and we didn't feel comfortable 
asking our users to write Tcl code to handle multiple revisions of their 
software.  Since we have a LOT of software that uses the standard filesystem 
layout (bin/lib/include), we install into an NFS share à la


/opt/shared/open-mpi/1.10.2-gcc
/opt/shared/open-mpi/1.10.2-intel2015


and the package definition in VALET looks like (in YAML syntax, but JSON and 
XML are available, too):


open-mpi:
  description: "Open MPI: Message-Passing Interface"
  url: http://www.open-mpi.org/
  prefix: /opt/shared/open-mpi
  default-version: 1.10.2-gcc
  versions:
    1.10.2-gcc:
      description: 1.10.2 (with system GCC)
    1.10.2-intel2015:
      description: 1.10.2 (with Intel 2015)
      dependencies:
        - intel/2015


As with any solution to the problem, once our users grok the config format, 
this makes it very easy for them to define their own packages and maintain 
multiple versions of their own software.
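
By way of example, loading a package at the shell then looks something like 
the sketch below (treat the exact vpkg_* command names and behavior as 
approximations of VALET's CLI, not a definitive reference):

    # load a version; VALET adds the intel/2015 dependency automatically
    vpkg_require open-mpi/1.10.2-intel2015

    # undo the load, restoring the pre-load environment snapshot
    vpkg_rollback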


The project is up on gitlab:


https://gitlab.com/valet/




> On Oct 5, 2017, at 12:51 PM, Mike Cammilleri <mi...@stat.wisc.edu> wrote:
> 
> [...]

[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Mike Cammilleri
Thanks to everyone for your responses.

We primarily use bcfg2 config management to keep the system packages and 
configs the same throughout the cluster. I have not generally used it for 
research add-on software unless it really needed to be installed locally. 
I think we’ll be putting various versions of gcc, R, matlab, etc. in the 
NFS-mounted space, so it seems to make sense to build (from source) the 
Tcl-based Environment Modules software into the NFS space as well and have 
the scripts in /etc/profile.d/ point to it. I’m not using RPM because, 
believe it or not, we’re running an Ubuntu-based SLURM cluster, where we 
built Slurm from source as well.

I’m interested in the Lmod modules available from TACC with EasyBuild if it’s 
something that works with our Ubuntu-based environment – however, I disagree 
that the modules-tcl-1.923 flavor is out of date – perhaps the pre-built 
packages are – but the source was updated 2017-07-20 and seems to work great.

http://modules.sourceforge.net/tcl/NEWS.html

Most of what we need out of this is setting environments to various versions of 
software and setting some library paths but not much more complex than that, 
yet!

--mike

From: r...@open-mpi.org [mailto:r...@open-mpi.org]
Sent: Thursday, October 5, 2017 10:24 AM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: Setting up Environment Modules package

[...]

[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread r...@open-mpi.org
> On Oct 5, 2017, at 12:08 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
> 
> [...]
> 
> I strongly recommend uninstalling the Linux distro "environment-modules" 
> package because this old Tcl-based software hasn't been maintained for 5+ 
> years.  I recommend a very readable paper on various module systems:
> http://dl.acm.org/citation.cfm?id=2691141

As was pointed out to me when I made a similar comment on another mailing list, 
the Tcl-based system is actively maintained - the repo simply moved. I’m not 
recommending either direction as it gets into religion rather quickly.

> [...]
> 
> /Ole


[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Bill Barth
I could get into more details if necessary, but we do *not* provide SLURM as a 
module itself on our systems. It’s provided in the usual install location via 
RPMs that we build ourselves locally. We have some small modifications that we 
provide as part of our build, so it’s necessary to do it this way. We also do 
*everything else* via our own RPMs, so it fits with our local methodology.

Best,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu   |   Phone: (512) 232-7069
Office: ROC 1.435        |   Fax:   (512) 475-9445
 
 

On 10/5/17, 2:07 AM, "Ole Holm Nielsen" <ole.h.niel...@fysik.dtu.dk> wrote:


[...]

I believe that the TACC supercomputer systems provide Slurm as a 
loadable module, but I don't know any details.  We just install Slurm as 
RPMs on CentOS 7.

[...]

/Ole




[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Thomas M. Payerle


On Wed, 4 Oct 2017, Mike Cammilleri wrote:


[...]



For ease of use for end users, I would recommend either 2 or 3.  In addition 
to making things more consistent between interactive and batch use, it also 
makes it more manageable if particular jobs require specific versions.  E.g., 
if I have a job that requires the specific version X of the "foo" library and 
the specific version Y of the "bar" library, that can be specified in the job 
script.
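
A minimal sketch of such a job script (foo/X and bar/Y are placeholder 
module names carried over from the example above):

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # pin the exact library versions this job was tested against
    module load foo/X
    module load bar/Y

    srun ./my_program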

Between 2 and 3, I would generally recommend 2.  Some of that depends on how 
you handle software installs.  If most of the software is in a shared 
filesystem, it makes sense to keep module files there as well; if on the 
other hand software is installed locally using a configuration mgmt system, 
then it might make sense to do that for modules as well.  Basically, if one 
adds a new software package, do you want to go through the overhead of the 
config mgr to push out the module files?

Note, however, that the available software on the compute and login nodes 
might be different.  E.g., you might have a PNG image viewer available as a 
module on the login nodes that would not be useful on the compute nodes.  If 
desired, you could do something like having two module directories: one 
shared between compute and login nodes, one for login nodes only.

Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads   paye...@umd.edu
5825 University Research Court  (301) 405-6135
University of Maryland
College Park, MD 20740-3831


[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Benjamin Redling


Hello Mike,

On 10/4/17 6:10 PM, Mike Cammilleri wrote:

I'm in search of a best practice for setting up Environment Modules for our 
Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet). 
We're a small group and had no explicit need for this in the beginning, but as 
we are growing larger with more users we clearly need something like this.

what are your needs that brought you to "Environment Modules"?

Have you seen Singularity containers?
We are a small group, and they seem to be less of a burden for having a 
reproducible environment while still allowing users to relatively easily get 
a certain setup.


(The main motivation here is the use of TensorFlow, which has the best 
support on Ubuntu. But using that as the host OS proved to be a major pain 
because of its ridiculous package quality compared to a stable Debian. With 
Singularity I can provide whatever container/distribution is needed on top of 
a stable host OS. And thanks to NeuroDebian and 
tensorflow/tensorflow:latest-gpu-py3, even the newest version with access to 
Nvidia GPUs -- all ready to use, no messing around with dependencies.)


Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Ole Holm Nielsen


On 10/04/2017 06:11 PM, Mike Cammilleri wrote:

[...]


I strongly recommend uninstalling the Linux distro "environment-modules" 
package because this old Tcl-based software hasn't been maintained for 
5+ years.  I recommend a very readable paper on various module systems:

http://dl.acm.org/citation.cfm?id=2691141

We use the modern and actively maintained Lmod modules developed at TACC 
(https://www.tacc.utexas.edu/research-development/tacc-projects/lmod) 
together with the EasyBuild module building system (a strong HPC 
community effort, https://github.com/easybuilders/easybuild).


I believe that the TACC supercomputer systems provide Slurm as a 
loadable module, but I don't know any details.  We just install Slurm as 
RPMs on CentOS 7.


We're extremely happy with Lmod and EasyBuild because of the simplicity 
with which 1300+ modules are made available.  I've written a Wiki about 
how we have installed this: 
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules.  We put all of our 
modules on a shared NFS file system for all nodes.
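
For a flavor of the workflow: one command builds a package plus any missing 
dependencies and generates the module files (a sketch; the easyconfig 
filename is just an example):

    # build OpenMPI and whatever dependencies are missing (--robot),
    # installing the software and generating Lmod module files
    eb OpenMPI-2.0.2-GCC-6.3.0-2.27.eb --robot

    # the result is then immediately loadable
    module load OpenMPI/2.0.2-GCC-6.3.0-2.27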


/Ole


[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Ole Holm Nielsen


On 10/05/2017 08:38 AM, Blomqvist Janne wrote:

> what we do is, roughly, a combination of your options #2 and #3. To start 
> with, however, I'd like to point out that we're using Lmod instead of the 
> old Tcl environment-modules. I'd really recommend you to do the same.
> 
> So basically, we have our modules available on NFS, both the module files 
> themselves and the software that modules makes available. Then we use 
> configuration management (ansible, in our case) to ensure that Lmod is 
> installed on all nodes, and that we have a suitable configuration file in 
> /etc/profile.d that adds our NFS location to $MODULEPATH so that Lmod can 
> find it.
> 
> We also use Easybuild to build (most) software and module files, you might 
> want to look into that as well.


We use the same approach.


> And yes, we tell our users to load the appropriate modules in the slurm 
> batch scripts rather than relying on slurm to transfer the environment 
> correctly.
> 
> As to whether this is preferred, well, it works, but provisioning with 
> kickstart + config management gets tedious at scale (say, hundreds of nodes 
> or more). If we were to rebuild everything from scratch, I think we'd take 
> a long hard look at image-based deployment, e.g. openhpc/warewulf.


We use Kickstart including some post-install scripts to automatically 
install compute nodes with CentOS.  At 800 nodes currently, it's not at 
all tedious to perform installation and config management, IMHO.


In the distant past, we used the image-based approach with SystemImager, 
but I think this was no simpler than the Kickstart-based approach.


/Ole


[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Blomqvist Janne

Hi,

what we do is, roughly, a combination of your options #2 and #3. To start with, 
however, I'd like to point out that we're using Lmod instead of the old Tcl 
environment-modules. I'd really recommend you to do the same.

So basically, we have our modules available on NFS, both the module files 
themselves and the software that modules makes available. Then we use 
configuration management (ansible, in our case) to ensure that Lmod is 
installed on all nodes, and that we have a suitable configuration file in 
/etc/profile.d that adds our NFS location to $MODULEPATH so that Lmod can find 
it.
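
A sketch of what that profile.d file can look like (the Lmod init path and 
the NFS mount point are illustrative, not our actual values):

    # /etc/profile.d/z01-local-modules.sh -- distributed by ansible
    # source Lmod's init script (install path is site-specific)
    if [ -f /usr/share/lmod/lmod/init/profile ]; then
        . /usr/share/lmod/lmod/init/profile
        # prepend the NFS-hosted module tree to $MODULEPATH
        module use /nfs/sw/modulefiles
    fi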

We also use Easybuild to build (most) software and module files, you might want 
to look into that as well.

And yes, we tell our users to load the appropriate modules in the slurm batch 
scripts rather than relying on slurm to transfer the environment correctly.

As to whether this is preferred, well, it works, but provisioning with 
kickstart + config management gets tedious at scale (say, hundreds of nodes or 
more). If we were to rebuild everything from scratch, I think we'd take a long 
hard look at image-based deployment, e.g. openhpc/warewulf.

--
Janne Blomqvist


From: Mike Cammilleri <mi...@stat.wisc.edu>
Sent: Wednesday, October 4, 2017 7:10:44 PM
To: slurm-dev
Subject: [slurm-dev] Setting up Environment Modules package

Hi Everyone,

I'm in search of a best practice for setting up Environment Modules for our 
Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet). 
We're a small group and had no explicit need for this in the beginning, but as 
we are growing larger with more users we clearly need something like this.

I see there are a couple ways to implement Environment Modules and I'm 
wondering which would be the cleanest, most sensible way. I'll list my ideas 
below:

1. Install the Environment Modules package and relevant modulefiles on the 
slurm head/submit/login node, perhaps in the default /usr/local/ location. 
The modulefiles would define paths to various software packages that exist in 
a location visible/readable to the compute nodes (NFS or similar). The user 
then loads the modules manually at the command line on the submit/login node 
and not in the slurm submit script - but specifies #SBATCH --export=ALL and 
imports the environment before submitting the sbatch job.

2. Install Environment Modules packages in a location visible to the entire 
cluster (NFS or similar), including the compute nodes, and the user then 
includes their 'module load' commands in their actual slurm submit scripts 
since the command would be available on the compute nodes - loading software 
(either local or from network locations depending on what they're loading) 
visible to the nodes

3. Another variation would be to use a configuration manager like bcfg2 to make 
sure Environment Modules and necessary modulefiles and all configurations are 
present on all compute/submit nodes. Seems like that's potential for a mess 
though.

Is there a preferred approach? I see in the archives some folks have strange 
behavior when a user uses --export=ALL, so it would seem to me that the cleaner 
approach is to have the 'module load' command available on all compute nodes 
and have users do this in their submit scripts. If this is the case, I'll need 
to configure Environment Modules and relevant modulefiles to live in special 
places when I build Environment Modules (./configure --prefix=/mounted-fs 
--modulefilesdir=/mounted-fs, etc.).
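
Something like the following, I imagine (paths are placeholders):

    # sketch only -- real mount points still to be decided
    ./configure --prefix=/mounted-fs/modules-tcl \
                --modulefilesdir=/mounted-fs/modulefiles
    make
    make install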

We've been testing with modules-tcl-1.923

Thanks for any advice,
mike


[slurm-dev] Re: Setting up Environment Modules package

2017-10-04 Thread Christopher Samuel

On 05/10/17 03:11, Mike Cammilleri wrote:

> 2. Install Environment Modules packages in a location visible to the
> entire cluster (NFS or similar), including the compute nodes, and the
> user then includes their 'module load' commands in their actual slurm
> submit scripts since the command would be available on the compute
> nodes - loading software (either local or from network locations
> depending on what they're loading) visible to the nodes

This is what we do: the management node for the cluster exports its
/usr/local read-only to the rest of the cluster.

We also have in our taskprolog.sh:

echo export BASH_ENV=/etc/profile.d/module.sh

to try and ensure that bash shells have modules set up, just in case. :-)
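
For context, the whole prolog can be as small as this sketch (slurmd applies 
lines printed as "export NAME=value" by the TaskProlog script to the task's 
environment):

    #!/bin/bash
    # taskprolog.sh -- referenced by TaskProlog in slurm.conf.
    # Anything echoed as "export NAME=value" is injected into the
    # environment of the user's task by slurmd.
    echo export BASH_ENV=/etc/profile.d/module.sh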

-- 
 Christopher Samuel    Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au    Phone: +61 (0)3 903 55545


[slurm-dev] Re: Setting up Environment Modules package

2017-10-04 Thread Andy Riebs


We've had good luck putting the modules on an NFS-mounted file system. 
Along with that, I suggest creating /etc/profile.d/zmodule.sh that contains

    module use /modules

then symlinking /etc/profile.d/zmodule.csh to it, and setting this up on all 
login and compute nodes.
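
That is, the whole file can be as small as this (a sketch; the comments are 
my reading of why the naming and symlink trick work):

    # /etc/profile.d/zmodule.sh -- zmodule.csh is a symlink to this same
    # file; "module use /modules" parses in both sh and csh, and the 'z'
    # prefix sorts it after the init scripts that define 'module'
    module use /modules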


Andy

On 10/04/2017 12:10 PM, Mike Cammilleri wrote:

[...]