I could get into more details if necessary, but we do *not* provide SLURM as a module itself on our systems. It's provided in the usual install location via RPMs that we build ourselves locally. We apply some small modifications as part of our build, so it's necessary to do it this way. We also handle *everything else* via our own RPMs, so it fits with our local methodology.
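(For anyone following along, a minimal sketch of a local RPM rebuild along these lines — the version number is a placeholder, and TACC's actual spec-file modifications are not shown:)

```shell
# Rebuild the Slurm RPMs locally from the release tarball; the spec file
# ships inside the tarball, so rpmbuild can consume it directly.
# (slurm-17.02.7.tar.bz2 is an example version, not necessarily theirs.)
rpmbuild -ta slurm-17.02.7.tar.bz2

# The resulting packages land under ~/rpmbuild/RPMS/<arch>/ by default;
# from there they can be installed on every node, e.g.:
#   yum localinstall ~/rpmbuild/RPMS/x86_64/slurm-*.rpm
```

Local patches would be applied to the unpacked source (or spec file) before the build step.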
Best,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu  |  Phone: (512) 232-7069
Office: ROC 1.435       |  Fax:   (512) 475-9445

On 10/5/17, 2:07 AM, "Ole Holm Nielsen" <ole.h.niel...@fysik.dtu.dk> wrote:

On 10/04/2017 06:11 PM, Mike Cammilleri wrote:
> I'm in search of a best practice for setting up Environment Modules for
> our Slurm 16.05.6 installation (we have not had the time to upgrade to
> 17.02 yet). We're a small group and had no explicit need for this in the
> beginning, but as we are growing larger with more users we clearly need
> something like this.
>
> I see there are a couple of ways to implement Environment Modules, and
> I'm wondering which would be the cleanest, most sensible way. I'll list
> my ideas below:
>
> 1. Install the Environment Modules package and relevant modulefiles on
> the Slurm head/submit/login node, perhaps in the default /usr/local/
> location. The modulefiles would define paths to various software
> packages that exist in a location visible/readable to the compute nodes
> (NFS or similar). The user then loads the modules manually at the
> command line on the submit/login node and not in the Slurm submit
> script, but specifies #SBATCH --export=ALL to import the environment
> when submitting the sbatch job.
>
> 2. Install the Environment Modules package in a location visible to the
> entire cluster (NFS or similar), including the compute nodes. Users then
> include their 'module load' commands in their actual Slurm submit
> scripts, since the command would be available on the compute nodes,
> loading software (either local or from network locations, depending on
> what they're loading) visible to the nodes.
>
> 3. Another variation would be to use a configuration manager like bcfg2
> to make sure Environment Modules, the necessary modulefiles, and all
> configurations are present on all compute/submit nodes. That seems like
> a potential mess, though.
>
> Is there a preferred approach?
> I see in the archives that some folks have seen strange behavior when a
> user uses --export=ALL, so it would seem to me that the cleaner approach
> is to have the 'module load' command available on all compute nodes and
> have users do this in their submit scripts. If this is the case, I'll
> need to configure Environment Modules and the relevant modulefiles to
> live in special places when I build Environment Modules
> (./configure --prefix=/mounted-fs --modulefilesdir=/mounted-fs, etc.).
>
> We've been testing with modules-tcl-1.923

I strongly recommend uninstalling the Linux distro "environment-modules"
package, because this old Tcl-based software hasn't been maintained for
5+ years. I recommend a very readable paper on various module systems:
http://dl.acm.org/citation.cfm?id=2691141

We use the modern and actively maintained Lmod modules developed at TACC
(https://www.tacc.utexas.edu/research-development/tacc-projects/lmod)
together with the EasyBuild module building system (a strong HPC
community effort, https://github.com/easybuilders/easybuild).

I believe that the TACC supercomputer systems provide Slurm as a
loadable module, but I don't know any details. We just install Slurm as
RPMs on CentOS 7.

We're extremely happy with Lmod and EasyBuild because of the simplicity
with which 1300+ modules are made available. I've written a Wiki page
about how we have installed this:
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules
We put all of our modules on an NFS file system shared by all nodes.

/Ole
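To make the "module load in the submit script" approach concrete, here is a minimal sketch of a job script for option 2, with the module tree on a cluster-wide NFS mount. The paths and the module name are hypothetical examples, not anyone's actual layout; it only requires that the module command and the modulefiles are visible on the compute nodes:

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Add the shared modulefile tree to MODULEPATH (hypothetical NFS path,
# matching the --modulefilesdir=/mounted-fs idea from the thread).
module use /mounted-fs/modulefiles

# Load software by module name (hypothetical example module).
module load gcc

srun ./my_program
```

Because the environment is built inside the script itself, the job no longer depends on whatever happened to be loaded in the login-node shell at submission time, which avoids the --export=ALL surprises mentioned above. The same script works unchanged under both the old Tcl modules and Lmod.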