> 
> On Oct 5, 2017, at 12:08 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
> 
> 
> On 10/04/2017 06:11 PM, Mike Cammilleri wrote:
>> I'm in search of a best practice for setting up Environment Modules for our 
>> Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 
>> yet). We're a small group and had no explicit need for this in the 
>> beginning, but as we are growing larger with more users we clearly need 
>> something like this.
>> 
>> I see there are a couple of ways to implement Environment Modules, and I'm 
>> wondering which would be the cleanest, most sensible way. I'll list my 
>> ideas below:
>> 
>> 1. Install the Environment Modules package and the relevant modulefiles on 
>> the Slurm head/submit/login node, perhaps in the default /usr/local/ 
>> location. The modulefiles would define paths to various software packages 
>> that exist in a location visible/readable to the compute nodes (NFS or 
>> similar). The user then loads the modules manually at the command line on 
>> the submit/login node rather than in the Slurm submit script, but specifies 
>> #SBATCH --export=ALL to import that environment when submitting the sbatch 
>> job.
>> 
>> 2. Install the Environment Modules package in a location visible to the 
>> entire cluster (NFS or similar), including the compute nodes. The user then 
>> includes the 'module load' commands in the actual Slurm submit script, 
>> since the command would be available on the compute nodes, loading software 
>> visible to the nodes (either local or from network locations, depending on 
>> what they're loading).
>> 
>> 3. Another variation would be to use a configuration manager like bcfg2 to 
>> make sure Environment Modules, the necessary modulefiles, and all 
>> configurations are present on all compute/submit nodes. That seems like a 
>> potential mess, though.
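
Purely as an illustration of the difference between options 1 and 2 above: in
option 2 the environment is built up inside the job itself, which only works
if the 'module' command and the modulefiles are visible on every compute node.
A minimal sketch of such a submit script (the module names are hypothetical):

    #!/bin/bash
    #SBATCH --job-name=modules-example
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # Option 2: build the environment on the compute node itself.  This
    # assumes the 'module' command and the modulefiles live on a file
    # system that every node mounts (NFS or similar).
    module load gcc/7.2.0        # hypothetical module name
    module load openmpi/2.1.1    # hypothetical module name

    srun ./my_program

With option 1 you would instead run the 'module load' commands in your login
shell and submit with 'sbatch --export=ALL job.sh' (exporting the full
environment is also sbatch's default), relying on Slurm to propagate the
resulting environment into the job.
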
>> Is there a preferred approach? I see in the archives that some folks get 
>> strange behavior when a user uses --export=ALL, so it would seem to me that 
>> the cleaner approach is to have the 'module load' command available on all 
>> compute nodes and have users do this in their submit scripts. If this is 
>> the case, I'll need to configure Environment Modules and the relevant 
>> modulefiles to live in special places when I build Environment Modules 
>> (./configure --prefix=/mounted-fs --modulefilesdir=/mounted-fs, etc.).
>> 
>> We've been testing with modules-tcl-1.923.
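
If you do end up building Environment Modules into a shared prefix as
described above, the build itself is short. A rough sketch, assuming
/mounted-fs is exported to and mounted identically on every node (the exact
location of the shell init files depends on the Modules release you build):

    # Build Environment Modules into the shared prefix.
    ./configure --prefix=/mounted-fs/modules \
                --modulefilesdir=/mounted-fs/modulefiles
    make
    make install

    # On each node (e.g. via a snippet in /etc/profile.d), initialize the
    # 'module' command for bash; the init path below is an assumption based
    # on where recent Modules releases install their shell init scripts.
    source /mounted-fs/modules/init/bash
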
> 
> I strongly recommend uninstalling the Linux distro "environment-modules" 
> package because this old Tcl-based software hasn't been maintained for 5+ 
> years.  I recommend a very readable paper on various module systems:
> http://dl.acm.org/citation.cfm?id=2691141 

As was pointed out to me when I made a similar comment on another mailing list, 
the Tcl-based system is actively maintained - the repo simply moved. I’m not 
recommending either direction as it gets into religion rather quickly.

> 
> We use the modern and actively maintained Lmod modules developed at TACC 
> (https://www.tacc.utexas.edu/research-development/tacc-projects/lmod) 
> together with the EasyBuild module building system (a strong HPC community 
> effort, https://github.com/easybuilders/easybuild).
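
For anyone curious what that combination looks like in practice, the
day-to-day EasyBuild workflow is roughly the following once Lmod is in place.
The prefix and the easyconfig name below are placeholders; EASYBUILD_PREFIX
simply tells EasyBuild where to put the software and the module files it
generates:

    # Point EasyBuild at a shared prefix for software and modulefiles
    # (hypothetical NFS path).
    export EASYBUILD_PREFIX=/shared/easybuild

    # One common way to install EasyBuild itself:
    pip install --user easybuild

    # Build a package plus any missing dependencies (--robot); EasyBuild
    # writes a module file for each build under $EASYBUILD_PREFIX/modules.
    eb HDF5-1.10.1-foss-2017b.eb --robot

The generated module tree is then something you can publish to the compute
nodes over NFS, which is essentially the setup Ole describes below.
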
> 
> I believe that the TACC supercomputer systems provide Slurm as a loadable 
> module, but I don't know any details.  We just install Slurm as RPMs on 
> CentOS 7.
> 
> We're extremely happy with Lmod and EasyBuild because of the simplicity with 
> which 1300+ modules are made available.  I've written a Wiki about how we 
> have installed this: 
> https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules.  We put all of our 
> modules on a shared NFS file system for all nodes.
> 
> /Ole
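
A small usage note on the shared-NFS layout Ole describes: once the module
tree is mounted on the compute nodes, users only need it on their MODULEPATH,
whether that happens in a site profile script or in the job script itself.
Roughly (the mount point and module name are made up):

    # On any node that mounts the share (login shell or job script):
    module use /shared/easybuild/modules/all   # add the shared tree to MODULEPATH
    module avail                               # see what has been built
    module load HDF5/1.10.1-foss-2017b         # hypothetical module name
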
