> On Oct 5, 2017, at 12:08 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>
> On 10/04/2017 06:11 PM, Mike Cammilleri wrote:
>> I'm in search of a best practice for setting up Environment Modules for
>> our Slurm 16.05.6 installation (we have not had the time to upgrade to
>> 17.02 yet). We're a small group and had no explicit need for this in the
>> beginning, but as we grow larger with more users we clearly need
>> something like this.
>>
>> I see there are a couple of ways to implement Environment Modules, and
>> I'm wondering which would be the cleanest, most sensible way. I'll list
>> my ideas below:
>>
>> 1. Install the Environment Modules package and the relevant modulefiles
>> on the Slurm head/submit/login node, perhaps in the default /usr/local/
>> location. The modulefiles would define paths to various software
>> packages that live in a location visible/readable to the compute nodes
>> (NFS or similar). The user then loads the modules manually at the
>> command line on the submit/login node, not in the Slurm submit script,
>> and specifies #SBATCH --export=ALL so the job imports that environment
>> when the sbatch job is submitted.
>>
>> 2. Install the Environment Modules package in a location visible to the
>> entire cluster (NFS or similar), including the compute nodes. The user
>> then includes the 'module load' commands in the actual Slurm submit
>> script, since the command would be available on the compute nodes,
>> loading software (local or from network locations, depending on the
>> package) that is visible to the nodes.
>>
>> 3. Another variation would be to use a configuration manager like bcfg2
>> to make sure Environment Modules, the necessary modulefiles and all
>> configuration are present on all compute/submit nodes. That seems like
>> it has the potential to become a mess, though.
>>
>> Is there a preferred approach? I see in the archives that some folks
>> have seen strange behavior when a user uses --export=ALL, so it would
>> seem to me that the cleaner approach is to have the 'module load'
>> command available on all compute nodes and have users do this in their
>> submit scripts. If this is the case, I'll need to configure Environment
>> Modules and the relevant modulefiles to live in special places when I
>> build Environment Modules (./configure --prefix=/mounted-fs
>> --modulefilesdir=/mounted-fs, etc.).
>>
>> We've been testing with modules-tcl-1.923.
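To make option 2 concrete: assuming the module command and the software tree
both sit on a mount that every compute node sees, a submit script under that
model would look roughly like the sketch below (job name, resource requests,
module name and executable are all just placeholders):

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # 'module' has to work on the compute node itself, e.g. because the
    # Environment Modules installation lives on an NFS mount all nodes see.
    module load gcc/7.2.0      # placeholder module name

    srun ./my_program          # placeholder executable

Under option 1 the user would instead run 'module load' interactively on the
login node and rely on sbatch's default --export=ALL behavior to carry the
resulting PATH/LD_LIBRARY_PATH and other variables into the job.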
> I strongly recommend uninstalling the Linux distro "environment-modules"
> package, because this old Tcl-based software hasn't been maintained for
> 5+ years. I recommend a very readable paper on various module systems:
> http://dl.acm.org/citation.cfm?id=2691141

As was pointed out to me when I made a similar comment on another mailing
list, the Tcl-based system is actively maintained; the repo simply moved.
I'm not recommending either direction, as it gets into religion rather
quickly.

> We use the modern and actively maintained Lmod modules developed at TACC
> (https://www.tacc.utexas.edu/research-development/tacc-projects/lmod)
> together with the EasyBuild module building system (a strong HPC
> community effort, https://github.com/easybuilders/easybuild).
>
> I believe that the TACC supercomputer systems provide Slurm as a loadable
> module, but I don't know any details. We just install Slurm as RPMs on
> CentOS 7.
>
> We're extremely happy with Lmod and EasyBuild because of the simplicity
> with which 1300+ modules are made available. I've written a wiki page
> about how we have installed this:
> https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules. We put all of our
> modules on a shared NFS file system for all nodes.
>
> /Ole
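Whichever module system ends up being used, the operational answer to the
original question is the same: put the module command, its init scripts and
the modulefiles on a filesystem that every node mounts, so 'module load'
behaves identically in submit scripts on all compute nodes. With the classic
Modules package that is roughly the install Mike already sketched; a minimal
version might look like the following (all paths are placeholders, and the
configure flags are the ones quoted in his message):

    # On a build host, installing into a shared NFS prefix (placeholder paths)
    tar xf modules-tcl-1.923.tar.gz
    cd modules-tcl-1.923
    ./configure --prefix=/mounted-fs/modules \
                --modulefilesdir=/mounted-fs/modulefiles
    make && make install

    # On every login and compute node, make the 'module' command available,
    # e.g. by sourcing the shipped init script from /etc/profile.d/ or an
    # equivalent mechanism (the exact init path depends on the version):
    source /mounted-fs/modules/init/bash

With that in place, submit scripts can call 'module load' directly
(option 2), and jobs submitted with the default --export=ALL from the login
node keep working as well.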