Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread Alessandro Federico
Hi John, thanks for the info. We are investigating the slowdown of sssd, and I found some bug reports about slow sssd queries with almost the same backtrace. Hopefully an update of sssd will solve this issue. We'll let you know if we find a solution. Thanks, Ale

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Christopher Samuel
On 18/01/18 02:53, Loris Bennett wrote: This is all very OT, so it might be better to discuss it on, say, the OpenHPC mailing list, since as far as I can tell Spack, EasyBuild and Lmod (but not old or new 'environment-modules') are part of OpenHPC. Another place might be the Beowulf list, all

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Loris Bennett
Hi Ole,
Ole Holm Nielsen writes:
> John: I would refrain from installing the old default package
> "environment-modules" from the Linux distribution, since it doesn't
> seem to be maintained any more.
Is this still true? Here http://modules.sourceforge.net/

Re: [slurm-users] Best practice: How much node memory to specify in slurm.conf?

2018-01-17 Thread Christopher Samuel
On 18/01/18 01:52, Paul Edmon wrote: We've been typically taking 4G off the top for memory in our slurm.conf for the system and other processes.  This seems to work pretty well. Where I was working previously we'd discount the memory by the amount of GPFS page cache too, plus a little for
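The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the 4096 MiB default stands in for "4G off the top", the page-cache figure and the 64 GiB node size are made-up inputs, and the node names and CPU count are borrowed from the "Slurm not starting" thread in this listing. RealMemory in slurm.conf is given in megabytes.

```python
def real_memory_mb(total_mb: int, reserve_mb: int = 4096, page_cache_mb: int = 0) -> int:
    """Memory to advertise to Slurm: total minus what the system keeps for itself."""
    usable = total_mb - reserve_mb - page_cache_mb
    if usable <= 0:
        raise ValueError("reserve exceeds total memory")
    return usable


def node_line(names: str, cpus: int, mem_mb: int) -> str:
    """Format a NodeName line in the style used in slurm.conf."""
    return f"NodeName={names} CPUs={cpus} RealMemory={mem_mb} State=UNKNOWN"


# Hypothetical 64 GiB node with the default 4 GiB reserve and no page-cache discount.
print(node_line("node[01-08]", 16, real_memory_mb(64 * 1024)))
```

A site that also discounts the GPFS page cache would pass that size as page_cache_mb.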

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread John DeSantis
Ale,
> As Matthieu said it seems something related to SSS daemon.
That was a great catch by Matthieu.
> Moreover, only 3 SLURM partitions have the AllowGroups ACL
Correct, which may seem negligent, but after each `scontrol reconfigure`, slurmctld restart, and/or AllowGroups= partition update,

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Vanzo, Davide
Hi Bill! Always glad to contribute to the Lmod cause! ;) Back to the discussion, I simply gave my contribution based on how we set up our system. In no way did I intend to say that that is the only way to deploy software. Yours is definitely a valid alternative, although it requires a deeper

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Bill Barth
I’d go slightly further, though I do appreciate the Lmod shout-out!: In some cases, you may not even want the software on the frontend nodes (hear me out before I retract it). If it’s a library that requires linking against before it can be used, then you probably have to have it unless you

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread John Hearns
I should also say that Modules should be easy to install on Ubuntu. It will be the package named "environment-modules". You will probably have to edit the configuration file a little bit, since the default install will assume all Modules files are local. You need to set your MODULESPATH to include
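As a rough sketch of the search-path change, assuming the setting meant here is the MODULEPATH environment variable (the colon-separated list of modulefile directories that Environment Modules and Lmod read): the directory /shared/modulefiles below is hypothetical, and the helper only builds the new value, it does not edit any configuration file.

```python
import os


def prepend_modulepath(shared_dir: str, current: str) -> str:
    """Return a colon-separated path with shared_dir first and no duplicate of it."""
    parts = [p for p in current.split(":") if p]
    if shared_dir in parts:
        parts.remove(shared_dir)
    return ":".join([shared_dir] + parts)


shared = "/shared/modulefiles"  # hypothetical shared modulefile directory
new_value = prepend_modulepath(shared, os.environ.get("MODULEPATH", ""))
print("export MODULEPATH=" + new_value)
```

The printed line could be placed in a shell profile on the login and compute nodes so that every node searches the same shared directory.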

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Ole Holm Nielsen
I can highly recommend EasyBuild as an easy way to provide software packages as "modules" to your cluster. We have been very pleased with EasyBuild in our cluster. I made some notes about installing EasyBuild in a Wiki page: https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules We use CentOS

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Vanzo, Davide
Ciao Elisabetta, I second John's reply. On our cluster we install software on the shared parallel filesystem with EasyBuild and use Lmod as a module front-end. Then users will simply load software in the job's environment by using the module command. Feel free to ping me directly if you need

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread John Hearns
Hi Elisabetta. No, you normally do not need to install software on all the compute nodes separately. It is quite common to use the 'modules' environment to manage software like this http://www.admin-magazine.com/HPC/Articles/Environment-Modules Once you have numpy installed on a shared drive on
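To illustrate the shared-install idea, here is a minimal Python sketch. It assumes the library lives in a directory that every node mounts; the path /shared/python/site-packages is hypothetical. In practice a module file would normally set PYTHONPATH for the job instead of the script touching sys.path itself.

```python
import importlib
import importlib.util
import sys


def ensure_importable(module: str, shared_site: str) -> bool:
    """Report whether a top-level module can be found, trying shared_site as a fallback."""
    if importlib.util.find_spec(module) is not None:
        return True
    if shared_site not in sys.path:
        sys.path.append(shared_site)
    # Make the import system notice the newly added directory.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module) is not None


# Hypothetical shared location; the result depends on what is installed on the node.
print(ensure_importable("numpy", "/shared/python/site-packages"))
```

Running this at the top of a job script gives an early, readable failure when the shared drive is not mounted on a node.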

[slurm-users] Slurm and available libraries

2018-01-17 Thread Elisabetta Falivene
Hi, let's say I need to execute a python script with slurm. The script requires a particular library installed on the system, like numpy. If the library is not installed on the system, it is necessary to install it on the master AND the nodes, right? Does this have to be done on each machine separately or

Re: [slurm-users] Slurm not starting

2018-01-17 Thread Elisabetta Falivene
Ciao Gennaro!
> > *NodeName=node[01-08] CPUs=16 RealMemory=16000 State=UNKNOWN*
> > to
> > *NodeName=node[01-08] CPUs=16 RealMemory=15999 State=UNKNOWN*
> >
> > Now, slurm works and the nodes are running. There is only one minor problem
> >
> > *error: Node node04 has low real_memory size
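A "low real_memory size" error means the node reports less memory than slurm.conf claims for it. The following Python sketch shows one way to compare the two, assuming the node's figure is taken from the MemTotal line of /proc/meminfo; the sample text is made up and is not node04's real output. On a real node, `slurmd -C` prints the configuration the node detects for itself.

```python
import re


def mem_total_mb(meminfo_text: str) -> int:
    """Extract MemTotal from /proc/meminfo-style text and convert kB to whole MiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) // 1024


def configured_fits(configured_mb: int, meminfo_text: str) -> bool:
    """True when the configured RealMemory does not exceed what the node has."""
    return configured_mb <= mem_total_mb(meminfo_text)


# Made-up sample; on a node, read the text from /proc/meminfo instead.
sample = "MemTotal:       16380000 kB\nMemFree:         1000000 kB\n"
print(mem_total_mb(sample), configured_fits(16000, sample))
```

Setting RealMemory at or below the smallest value found across the nodes in the range avoids the error for all of them.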

Re: [slurm-users] Best practice: How much node memory to specify in slurm.conf?

2018-01-17 Thread Bjørn-Helge Mevik
I tend to run a test program on an otherwise idle node, allocating (and actually using!) more and more memory, and then see when it starts swapping. I typically end up with between 1 and 1.5 GiB less than what "free" reports as the total memory. -- Regards, Bjørn-Helge Mevik, dr. scient,
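A minimal Python sketch of such a test program follows. It is not the poster's actual tool: the step size, the limit and the 4096-byte page size are assumptions. It writes one byte per page so that the memory is really used, and it leaves watching for swapping (for example with "free" in another terminal) to the operator.

```python
def touch_memory(step_mb: int, max_mb: int):
    """Allocate memory in steps, write to every page, and yield the running total in MiB."""
    page = 4096  # assumed page size in bytes
    chunks = []  # keep references so earlier steps stay allocated
    allocated = 0
    while allocated + step_mb <= max_mb:
        buf = bytearray(step_mb * 1024 * 1024)
        for offset in range(0, len(buf), page):
            buf[offset] = 1  # touching each page forces it to be backed by real memory
        chunks.append(buf)
        allocated += step_mb
        yield allocated


# Small, safe numbers for a demonstration; a real test would use far larger ones.
for used_mb in touch_memory(8, 32):
    print(f"holding {used_mb} MiB")
```

On an otherwise idle node, one would raise max_mb toward the installed memory and pause between steps to see at which total swapping starts.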