Unfortunately, I don't recall the details. I did find an article on the
web, but this was back around February.
In a nutshell, our slurmctld was mysteriously crashing on CentOS 6.5. I
think someone on this list pointed me to the Linux kernel issue, so it
might be in the archives.
After I increased the per-process memory limit from 2G to 10G, the problem ceased.
I now have the following in /etc/security/limits.d/91-as.conf on our
controller nodes:
* soft as 16777216
* hard as 16777216
* soft memlock 16777216
* hard memlock 16777216
Slurmctld has been rock solid since this change. This cluster has 1136
cores, BTW.
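For reference, limits.conf expresses "as" (address space) and "memlock" in KiB, so 16777216 works out to 16 GiB per process. pam_limits applies these settings to PAM sessions, so whether a daemon picks them up depends on how it is started, and it is worth checking what the running slurmctld actually got. Below is a minimal Python 3 sketch, assuming a Linux /proc filesystem; the function names are hypothetical helpers, not part of Slurm or PAM.

```python
"""Check the address-space limit (RLIMIT_AS) a running process actually has.

Linux-only sketch. limits.conf gives `as` and `memlock` in KiB, while
/proc/<pid>/limits reports the same limits in bytes.
"""
import sys


def limits_conf_kib_to_bytes(kib):
    """Convert a limits.conf `as`/`memlock` value (KiB) to bytes."""
    return kib * 1024


def parse_limit(limits_text, name="Max address space"):
    """Return (soft, hard) for one row of /proc/<pid>/limits.

    Values are in bytes; None means "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith(name):
            # Row looks like: "Max address space   <soft>   <hard>   bytes"
            soft, hard = line[len(name):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise KeyError(name)


def main(pid="self", conf_kib=16777216):
    """Compare a process's soft `as` limit with the value from 91-as.conf."""
    try:
        with open(f"/proc/{pid}/limits") as f:
            soft, hard = parse_limit(f.read())
    except FileNotFoundError:
        print(f"/proc/{pid}/limits not found (not Linux, or no such PID)")
        return
    expected = limits_conf_kib_to_bytes(conf_kib)
    print(f"soft={soft} hard={hard} expected={expected} match={soft == expected}")


if __name__ == "__main__":
    # Pass the slurmctld PID (e.g. from `pidof slurmctld`); defaults to this process.
    main(sys.argv[1] if len(sys.argv) > 1 else "self")
```

Run it with the slurmctld PID as the only argument; if "match" is False, the daemon was probably started outside a PAM session and never saw the new limits.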
On 08/18/14 14:58, Marcin Stolarek wrote:
Re: [slurm-dev] How to size the controller systems
Can you point me to detailed information about the memory-measurement
issue you mention below? How is the memory measured?
On Monday, 18 August 2014, Jason Bacon <[email protected]> wrote:
The controller generally shouldn't require much, but if you're
running Linux, be aware that the way memory use is measured in
recent kernels makes it look like slurmctld is using a lot of RAM
when multiple threads are active. I had to up the per-process
limit to 10G on our CentOS 6.5 controller nodes, even though
slurmctld was using less than 1G in reality.
Regards,
Jason
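The gap Jason describes can be seen by comparing a process's virtual size (VmSize, the address space that counts against the "as" limit) with its resident set (VmRSS, the RAM it actually occupies). A minimal Python 3 sketch, assuming a Linux /proc filesystem; parse_vm_fields, kb_to_gib and report are hypothetical helper names, not part of Slurm.

```python
"""Compare virtual size (VmSize) with resident memory (VmRSS) for a process.

Linux-only sketch: reads /proc/<pid>/status, where both values are given in kB.
"""
import sys


def parse_vm_fields(status_text):
    """Return {"VmSize": kB, "VmRSS": kB, "Threads": n} from /proc/<pid>/status text."""
    wanted = {"VmSize", "VmRSS", "Threads"}
    fields = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            # e.g. "VmSize:  9876543 kB" -> 9876543, "Threads:  12" -> 12
            fields[key] = int(rest.split()[0])
    return fields


def kb_to_gib(kb):
    """Convert the kB figures from /proc (really KiB) to GiB."""
    return kb / 1024 ** 2


def report(pid="self"):
    """One-line summary: VmSize counts against the `as` limit, VmRSS is RAM in use."""
    try:
        with open(f"/proc/{pid}/status") as f:
            fields = parse_vm_fields(f.read())
    except FileNotFoundError:
        return f"/proc/{pid}/status not found (not Linux, or no such PID)"
    return (f"pid {pid}: threads={fields['Threads']} "
            f"VmSize={kb_to_gib(fields['VmSize']):.2f} GiB "
            f"VmRSS={kb_to_gib(fields['VmRSS']):.2f} GiB")


if __name__ == "__main__":
    # Pass the slurmctld PID (e.g. from `pidof slurmctld`); defaults to this process.
    print(report(sys.argv[1] if len(sys.argv) > 1 else "self"))
```

A large VmSize next to a small VmRSS and a high thread count would match the pattern described above: the address-space limit can be hit long before physical RAM runs short.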
On 8/18/14 1:08 PM, Louis Capps wrote:
Hi,
We are looking at using SLURM for a large 6000 node cluster and
need more info on the support systems. Can you point me to a
sizing guide or info on the requirements for the primary and
backup controllers for SLURM including CPU, memory and local disk
requirements?
Thx,
Louis
*******************************************************************************************
Louis Capps ([email protected])
--- Systems Architect - Federal High Performance Computing - US
Federal IMT - IBM Corporation
--- Office (512)286-5556, t/l 363-5556 --- fax 678-6146 ---
cell (512)796-4501
--- Bld 045, 3C80, Austin, TX
http://www-1.ibm.com/servers/deepcomputing/
http://www-03.ibm.com/systems/clusters/
*******************************************************************************************
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
[email protected]
Circumstances don't make a man:
They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~