[slurm-dev] Cgroups and memory accounting

2015-12-14 Thread Felip Moll
TaskPlugin=task/cgroup
SelectTypeParameters=CR_Core_Memory
JobAcctGatherFrequency=15
JobAcctGatherType=jobacct_gather/linux
cgroup.conf:
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainRAMSpace=yes
Thank you!

[slurm-dev] cgroups and memory accounting

2015-12-14 Thread Felip Moll
="/etc/slurm/cgroup" ConstrainCores=yes ConstrainRAMSpace=yes Thank you! *--Felip Moll Marquès* Computer Science Engineer E-Mail - lip...@gmail.com WebPage - http://lipix.ciutadella.es

[slurm-dev] Re: cgroups and memory accounting

2015-12-18 Thread Felip Moll
process 18956 (R) total-vm:53184568kB, anon-rss:45767588kB, file-rss:5447628kB 2015-12-15 10:50 GMT+01:00 Bjørn-Helge Mevik : > > Felip Moll writes: > > > On one

[slurm-dev] Re: cgroups and memory accounting

2015-12-18 Thread Felip Moll
1154 372 80 0 gzip
nov 25 15:47:05 cn23 kernel: Memory cgroup out of memory: Kill process 15011 (ocean_pp.bash) score 0 or sacrifice child
nov 25 15:47:05 cn23 kernel: Killed process 32554 (gzip) total-vm:4616kB, anon-rss:216kB, file-rss:1272kB

[slurm-dev] Re: cgroups and memory accounting

2016-01-22 Thread Felip Moll
lip M 2015-12-18 15:09 GMT+01:00 Bjørn-Helge Mevik : > > Carlos Fenoy writes: > > > Barbara, I don't think that is the issue here. The killer is the OOM not &

[slurm-dev] Slurm SPANK X11 plugin - unable to connect node

2016-01-22 Thread Felip Moll
is plugin and have the same problem. Regards, Felip M

[slurm-dev] Re: Confusion with init/systemd files

2016-02-04 Thread Felip Moll
urmctld daemons. Regards, Felip M 2016-02-03 20:25 GMT+01:00 Cooper, Trevor : > > Jeff, > > You might want to start with the Slurm overview page[1] and quick start > admin guid

[slurm-dev] SLURM_JOB_MEMORY ?

2016-04-11 Thread Felip Moll
o you suggest me to do to set the environment with this info? He needs this information to configure the heap, etc., at run time. Regards, Felip M
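One way to sketch the idea above: Slurm does export SLURM_MEM_PER_NODE or SLURM_MEM_PER_CPU to the job when memory is requested with --mem or --mem-per-cpu, so a job script can read its allocation at run time. The fallback logic and the 80% heap sizing below are purely illustrative assumptions, not something from this thread:

```shell
# Hypothetical job-script sketch: derive the job's memory limit (in MB) from
# the variables Slurm exports when --mem or --mem-per-cpu is used. The
# defaults and the 80% heap fraction are illustrative assumptions.
job_mem_mb=${SLURM_MEM_PER_NODE:-${SLURM_MEM_PER_CPU:-0}}
if [ "$job_mem_mb" -gt 0 ]; then
    heap_mb=$(( job_mem_mb * 80 / 100 ))   # e.g. size the heap at 80%
    echo "heap=${heap_mb}MB"
else
    echo "no memory limit exported"
fi
```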

[slurm-dev] tres_table in slurm_acct_db

2016-05-20 Thread Felip Moll
1. To do this, the actual values of the tres_alloc column must currently be parsed externally to MySQL. Regards, Felip M

[slurm-dev] Re: tres_table in slurm_acct_db

2016-05-20 Thread Felip Moll
A very ugly workaround for having a cpus_alloc column would be:
create view custom_job_table as select *, substring_index(substring_index(tres_alloc, ',', 1), '1=', -1) as cpus_alloc from _job_table;
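The same extraction can also be done outside MySQL. A minimal shell sketch, assuming tres_alloc values look like "1=24,2=49152" with TRES id 1 being the CPU count — the sample value here is an assumption, not taken from the thread:

```shell
# Pull the CPU count (TRES id 1) out of a tres_alloc string by splitting
# the comma-separated id=value pairs and matching id 1.
tres_alloc="1=24,2=49152,4=2"   # hypothetical sample value
cpus_alloc=$(printf '%s\n' "$tres_alloc" | tr ',' '\n' | awk -F= '$1 == 1 {print $2}')
echo "$cpus_alloc"   # prints 24
```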

[slurm-dev] WallClock time limit updated when scontrol update job xxx qos=yyy

2016-10-18 Thread Felip Moll
o lowprio, the wallclock is switched to 7 days. In my opinion, the time limit should not be changed when updating a job's QOS unless explicitly requested. Is this correct behaviour in Slurm 15.08.10? -- Felip Moll Marquès

[slurm-dev] Re: Announce: Infiniband topology tool "slurmibtopology.sh" version 0.1

2017-05-08 Thread Felip Moll
SwitchName=cmc1 Nodes=nva[1-9]
SwitchName=cmc2 Nodes=nva[10-18]
SwitchName=cmc3 Nodes=nva[19-27]
SwitchName=cmc4 Nodes=nva[28-36]
SwitchName=cmc5 Nodes=nva[37-45]
SwitchName=cmc6 Nodes=nva[46-54]
SwitchName=cmc7 Nodes=nva[55-61]
SwitchName=ibswfdr Nodes=nvb[1-39]
SwitchName=troncal Switches=cmc[1-7],ibsw

[slurm-dev] Re: Communication error

2017-05-08 Thread Felip Moll
Do you have any kind of firewall in your network? I would suspect a problem with clocks/dates, but since you tested munge -n we can discard that. Can you anyway run pdsh -w compute-* date | dshbak -c ? Can you show the nodes' slurmd log output?
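The clock check can also be scripted. A hedged sketch of the skew test itself, with both clocks given as epoch seconds — in practice the remote value would come from the pdsh one-liner or an ssh call; the tolerance and sample values are assumptions:

```shell
# Flag clock skew between the head node and a compute node.
# $1 = local epoch seconds, $2 = remote epoch seconds, $3 = tolerance (default 5s).
check_skew() {
    local local_epoch=$1 remote_epoch=$2 max_skew=${3:-5}
    local diff=$(( local_epoch - remote_epoch ))
    # ${diff#-} strips a leading minus sign, giving the absolute value.
    [ "${diff#-}" -le "$max_skew" ] && echo "OK" || echo "SKEW ${diff}s"
}
check_skew 1700000000 1700000002   # prints OK   (2s within tolerance)
check_skew 1700000000 1699999900   # prints SKEW 100s
```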

[slurm-dev] Re: Is there anyway to commit job with different user?

2017-05-16 Thread Felip Moll
It is not possible, at least not in a supported way. The first requirement in the admin guide says: 1. Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster. From: https://slurm.schedmd.com/quickstart_admin.html

[slurm-dev] RE: Suggestions on node memory cleaning

2017-07-07 Thread Felip Moll
I do it in the epilog. When entering the epilog for the last job on the node, I drop caches, clean /dev/shm, etc. Since the epilog runs as root there is no need to use sudo. You could also do this in the prolog; just check whether other jobs are running on the node. br Felip M On 5 Apr 2017
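The approach described above can be sketched as an epilog script. squeue and SLURM_JOB_ID are real Slurm; the exact squeue options, the last-job test, and the cleanup list are assumptions, not quoted from the thread:

```shell
#!/bin/bash
# Hypothetical epilog sketch: clean up only when the finishing job was the
# last one running on this node.

# Return success if no job id other than $2 appears in the newline-separated
# id list $1 (the format `squeue -h -o %A` prints).
is_last_job() {
    ! printf '%s\n' "$1" | grep -qvx "$2"
}

# Only act inside a real epilog, where Slurm sets SLURM_JOB_ID.
if [ -n "$SLURM_JOB_ID" ]; then
    running=$(squeue -h -w "$(hostname -s)" -t RUNNING -o '%A')
    if is_last_job "$running" "$SLURM_JOB_ID"; then
        rm -rf /dev/shm/*                 # clear leftover shared-memory files
        sync
        echo 3 > /proc/sys/vm/drop_caches # epilog runs as root, no sudo needed
    fi
fi
```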

[slurm-dev] Re: Overview

2013-04-25 Thread Felip Moll
6) Current job memory allocation for nodes. I am currently looking for options in sstat, sinfo, scontrol... but I can't find how to see the total reserved memory for one particular node. In sview, in the "nodes" tab, you can see how many CPUs are used/free for each node, but not how much memory. Thanks!

[slurm-dev] Re: Overview

2013-04-26 Thread Felip Moll
This is my scontrol show node:
NodeName=pez015 Arch=x86_64 CoresPerSocket=6
CPUAlloc=12 CPUErr=0 CPUTot=12 Features=(null) Gres=(null)
NodeAddr=pez015 NodeHostName=pez015 OS=Linux RealMemory=48128 Sockets=2
State=ALLOCATED ThreadsPerCore=1 TmpDisk=61440 Weight=1
BootTime=2013-03-

[slurm-dev] Re: Overview

2013-04-26 Thread Felip Moll
This will give you the allocated compute cores (#3): scontrol show node | grep CPUAlloc | cut -d" " -f 4 | sed 's/CPUAlloc=//g' | awk '{total = total + $1}END{print total}' But of course it's not very useful... I would also love a good summary tool. 2013/4/26 Mario Kadastik > > The thread has so
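The grep/cut/sed/awk pipeline above can be collapsed into a single awk invocation, assuming the same `scontrol show node` output format. The here-string below stands in for real scontrol output so the logic is visible:

```shell
# Sum CPUAlloc over all nodes in one awk pass: split each line on the
# "CPUAlloc=" marker and take the number that follows it.
sample='NodeName=pez015 CPUAlloc=12 CPUTot=12
NodeName=pez016 CPUAlloc=4 CPUTot=12'
printf '%s\n' "$sample" | \
    awk -F'CPUAlloc=' '/CPUAlloc=/ {split($2, a, " "); total += a[1]} END {print total}'   # prints 16
```

Against a live cluster, the `printf` would be replaced by `scontrol show node`.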