[slurm-dev] unsubscribe

2016-12-15 Thread Gary Brown

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Christopher Samuel
On 16/12/16 10:33, Kilian Cavalotti wrote:
> I remember Danny recommending to use jobacct_gather/linux over
> jobacct_gather/cgroup, because "cgroup adds quite a bit of overhead
> with very little benefit".
>
> Did that change?

We took that advice but reverted because of this issue (from

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Kilian Cavalotti
On Thu, Dec 15, 2016 at 11:47 PM, Douglas Jacobsen wrote:
>
> There are other good reasons to use jobacct_gather/cgroup, in particular if
> memory enforcement is used, jobacct_gather/linux will cause a job to be
> terminated if the summed memory exceeds the limit, which is OK

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Douglas Jacobsen
There are other good reasons to use jobacct_gather/cgroup. In particular, if memory enforcement is used, jobacct_gather/linux will cause a job to be terminated if the summed memory exceeds the limit, which is OK so long as large-memory processes aren't forking and artificially increasing the
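A minimal, self-contained sketch of the over-counting being described. The numbers are invented for illustration; the point is that summing per-process RSS (as jobacct_gather/linux does) counts copy-on-write shared pages once per process, while a cgroup-level measurement counts each physical page once:

```python
def summed_rss(shared_mb, private_mb_each, nprocs):
    """Per-process RSS summed across all processes (jobacct_gather/linux view).

    Each of the nprocs processes reports the shared region in its own RSS,
    so the shared pages are counted nprocs times.
    """
    return sum(shared_mb + private_mb_each for _ in range(nprocs))


def cgroup_rss(shared_mb, private_mb_each, nprocs):
    """Physical memory actually resident in the job's cgroup.

    Shared (copy-on-write) pages exist once, plus each process's private pages.
    """
    return shared_mb + private_mb_each * nprocs


# 10 processes, each mapping the same 1500 MB region plus 200 MB private pages:
print(summed_rss(1500, 200, 10))  # 17000 MB -- looks roughly 10x too big
print(cgroup_rss(1500, 200, 10))  # 3500 MB  -- what is really in use
```

With a per-job limit between those two numbers, the summed view kills the job while the cgroup view lets it run, which matches the behaviour described above.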

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Christopher Samuel
On 16/12/16 02:15, Stefan Doerr wrote:
> If I check on "top" indeed it shows all processes using the same amount
> of memory. Hence if I spawn 10 processes and you sum usages it would
> look like 10x the memory usage.

Do you have:

JobAcctGatherType=jobacct_gather/linux

or:
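For context, the setting being asked about lives in slurm.conf; a sketch of the two alternatives (a regular user can check a cluster's current value with 'scontrol show config | grep JobAcctGatherType'):

```
# slurm.conf -- choose one accounting-gathering plugin (site-specific)
JobAcctGatherType=jobacct_gather/linux    # sums per-process RSS from /proc
#JobAcctGatherType=jobacct_gather/cgroup  # reads usage from the job's cgroup
```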

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Benjamin Redling
On 15 December 2016 at 14:48:24 CET, Stefan Doerr wrote:
>$ sinfo --version
>slurm 15.08.11
>
>$ sacct --format="CPUTime,MaxRSS" -j 72491
>   CPUTime     MaxRSS
>---------- ----------
>  00:27:06
>  00:27:06  37316236K
>
>I will have to ask the sysadms about cgroups since

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Stefan Doerr
I decided to test it locally. So I ran the exact batch script that I run on SLURM on my machine and monitored max memory usage with "time -v". The first prints that you see are what Python reports as RSS and virtual memory currently in use. As you can see, it maxes out at 1.7GB RSS, which

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Stefan Doerr
$ sinfo --version
slurm 15.08.11

$ sacct --format="CPUTime,MaxRSS" -j 72491
   CPUTime     MaxRSS
---------- ----------
  00:27:06
  00:27:06  37316236K

I will have to ask the sysadms about cgroups since I'm just a user here.

On Thu, Dec 15, 2016 at 12:05 PM, Merlin Hartley <
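For scale: sacct's K suffix is KiB, so the MaxRSS above is roughly 35.6 GiB, not 37 GB of "real" usage. A small hypothetical helper (not part of Slurm) to convert sacct's suffixed values:

```python
def rss_to_gib(maxrss):
    """Convert a sacct MaxRSS string such as '37316236K' to GiB.

    sacct appends K/M/G meaning KiB/MiB/GiB; this is an illustrative
    helper, not a Slurm API.
    """
    multipliers = {"K": 1, "M": 1024, "G": 1024 ** 2}  # value of suffix in KiB
    kib = float(maxrss[:-1]) * multipliers[maxrss[-1]]
    return kib / 1024 ** 2


print(round(rss_to_gib("37316236K"), 1))  # ~35.6 GiB for the job above
```

Adding MaxRSSTask and NTasks to the --format string (both are standard sacct fields) can also show which task the peak value was sampled from.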

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Merlin Hartley
I’ve been having the very same problem since I tried to enable accounting (slurmdbd), so I have now had to disable accounting. It would seem, therefore, that this part of the documentation should be updated:
https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
"To enable any limit

[slurm-dev] How to debug NODE_FAIL?

2016-12-15 Thread Rafael Kioji
Dear all,

Where can I find the *log* when my job fails with the message NODE_FAIL? One of my programs constantly receives this message (after hours of execution), but I couldn't figure out what is actually causing it.

--
Att.,
Rafael
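Not an answer from the thread, but a common starting point when chasing NODE_FAIL (standard Slurm commands; the log path is a typical default and varies by site):

```
# Which node did the job die on, and what state/exit code was recorded?
sacct -j <jobid> --format=JobID,State,ExitCode,NodeList,Elapsed

# For the node itself (Reason= often explains why Slurm marked it down):
scontrol show node <nodename>

# The slurmd log on the failing node (admin access usually required),
# typically /var/log/slurmd.log or the node's syslog
```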

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Uwe Sauter
You are correct. Which version do you run? Do you have cgroups enabled? Can you enable debugging for slurmd on the nodes? The output should contain what Slurm calculates as the maximum memory for a job. One other option is to configure MemLimitEnforce=no (which defaults to yes since 14.11).

Am
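Uwe's suggestion, written out as a slurm.conf fragment (a sketch; per his message the parameter defaults to yes from 14.11 onward):

```
# slurm.conf -- stop Slurm from killing jobs whose accounted (possibly
# over-counted) memory exceeds the request; any remaining enforcement
# is then left to the site's cgroup configuration
MemLimitEnforce=no
```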

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Stefan Doerr
But this doesn't answer my question why it reports 10 times as much memory usage as it is actually using, no?

On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter wrote:
>
> There are only two memory related options "--mem" and "--mem-per-cpu".
>
> --mem tells slurm the memory