Thank you very much! It seems there is agreement that jobacct_gather/linux
will sum up the shared memory, which is very probably the cause of my
problem.
We are now switching to jobacct_gather/cgroup to see whether it counts
shared memory correctly.
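Concretely, the change amounts to something like the following slurm.conf
excerpt (a sketch only, not our exact configuration; the proctrack and task
plugin lines are assumptions about a typical cgroup setup):

JobAcctGatherType=jobacct_gather/cgroup
ProctrackType=proctrack/cgroup    # assumption: cgroup-based process tracking
TaskPlugin=task/cgroup            # assumption: cgroup task confinement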
I'll report back with the results.
On Fri, Christopher Samuel writes:
> On 16/12/16 02:15, Stefan Doerr wrote:
>
>> If I check on "top" indeed it shows all processes using the same amount
>> of memory. Hence if I spawn 10 processes and you sum usages it would
>> look like 10x the memory usage.
>
> Do you have:
>
> JobAcctGatherType=jobacct_gather/linux
> or:
> JobAcctGatherType=jobacct_gather/cgroup
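For anyone who wants to see the over-counting directly on a node, something
along these lines works (a sketch; the parent PID is hypothetical, pgrep only
picks up direct children, and the PSS sum needs a kernel that provides
/proc/<pid>/smaps_rollup):

#!/bin/bash
# Compare the naive per-process RSS sum (which counts pages shared after
# fork() once per process) with the summed PSS, which splits shared pages
# across the processes that map them.
PARENT=12345                                   # hypothetical parent PID
PIDS="$PARENT $(pgrep -P "$PARENT")"           # parent plus direct children
ps -o rss= -p "$(echo $PIDS | tr ' ' ',')" \
  | awk '{s+=$1} END {print s " kB summed RSS"}'
cat $(for p in $PIDS; do echo /proc/$p/smaps_rollup; done) \
  | awk '/^Pss:/ {s+=$2} END {print s " kB summed PSS"}'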
On 16/12/16 10:33, Kilian Cavalotti wrote:
> I remember Danny recommending to use jobacct_gather/linux over
> jobacct_gather/cgroup, because "cgroup adds quite a bit of overhead
> with very little benefit".
>
> Did that change?
We took that advice but reverted because of this issue (from the message
quoted below):
On Thu, Dec 15, 2016 at 11:47 PM, Douglas Jacobsen wrote:
>
> There are other good reasons to use jobacct_gather/cgroup, in particular if
> memory enforcement is used: jobacct_gather/linux will cause a job to be
> terminated if the summed memory exceeds the limit, which is OK so long as
> large-memory processes aren't forking and artificially increasing the
> apparent memory use.
There are other good reasons to use jobacct_gather/cgroup, in particular
if memory enforcement is used: jobacct_gather/linux will cause a job to
be terminated if the summed memory exceeds the limit, which is OK so
long as large-memory processes aren't forking and artificially
increasing the apparent memory use.
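For completeness, the cgroup side of the enforcement lives in cgroup.conf;
roughly along these lines (a sketch only, the values are examples to check
against the cgroup.conf man page for your version):

ConstrainRAMSpace=yes     # enforce the job's requested RAM via the memory cgroup
ConstrainSwapSpace=yes
AllowedRAMSpace=100       # percent of the allocation used for the cgroup limit
AllowedSwapSpace=0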
On 16/12/16 02:15, Stefan Doerr wrote:
> If I check on "top" indeed it shows all processes using the same amount
> of memory. Hence if I spawn 10 processes and you sum usages it would
> look like 10x the memory usage.
Do you have:
JobAcctGatherType=jobacct_gather/linux
or:
JobAcctGatherType=jobacct_gather/cgroup
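You can check what is currently configured without admin rights, e.g.:

$ scontrol show config | grep -E 'JobAcctGather|ProctrackType|TaskPlugin'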
On 15 December 2016 at 14:48:24 CET, Stefan Doerr wrote:
>$ sinfo --version
>slurm 15.08.11
>
>$ sacct --format="CPUTime,MaxRSS" -j 72491
>   CPUTime     MaxRSS
>---------- ----------
>  00:27:06
>  00:27:06  37316236K
>
>
>I will have to ask the sysadms about cgroups since I'm just a user here.
I decided to test it locally. So I ran the exact batch script that I run on
SLURM on my machine and monitored max memory usage with time -v.
The first printouts that you see are what Python reports as RSS and
virtual memory currently in use. As you can see, it maxes out at 1.7 GB
RSS, which is nowhere near what Slurm reports.
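The measurement itself was essentially (script name hypothetical; GNU time
writes its report to stderr, hence the redirect):

$ /usr/bin/time -v ./run_batch_script.sh 2>&1 | grep "Maximum resident set size"

The "Maximum resident set size (kbytes)" line it prints is a peak value, not
a sum across processes.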
$ sinfo --version
slurm 15.08.11
$ sacct --format="CPUTime,MaxRSS" -j 72491
   CPUTime     MaxRSS
---------- ----------
  00:27:06
  00:27:06  37316236K
I will have to ask the sysadms about cgroups since I'm just a user here.
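Even as a user it should be possible to see whether a memory cgroup is set up
for a job, e.g. from inside the running job (the path assumes a cgroup v1
layout with the default mount point; <jobid> is a placeholder):

$ grep memory /proc/self/cgroup
$ cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_<jobid>/memory.limit_in_bytes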
On Thu, Dec 15, 2016 at 12:05 PM, Merlin Hartley wrote:
I’ve been having the very same problem since I tried to enable Accounting
(slurmdbd) - so I have now had to disable accounting.
It would seem therefore that this part of the documentation should be updated:
https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
"To enable any limit
You are correct. Which version do you run? Do you have cgroups enabled? Can you
enable debugging for slurmd on the nodes? The
output should contain what Slurm calculates as maximum memory for a job.
One other option is to configure MemLimitEnforce=no (which defaults to yes
since 14.11).
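Concretely, something like this in slurm.conf (a sketch; the debug level is
just an example):

SlurmdDebug=debug2      # more verbose slurmd logging on the compute nodes
MemLimitEnforce=no      # don't kill jobs based on the summed-RSS figure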
But this doesn't answer my question why it reports 10 times as much memory
usage as it is actually using, no?
On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter wrote:
>
> There are only two memory related options "--mem" and "--mem-per-cpu".
>
> --mem tells slurm the memory
There are only two memory-related options, "--mem" and "--mem-per-cpu".
--mem tells Slurm the memory requirement of the job (if used with sbatch) or
of the step (if used with srun), but not the requirement of each process.
--mem-per-cpu is used in combination with --ntasks and --cpus-per-task.
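As a sketch (the numbers are made up), the two request styles look like this
in a batch script:

# Request memory per node for the whole job:
#SBATCH --ntasks=10
#SBATCH --mem=40G

# ...or per allocated CPU (here one CPU per task, so 10 x 4 GB in total):
#SBATCH --ntasks=10
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G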