Re: [slurm-users] Way MaxRSS should be interpreted

2018-04-17 Thread Gareth.Williams
I think the situation is likely to be a little different. Consider a Fortran 
program that statically or dynamically defines large arrays. That defines a 
virtual memory size: in effect a declaration of the maximum amount of memory 
the program might use if it fills the arrays. That amount of real memory plus 
swap must be available for the program to run, since it might really use all 
of it.

Speaking loosely, Linux allocates memory lazily, so memory may not actually be 
backed by RAM until it is first used. If the program happens to read a smaller 
dataset and the arrays are never filled, the resident set size may be 
significantly smaller than the virtual memory size. Further, memory that has 
been swapped out does not count towards RSS, so RSS may be smaller still. 
Effectively, RSS for a process is its actual footprint in RAM. It changes over 
the life of the process/job, and Slurm tracks the maximum (MaxRSS). I'd 
actually expect MaxRSS to be the maximum of the sum of the RSS of all known 
processes, sampled periodically through the job, but I'm guessing. That should 
carry over reasonably to parallel jobs if the sum spans nodes (or Slurm 
wouldn't be the first batch system to only effectively account for the first 
allocated node).

Linux memory tracking/accounting has its gotchas, since shared memory (say, 
for library code) has to be accounted for somewhere, but in HPC we can 
reasonably assume that memory use is dominated by each job's own computational 
working set, so MaxRSS is a good estimate of how much RAM is needed to run a 
given job.
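
As a rough illustration of that lazy allocation (nothing Slurm-specific, just a 
sketch of the underlying Linux behaviour under its default overcommit 
settings), a small C program along these lines prints VmSize and VmRSS from 
/proc/self/status before and after touching a large allocation; the virtual 
size jumps as soon as the memory is requested, while the resident set size only 
grows for pages that are actually written:

  /* Sketch: compare virtual size and resident set size on Linux. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static void print_mem(const char *label)
  {
      /* VmSize = virtual size, VmRSS = resident set size, per the kernel */
      FILE *f = fopen("/proc/self/status", "r");
      char line[256];

      if (f == NULL)
          return;
      while (fgets(line, sizeof line, f) != NULL) {
          if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
              printf("%s %s", label, line);
      }
      fclose(f);
  }

  int main(void)
  {
      size_t n = (size_t)1 << 30;      /* "declare" 1 GiB of workspace */
      char *buf = malloc(n);

      if (buf == NULL)
          return 1;
      print_mem("after malloc: ");     /* VmSize includes the 1 GiB, VmRSS barely moves */
      memset(buf, 1, n / 4);           /* fill only a quarter of the "arrays" */
      print_mem("after memset: ");     /* VmRSS grows only by the pages actually touched */
      free(buf);
      return 0;
  }

On a typical Linux box the first pair of lines shows VmSize roughly 1 GiB above 
VmRSS; after the memset, VmRSS catches up only by the quarter that was touched. 
Slurm's accounting presumably picks up the same per-process RSS figures, just 
sampled over the life of the job.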

Gareth

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
E.S. Rosenberg
Sent: Tuesday, 17 April 2018 10:42 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Way MaxRSS should be interpreted

Hi Loris,
Thanks for your explanation!
I would have interpreted it as max(sum()).

Is there a way to get max(sum()), or at least some form of sum()? The 
assumption that all processes peak at the same value is not a valid one unless 
all threads have essentially the same workload...
Thanks again!
Eli

On Tue, Apr 17, 2018 at 2:09 PM, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
Hi Eli,

"E.S. Rosenberg" 
<esr+slurm-...@mail.hebrew.edu<mailto:esr%2bslurm-...@mail.hebrew.edu>> writes:

> Hi fellow slurm users,
> We have been struggling for a while with understanding how MaxRSS is reported.
>
> This is because jobs sometimes die with a reported MaxRSS that doesn't even 
> approach 10% of the requested memory.
>
> I just found the following document:
> https://research.csc.fi/-/a
>
> It says:
> "maxrss = maximum amount of memory used at any time by any process in that 
> job. This applies directly for serial jobs. For parallel jobs you need to 
> multiply with the number of cores (max 16 or 24 as this is
> reported only for that node that used the most memory)"
>
> While 'man sacct' says:
> "Maximum resident set size of all tasks in job."
>
> Which explanation is correct? How should I be interpreting MaxRSS?

As far as I can tell, both explanations are correct, but the
text in 'man sacct' is confusing.

  "Maximum resident set size of all tasks in job."

is analogous to

  "maximum height of all people in the room"

rather than

  "total height of all people in the room"

More specifically it means

  "Maximum individual resident set size out of the group of resident set
  sizes associated with all tasks in job."

It doesn't mean

  "Sum of the resident set sizes of all the tasks"

I'm a native English speaker and I keep stumbling over this in 'man
sacct', only to remember that I have already worked out how it is
supposed to be interpreted.

My suggestion for improving this would be

  "Maximum individual resident set size of all resident set sizes
  associated with the tasks in job."

It's a little clunky, but I hope it is clearer.
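
For example, if a step's three tasks peak at 2 GB, 3 GB and 1 GB, MaxRSS
reports 3 GB, not 6 GB. As a cross-check, sacct can also say where that
maximum came from: alongside MaxRSS it has MaxRSSTask and MaxRSSNode fields
(names as listed in 'man sacct'), so something along the lines of

  sacct -j <jobid> --format=JobID,NTasks,MaxRSS,MaxRSSTask,MaxRSSNode

shows the peak together with the task and node on which it was recorded. If
MaxRSS were a sum over tasks, there would be no single task or node to
attribute it to.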

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de


