Hi Gareth,

Your assessment matches what I would have thought: MaxRSS should be the maximum of the sum of all RSS values in a sample. Swap and shared memory do complicate things, but I think most people expect jobs to be killed only if their RSS exceeds their memory request.
That being said, as far as I understand the current Slurm reporting mechanisms, there is actually no way to get the total MaxRSS of a job, only that of whichever step/subjob/thread was largest in memory.

Thanks,
Eli

On Tue, Apr 17, 2018 at 4:03 PM, <gareth.willi...@csiro.au> wrote:

> I think the situation is likely to be a little different. Let’s consider a
> Fortran program that statically or dynamically defines large arrays. This
> defines a virtual memory size – like declaring that this is the maximum
> amount of memory you might use if you fill the arrays. That amount of real
> memory + swap must be available for the program to run – after all, you
> might use that amount… Speaking loosely, Linux has a soft memory
> allocation policy, so memory may not actually be allocated until it is
> used. If the program happens to read a smaller dataset and the arrays are
> not filled, then the resident set size may be significantly smaller than
> the virtual memory size. Further, memory that has been swapped out doesn’t
> count towards the RSS, so it might be even smaller. Effectively, RSS for a
> process is its actual footprint in RAM. It will change over the life of
> the process/job, and Slurm will track the maximum (MaxRSS). I’d actually
> expect MaxRSS to be the maximum of the sum of the RSS of known processes,
> as sampled periodically through the job – but I’m guessing. This should
> apply reasonably to parallel jobs if the sum spans nodes (or it wouldn’t
> be the first batch system to effectively account only for the first
> allocated node). The whole Linux memory tracking/accounting system has
> gotchas, as shared memory (say, for library code) has to be accounted for
> somewhere, but in HPC we can reasonably assume that memory use is
> dominated by unique computational working-set data – so MaxRSS is a good
> estimate of how much RAM is needed to run a given job.
>
> Gareth
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *E.S.
Rosenberg
> *Sent:* Tuesday, 17 April 2018 10:42 PM
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] Way MaxRSS should be interpreted
>
> Hi Loris,
>
> Thanks for your explanation!
>
> I would have interpreted it as max(sum()).
>
> Is there a way to get max(sum()), or at least some form of sum()? The
> assumption that all processes peak at the same value is not a valid one
> unless all threads have essentially the same workload...
>
> Thanks again!
> Eli
>
> On Tue, Apr 17, 2018 at 2:09 PM, Loris Bennett
> <loris.benn...@fu-berlin.de> wrote:
>
> Hi Eli,
>
> "E.S. Rosenberg" <esr+slurm-...@mail.hebrew.edu> writes:
>
> > Hi fellow slurm users,
> >
> > We have been struggling for a while with understanding how MaxRSS is
> > reported. This is because jobs sometimes die with MaxRSS not even
> > approaching 10% of the requested memory.
> >
> > I just found the following document:
> > https://research.csc.fi/-/a
> >
> > It says:
> > "maxrss = maximum amount of memory used at any time by any process in
> > that job. This applies directly for serial jobs. For parallel jobs you
> > need to multiply with the number of cores (max 16 or 24 as this is
> > reported only for that node that used the most memory)"
> >
> > While 'man sacct' says:
> > "Maximum resident set size of all tasks in job."
> >
> > Which explanation is correct? How should I be interpreting MaxRSS?
>
> As far as I can tell, both explanations are correct, but the text in
> 'man sacct' is confusing.
>
> "Maximum resident set size of all tasks in job."
>
> is analogous to
>
> "maximum height of all people in the room"
>
> rather than
>
> "total height of all people in the room"
>
> More specifically, it means
>
> "Maximum individual resident set size out of the group of resident set
> sizes associated with all tasks in job."
> It doesn't mean
>
> "Sum of the resident set sizes of all the tasks"
>
> I'm a native English speaker and I keep on stumbling over this in 'man
> sacct' and then remembering that I have already worked out how it was
> supposed to be interpreted.
>
> My suggestion for improving this would be
>
> "Maximum individual resident set size of all resident set sizes
> associated with the tasks in job."
>
> It's a little clunky, but I hope it is clearer.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
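[To make the two readings discussed in this thread concrete, here is a minimal Python sketch. The per-task RSS samples are invented numbers, not Slurm internals; it only illustrates how "max over tasks of each task's peak" (what sacct's MaxRSS reports) can differ from "max over time of the summed RSS" (the max(sum()) Eli expected) when tasks do not peak simultaneously.]

```python
# Invented per-task RSS samples in KB; columns are sampling intervals,
# rows are tasks of one job. task0 peaks early, task1 peaks late.
samples = {
    "task0": [100, 400, 150],
    "task1": [120, 130, 500],
}

# Reading 1 (what sacct's MaxRSS reflects): the largest single-task RSS
# seen at any sample, i.e. max over tasks of (max over time).
reported_maxrss = max(max(series) for series in samples.values())

# Reading 2 (max(sum())): the peak of the job's total resident memory,
# i.e. max over time of (sum across tasks at that sample).
peak_total_rss = max(sum(col) for col in zip(*samples.values()))

print(reported_maxrss)  # 500
print(peak_total_rss)   # 650
```

Because the two tasks peak in different sampling intervals, the reported MaxRSS (500 KB) understates the job's true peak footprint (650 KB) but overstates what you would get by assuming all tasks peak together, which is exactly the ambiguity the thread is about.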