On 20/09/14 02:02, Riccardo Murri wrote:

> The problem is that the job above has actually negligible heap use,
> *but* it allocates a SysV shared memory segment of about 100GB.  It
> seems that the size of this shared memory segment is counted towards
> *all* 4 processes in the job, instead of being counted just once.

My gut instinct is that this may be down to how the kernel reports
per-process memory usage: a shared segment shows up in the RSS of every
process that has it mapped, so summing RSS across the job's tasks counts
it once per task.  I can see that you are using cgroups for task
control; have you considered using cgroups for accounting as well?
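
For what it's worth, here's a rough sketch of the effect I mean
(untested against your setup; the 256 MB segment and the single fork are
purely for illustration).  Both processes report the whole segment in
their own RSS, so anything that sums per-process RSS counts it twice:

/* A SysV shared memory segment appears in the RSS of every process
   that maps it, so summing per-process RSS over-counts it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

static void print_rss(const char *who)
{
    long size_pages = 0, rss_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f && fscanf(f, "%ld %ld", &size_pages, &rss_pages) == 2)
        printf("%s (pid %d): RSS %ld MB\n", who, (int) getpid(),
               rss_pages * sysconf(_SC_PAGESIZE) / (1024 * 1024));
    if (f)
        fclose(f);
}

int main(void)
{
    size_t seg_size = 256UL * 1024 * 1024;   /* 256 MB for illustration */
    int shmid = shmget(IPC_PRIVATE, seg_size, IPC_CREAT | 0600);
    char *seg;

    if (shmid < 0 || (seg = shmat(shmid, NULL, 0)) == (void *) -1) {
        perror("shmget/shmat");
        return 1;
    }
    memset(seg, 1, seg_size);                /* touch every page */

    if (fork() == 0) {
        /* The child inherits the attachment; touching the pages puts
           the whole segment into the child's RSS as well. */
        memset(seg, 2, seg_size);
        print_rss("child");
        _exit(0);
    }
    wait(NULL);
    print_rss("parent");

    shmdt(seg);
    shmctl(shmid, IPC_RMID, NULL);           /* remove the segment */
    return 0;
}

If per-process RSS is indeed the culprit then moving the accounting over
to cgroups (I believe there's a jobacct_gather/cgroup plugin to go with
task/cgroup and proctrack/cgroup) should charge the segment to the job's
memory cgroup only once, whichever task touches it first.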

We patched that job-killing behaviour out of Slurm 2.6.x here, as it
would kill any Open MPI jobs we launched with mpirun (srun hurt
performance too much for us) because their memory usage was being
miscounted.

As an experiment I'm running two identical NAMD jobs on 8 cores each on
the same node, one using shared memory and the other using TCP, to see
whether the recorded memory usage differs between the two.  It might
take a while before I have anything to report though. :-)

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
