Hi Ian. Thanks for your response. To answer some of your questions:

* the behaviour is global --- it's affecting *every* core on *every* node, so 
it's not just a matter of uneven allocation of resources.

* I'm trying to get OpenMP to work on this machine, but my initial tests with 
it showed only modest improvements in memory efficiency (and didn't prevent 
this global memory increase).

* These aren't really very small memory effects --- the 1.5% and 3.0% are 
percentages of the *total* node memory. I was originally doing a vacuum + 
matter run, involving ~ twice as much memory to begin with. The doubling after 
the first regridding then brought me to something close to 100% of the nominal 
available memory on the node, which was enough to kill the run (rough arithmetic 
below). I removed the 
matter components to see if they were misbehaving, but they're not (at least 
not any more than the vacuum). Obviously, I can just use more nodes, but I'm 
trying to understand the problem.
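
To spell out the rough arithmetic (taking the per-core figures from top at face 
value, and 16 cores per node):

  vacuum only:      16 x 1.5% ~ 24% of node memory before the first regridding
                    16 x 3.0% ~ 48% after it
  vacuum + matter:  roughly twice those, so ~ 48% -> ~ 96% across the first
                    regridding, i.e. essentially all of the node's memory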

I'll try out SystemStatistics to see what it tells me.
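
In case it's useful, the minimal setup I have in mind is something like the 
following (assuming SystemStatistics exposes the process_memory_mb group and that 
CarpetIOScalar is already active in my parameter file; I still need to check the 
exact variable names):

  ActiveThorns = "SystemStatistics"

  IOScalar::outScalar_every      = 128
  IOScalar::outScalar_vars       = "SystemStatistics::process_memory_mb"
  IOScalar::outScalar_reductions = "minimum maximum average"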

Thanks,

Bernard

From: Ian Hinder <[email protected]>
Date: Wednesday, September 11, 2013 4:23 AM
To: Bernard Kelly <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Users] reported vs real memory usage + CarpetRegrid2?


On 10 Sep 2013, at 18:34, "Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY OF 
MARYLAND BALTIMORE COUNTY]" <[email protected]> wrote:

Hi.

I'm running a vacuum BHB evolution with a larger-than-usual set of inner
refinement regions (levels 8, 9, 10, and 11 have radii of 12M, 8M, 6M, and 4M,
respectively), and consequently the memory usage is a bit higher than
normal. But I'm finding that it jumps up almost 100% after the first
regridding, and stays there.
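
For context, those inner levels are set up through CarpetRegrid2 radii; 
schematically it amounts to something like this (assuming a single refinement 
centre, index 1, purely for illustration; the exact settings are in the attached 
parameter file):

  # illustrative only: levels 8-11 with the radii quoted above
  CarpetRegrid2::radius_1[8]  = 12.0
  CarpetRegrid2::radius_1[9]  =  8.0
  CarpetRegrid2::radius_1[10] =  6.0
  CarpetRegrid2::radius_1[11] =  4.0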

My diagnostic for this is the result of top on each of the nodes (via
"qtop.pl", a script on the machine I'm using). Sampled before the first
regridding, it shows each core using ~ 1.5% of the node's total memory,
while after regridding, it's more like 2.9% (these are Sandy Bridge nodes,
with 16 available cores).

However, the periodic output message from Carpet reporting the Grid
structure etc. shows regions only marginally larger than before, and ---
crucially for me --- has a marginally larger "Total required memory" (164
GB -> 167 GB, for instance).

So (a) what's using the extra memory, and (b) why isn't Carpet reporting
it? How seriously should I be taking that "Total required memory" message?

I see this with executables generated from both the last (ET_2012_11) and
current (ET_2013_05) stable releases, BTW. I'm attaching the current
parameter file and SCROUT from a run.

I know there was a problem related to drastically increased memory usage, but I 
thought that this was introduced to the trunk after ET_2013_05, and that Erik 
had already fixed it.  There was another problem in (I believe) ET_2012_11 
where Carpet was always collapsing multiple grids on a refinement level into 
the smallest enclosing box, leading to huge memory usage, but that was also 
fixed, and I believe backported to ET_2012_11.  Have you tried using the 
SystemStatistics thorn to monitor memory usage?  This should be easier than 
using top on the nodes.

How is the grid distributed among the nodes?  Even if the total required memory 
is roughly constant, it's possible that the grid is distributed unevenly 
between nodes.  Is every node showing the increased memory usage? Given that 
you are still using a very small amount of memory, it's possible that Carpet is 
just "overallocating" on the first regridding as it anticipates that it might 
need more memory later, and memory allocation might be expensive.  The amount 
of overallocation is presumably small in comparison to the total available 
memory, but might be of the order of 1%, as you are seeing.

I wouldn't worry about this small amount of increased memory usage unless you 
can reproduce the problem on a more heavily-loaded system.  From your 
description, I suspect you are not using OpenMP.  Why is that?  Using pure MPI 
leads to an unnecessary memory overhead.
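
For example, with SimFactory a hybrid MPI+OpenMP run can be requested along these 
lines (a sketch only; "bbh" and the counts are placeholders, and the right values 
depend on your machine definition):

  ./simfactory/bin/sim create-submit bbh --parfile=bbh.par \
      --procs=64 --num-threads=8 --walltime=24:00:00

On 16-core nodes this would give 8 MPI processes with 8 threads each, rather than 
64 single-threaded processes, which is where the memory saving comes from.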

--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users