Thought I would revisit this one.

We are still having issues with this. It is not clear to me what is leaving the user files behind in /dev/shm.

This is not something users are doing directly. They are just compiling their code with mpif90 (from Open MPI), using various compilers: PGI, Intel, Sun Studio and PathScale.

It looks like every job run leaves something behind in /dev/shm and it slowly fills up. We are just clearing these out at this point.
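
For anyone who wants to see what is accumulating before deleting it, here is a rough Python sketch that groups old files in /dev/shm by owner. The function name find_stale_files and the 24-hour threshold are my own placeholders, and it makes no assumption about what the leftover files are called, since I have not established that. It only reports; it does not remove anything.

```python
import os
import time
from collections import defaultdict


def find_stale_files(directory="/dev/shm", max_age_hours=24.0, now=None):
    """Return {uid: [(path, size_bytes, age_hours), ...]} for regular files
    in `directory` whose modification time is older than max_age_hours.

    The threshold is an arbitrary illustration value, not a recommendation.
    """
    now = time.time() if now is None else now
    stale = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                # Only look at regular files; skip directories and symlinks.
                if not entry.is_file(follow_symlinks=False):
                    continue
                st = entry.stat(follow_symlinks=False)
            except OSError:
                # The file disappeared between listing and stat; ignore it.
                continue
            age_hours = (now - st.st_mtime) / 3600.0
            if age_hours > max_age_hours:
                stale[st.st_uid].append((entry.path, st.st_size, age_hours))
    return dict(stale)


def print_report(stale):
    """Print a per-owner summary of the result of find_stale_files()."""
    for uid, files in sorted(stale.items()):
        total = sum(size for _, size, _ in files)
        print(f"uid {uid}: {len(files)} stale files, {total} bytes")
        for path, size, age in files:
            print(f"  {path}  {size} B  {age:.1f} h old")
```

Calling print_report(find_stale_files()) on a compute node between jobs should show which users' runs are leaving files and how much space they hold, which may help narrow down which compiler or job type is responsible.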


Jeff Squyres wrote:
That is odd. Is your user's app crashing or being forcibly killed? The ORTE daemon that is silently launched in v1.2 jobs should ensure that files under /tmp/openmpi-sessions-<userid>@<hostname> are removed.


On Nov 10, 2008, at 2:14 PM, Ray Muno wrote:

Brock Palen wrote:
On most systems /dev/shm is limited to half the physical RAM. Was a user somehow filling up /dev/shm so there was no space?

The problem is that a large collection of stale files has been left in there by users who have run on that node (Rocks-based cluster).

I am trying to determine why they are left behind.



--

 Ray Muno
 University of Minnesota
 Aerospace Engineering and Mechanics
