Thought I would revisit this one.
We are still having issues with this. It is not clear to me what is
leaving the user files behind in /dev/shm.
This is not something users are doing directly; they are just compiling
their code with mpif90 (from OpenMPI), using various compilers.
Compilers in use are PGI, Intel, SunStudio and Pathscale.
It looks like every job leaves something behind in /dev/shm, so it
slowly fills up. For now, we are just clearing these files out.
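Before clearing things out, it may help to record which users own the leftover files and how old they are, so stale files can be matched against the jobs that ran on the node. Below is a minimal sketch in Python, not an Open MPI tool: the helper names `stale_files_by_owner` and `owner_name` and the one-hour default age threshold are hypothetical choices for illustration.

```python
import os
import pwd
import time
from collections import defaultdict


def owner_name(uid):
    """Map a numeric uid to a login name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def stale_files_by_owner(directory="/dev/shm", min_age_seconds=3600, now=None):
    """Group regular files in `directory` that have not been modified for at
    least `min_age_seconds`, keyed by owner.

    Returns {owner: [(file_name, size_in_bytes), ...]}.
    The one-hour default threshold is an arbitrary, hypothetical choice.
    """
    now = time.time() if now is None else now
    report = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only look at plain files; skip subdirectories and symlinks.
            if not entry.is_file(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            if now - st.st_mtime >= min_age_seconds:
                report[owner_name(st.st_uid)].append((entry.name, st.st_size))
    return dict(report)


if __name__ == "__main__":
    # Only meaningful on Linux nodes where /dev/shm exists.
    if os.path.isdir("/dev/shm"):
        for owner, files in sorted(stale_files_by_owner().items()):
            total = sum(size for _, size in files)
            print(f"{owner}: {len(files)} stale files, {total} bytes")
```

Running this on a node just before the periodic cleanup would show whether the leftovers come from every user or only from particular users or compilers.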
Jeff Squyres wrote:
That is odd. Is your user's app crashing or being forcibly killed? The
ORTE daemon that is silently launched in v1.2 jobs should ensure that
files under /tmp/openmpi-sessions-<userid>@<hostname> are removed.
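One way to check whether that daemon cleanup actually runs is to look for session directories that survive after jobs finish. The sketch below assumes only the `openmpi-sessions-<userid>@<hostname>` naming described above; the function name `leftover_session_dirs` is hypothetical, and the sketch does not claim anything about where Open MPI places its /dev/shm files.

```python
import os

# Directory-name prefix as described above for Open MPI v1.2 session dirs.
PREFIX = "openmpi-sessions-"


def leftover_session_dirs(tmpdir="/tmp"):
    """Return (userid, hostname, path) for each session directory in tmpdir."""
    found = []
    for name in sorted(os.listdir(tmpdir)):
        path = os.path.join(tmpdir, name)
        if not (name.startswith(PREFIX) and os.path.isdir(path)):
            continue
        # Split the remainder "<userid>@<hostname>" at the first "@".
        userid, sep, hostname = name[len(PREFIX):].partition("@")
        if not sep:
            continue  # does not match the expected pattern; ignore it
        found.append((userid, hostname, path))
    return found


if __name__ == "__main__":
    for userid, hostname, path in leftover_session_dirs():
        print(f"{userid} on {hostname}: {path}")
```

If these directories are also left behind on idle nodes, that would suggest the daemon is not completing its cleanup, rather than something specific to /dev/shm.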
On Nov 10, 2008, at 2:14 PM, Ray Muno wrote:
Brock Palen wrote:
On most systems, /dev/shm is limited to half the physical RAM. Was
the user somehow filling up /dev/shm so that there was no space left?
The problem is that there is a large collection of stale files left in
there by the users who have run on that node (Rocks-based cluster).
I am trying to determine why they are left behind.
--
Ray Muno
University of Minnesota
Aerospace Engineering and Mechanics