Hi Ray
Are the jobs that leave files behind terminating normally or aborting?
Are there any warnings/error messages out of mpirun?
Just trying to determine if this is an abnormal termination issue or a
bug in OMPI itself.
Ralph
On Nov 19, 2008, at 8:05 AM, Ray Muno wrote:
Thought I would revisit this one.
We are still having issues with this. It is not clear to me what is
leaving the user files behind in /dev/shm.
This is not something users are doing directly; they are just
compiling their code with mpif90 (from OpenMPI), using
various compilers. Compilers in use are PGI, Intel, SunStudio, and
Pathscale.
It looks like every job run leaves something behind in /dev/shm and
it slowly fills up. We are just clearing these out at this point.
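Since the files are being cleared out by hand for now, here is a minimal Python sketch for listing cleanup candidates before deleting anything. It assumes a Linux node with /proc mounted, and it uses a heuristic that is not from this thread: a regular file in /dev/shm counts as a candidate only if its owner has no running process on the node. The function names are hypothetical.

```python
import os
import stat
from pathlib import Path


def running_uids(proc_root="/proc"):
    """Return the UIDs that own at least one running process (Linux only).

    Each numeric entry under /proc is a process directory owned by that
    process's effective UID.
    """
    uids = set()
    with os.scandir(proc_root) as entries:
        for entry in entries:
            if not entry.name.isdigit():
                continue
            try:
                uids.add(entry.stat().st_uid)
            except OSError:
                # The process exited between listing and stat(); skip it.
                continue
    return uids


def stale_shm_files(shm_dir="/dev/shm", active_uids=None):
    """Return (path, uid, size) for regular files whose owner runs nothing.

    Heuristic (an assumption, not an Open MPI rule): a file is only a
    cleanup candidate if no process with the same UID is alive here.
    """
    if active_uids is None:
        active_uids = running_uids()
    candidates = []
    for path in sorted(Path(shm_dir).iterdir()):
        try:
            st = path.lstat()  # lstat: do not follow symlinks
        except FileNotFoundError:
            continue  # removed while we were scanning
        if stat.S_ISREG(st.st_mode) and st.st_uid not in active_uids:
            candidates.append((path, st.st_uid, st.st_size))
    return candidates
```

The sketch only reports; it deletes nothing. The heuristic is deliberately conservative: files belonging to a user who still has some job running on the node are never listed, even if they are in fact stale.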
Jeff Squyres wrote:
That is odd. Is your user's app crashing or being forcibly
killed? The ORTE daemon that is silently launched in v1.2 jobs
should ensure that files under
/tmp/openmpi-sessions-<userid>@<hostname> are removed.
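To check whether such session directories are actually being left behind, a sketch like the following can list them. The openmpi-sessions-<userid>@<hostname> pattern is taken from the message above; the helper name is hypothetical, and the exact directory naming may differ between Open MPI versions, so everything after the "@" is reported verbatim.

```python
from pathlib import Path

# Name pattern from the thread: openmpi-sessions-<userid>@<hostname>
PREFIX = "openmpi-sessions-"


def leftover_session_dirs(tmp_dir="/tmp"):
    """Return (userid, host, path) for each matching session directory."""
    found = []
    for path in sorted(Path(tmp_dir).glob(PREFIX + "*@*")):
        if not path.is_dir():
            continue  # ignore stray files that happen to match the name
        userid, _, host = path.name[len(PREFIX):].partition("@")
        found.append((userid, host, path))
    return found
```

Running it on a node after a job has exited normally would show whether session directories survive normal termination or only aborted runs.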
On Nov 10, 2008, at 2:14 PM, Ray Muno wrote:
Brock Palen wrote:
On most systems, /dev/shm is limited to half the physical RAM.
Was a user filling up /dev/shm so that there was no space left?
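To see how full /dev/shm is relative to its size limit and to the node's physical RAM, a minimal sketch follows. It assumes a Linux/POSIX node where os.statvfs and os.sysconf("SC_PHYS_PAGES") are available; the function names are hypothetical.

```python
import os


def fs_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    free = st.f_bavail * st.f_frsize  # space available to unprivileged users
    return total, used, free


def physical_ram_bytes():
    """Physical RAM in bytes via sysconf (POSIX; not available everywhere)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def describe_usage(path="/dev/shm"):
    """One-line summary comparing the filesystem size to physical RAM."""
    total, used, _ = fs_usage(path)
    ram = physical_ram_bytes()
    gib = 1024 ** 3
    return (f"{path}: {used / gib:.2f} GiB used of {total / gib:.2f} GiB "
            f"(limit is {total / ram:.0%} of {ram / gib:.2f} GiB RAM)")
```

On a default tmpfs mount the reported limit should come out near 50% of RAM; a steadily growing "used" figure between jobs points at leftover files rather than a single job filling the space.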
The problem is that there is a large collection of stale files left in
there by the users who have run on that node (Rocks-based cluster).
I am trying to determine why they are left behind.
--
Ray Muno
University of Minnesota
Aerospace Engineering and Mechanics
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users