Hi Ray

Are the jobs that leave files behind terminating normally or aborting? Are there any warnings/error messages out of mpirun?

Just trying to determine if this is an abnormal termination issue or a bug in OMPI itself.

Ralph


On Nov 19, 2008, at 8:05 AM, Ray Muno wrote:

Thought I would revisit this one.

We are still having issues with this. It is not clear to me what is leaving the user files behind in /dev/shm.

This is not something users are doing directly; they are just compiling their code with mpif90 (from Open MPI), using various compilers. The compilers in use are PGI, Intel, Sun Studio and PathScale.

It looks like every job run leaves something behind in /dev/shm and it slowly fills up. We are just clearing these out at this point.
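
A minimal sketch of how such leftovers could be inventoried per user before clearing them out. The function name find_stale_files and the 24-hour age threshold are hypothetical choices, not anything Open MPI provides, and the script only lists files; it does not delete anything.

```python
import os
import pwd
import stat
import time
from collections import defaultdict


def find_stale_files(directory="/dev/shm", min_age_hours=24.0, now=None):
    """Return {owner: [(path, size_bytes), ...]} for regular files whose
    modification time is older than min_age_hours (an assumed threshold)."""
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    stale = defaultdict(list)
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file disappeared while we were scanning
            if not stat.S_ISREG(st.st_mode) or st.st_mtime > cutoff:
                continue  # skip non-regular files and recently modified ones
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                owner = str(st.st_uid)  # uid with no passwd entry
            stale[owner].append((path, st.st_size))
    return dict(stale)


if __name__ == "__main__":
    # Print a per-user summary; nothing is removed.
    for owner, entries in sorted(find_stale_files().items()):
        total = sum(size for _path, size in entries)
        print(f"{owner}: {len(entries)} files, {total} bytes")
```

Age alone does not prove a file is orphaned, since a long-running job may still own an old segment, so the output is better treated as a list of candidates to check against the running jobs on the node.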


Jeff Squyres wrote:
That is odd. Is your user's app crashing or being forcibly killed? The ORTE daemon that is silently launched in v1.2 jobs should ensure that files under /tmp/openmpi-sessions-<userid>@<hostname> are removed.
On Nov 10, 2008, at 2:14 PM, Ray Muno wrote:
Brock Palen wrote:
On most systems /dev/shm is limited to half the physical RAM. Was the user, or someone else, filling up /dev/shm so there was no space?
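
A rough way to check both points on a node, sketched with only the Python standard library. The helper names tmpfs_usage and physical_ram_bytes are made up for illustration, and the half-of-RAM figure is the usual tmpfs default rather than a guarantee; a site can mount /dev/shm with a different size.

```python
import os


def tmpfs_usage(path="/dev/shm"):
    """Return (total_bytes, used_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    used = total - st.f_bfree * st.f_frsize
    return total, used


def physical_ram_bytes():
    """Return the amount of physical memory as reported by sysconf."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__" and os.path.isdir("/dev/shm"):
    total, used = tmpfs_usage("/dev/shm")
    ram = physical_ram_bytes()
    print(f"/dev/shm size: {total} bytes ({total / ram:.0%} of physical RAM)")
    print(f"/dev/shm used: {used} bytes ({used / max(total, 1):.0%} full)")
```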

The problem is that there is a large collection of stale files left in there by users who have run on that node (Rocks-based cluster).

I am trying to determine why they are left behind.



--

Ray Muno
University of Minnesota
Aerospace Engineering and Mechanics
