Sorry for the delay in replying; this turned into a hectic week...

On Feb 4, 2009, at 11:28 AM, Hana Milani wrote:

Jeff, Thanks for helping me.

Is this a Fortran program, perchance?

Yes, it was written in f77, but I have compiled it with gfortran. Other people have done the same with no problems.

Do you have access to the source code?  I wonder if the program is
internally raising an error and effectively aborting itself.  Do you
know that the application runs correctly?  Do you have any test data
sets that you can try that give known outputs?

Yes, I have installed the source code. I have not been able to run the program in parallel, but I have run my inputs sequentially and got satisfactory results.

That's a good datapoint, but it's unfortunately not conclusive.

If you allow me, I can send the details of the code to your email.


If it's small and simple, sure. I'm afraid I don't have the time or resources to investigate a large, complex application that is misbehaving.

I don't have any more insights other than to re-state that *something* is killing your application with SIGTERM. It is *likely* some other entity on your node - a daemon or some other controller process. But it is also possible (although probably less likely) that the application is aborting itself.
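One rough way to tell those two cases apart, at least when running the binary directly outside of mpirun, is to look at how the process ended. The following is only a sketch in Python, not anything from Open MPI: `describe_exit` and `run_and_describe` are names I made up, and the two child commands are stand-ins for your real application. It relies on the fact that on POSIX systems Python's `subprocess` reports a child killed by signal N with a return code of -N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Describe how a child process ended, based on its return code."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "exited normally"
    # A positive status means the program called exit/stop itself.
    return f"exited on its own with status {returncode}"


def run_and_describe(argv):
    """Run a command to completion and describe how it ended."""
    proc = subprocess.run(argv)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Hypothetical stand-ins: one child that receives SIGTERM,
    # and one that aborts itself with a nonzero exit status.
    killed = [sys.executable, "-c",
              "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
    aborted = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_and_describe(killed))
    print(run_and_describe(aborted))
```

Note that this cannot say *who* sent the signal, and a program that catches SIGTERM and then exits by itself would show up as a self-exit.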

Are you able to run *any* MPI applications (especially those compiled with Fortran) in parallel? E.g., the hello world and the ring programs in the examples/ subdirectory in the OMPI distribution?

--
Jeff Squyres
Cisco Systems
