> From: George Bosilca <bosi...@cs.utk.edu>
> Subject: Re: [OMPI users] Error Handling Problem
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <ef68521d-c116-4e75-8fac-5ce918e56...@cs.utk.edu>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> How about changing the default error handler?

I did change the default error handler (using MPI_Comm_set_errhandler) in
the main_exe program. I replaced it with a printf. My error handler is
never called, but main_exe still receives a SIGPIPE signal. So the only
solution I found is to catch SIGPIPE and ignore it...

> It is not supposed to work, and if you find an MPI implementation
> that supports this approach please tell me. I know the paper where you
> read about this, but even with their MPI library this approach does
> not work.

Which paper are you talking about?

> Soon, Open MPI will be able to support this feature. Several fault
> tolerant modes are under way, but no precise timeline yet.

OK. I will keep watching for new versions of Open MPI.

Thanks,
Laurent.

> Thanks,
> george.
>
> On Oct 26, 2006, at 10:19 AM, laurent.po...@fr.thalesgroup.com wrote:
>
> > Hi,
> >
> > I developed a launcher application:
> > an MPI application (say main_exe) launches 2 MPI applications (say
> > exe1 and exe2), using MPI_Comm_spawn_multiple.
> >
> > Now, I'm looking at the behavior when an exe crashes.
> >
> > What I can see is the following:
> > 1) when everybody is launched, I see the following processes,
> > using 'ps':
> > - the 'mpiexec -v -d -n 1 ./main_exe' command
> > - the orted server used for 'main_exe' (say 'orted1')
> > - main_exe
> > - the orted server used for 'exe1' and 'exe2' (say 'orted2')
> > - exe1
> > - exe2
> >
> > 2) I use kill -9 to 'crash' exe2
> >
> > 3) orted2 and exe1 finish.
> >
> > 4) with ps, I see that the following processes remain: mpiexec,
> > 'orted1', main_exe
> >
> > 5) main_exe tries to send a message to exe1, using MPI_Bsend:
> > main_exe gets killed by a SIGPIPE signal!
> >
> > So what I see is that when a part of an MPI application crashes,
> > the whole application crashes!
> > Is there a way to get another behavior? For example, MPI_Bsend
> > could return an error message.
> >
> > Some additional information:
> > - I work on Linux, with Open MPI 1.1.1.
> > - I'm developing in C and C++.
> >
> > Thanks,
> > Laurent.