So what you are saying is that *all* the ranks have entered MPI_Finalize and only a subset has exited, judging by the prints you placed before and after MPI_Finalize. Good. My guess is that the processes stuck in MPI_Finalize have a prior MPI request outstanding that, for whatever reason, is unable to complete. I would first look at all the MPI requests and make sure they completed.
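
One way to do that check, as a rough sketch rather than anything taken from your code: keep a small ledger of every nonblocking operation a rank posts, and list what is left just before MPI_Finalize. RequestLedger, post(), complete(), and outstanding() are hypothetical names, and plain integer ids stand in for MPI_Request handles so the sketch builds without an MPI installation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical debugging aid: records every nonblocking operation a rank
// posts so that leftovers can be listed just before MPI_Finalize().
// Integer ids stand in for MPI_Request handles (an assumption made so the
// sketch compiles without MPI).
class RequestLedger {
public:
    // Call next to each MPI_Isend/MPI_Irecv; returns an id for the request.
    int post(const std::string& what) {
        assert(!what.empty());  // a description is needed to identify it later
        int id = next_id_++;
        pending_[id] = what;
        return id;
    }

    // Call once the matching wait or test reports completion.
    // Returns false if the id was never posted or was already completed.
    bool complete(int id) {
        return pending_.erase(id) == 1;
    }

    // Descriptions of the requests still outstanding, in posting order.
    std::vector<std::string> outstanding() const {
        std::vector<std::string> left;
        for (const auto& entry : pending_) {
            left.push_back(entry.second);
        }
        return left;
    }

private:
    int next_id_ = 0;
    std::map<int, std::string> pending_;
};
```

In a real program, post() would sit next to each MPI_Isend or MPI_Irecv and complete() after the matching MPI_Wait or MPI_Waitall returns. Printing outstanding() together with the rank right before MPI_Finalize would then show which requests never completed on the ranks that hang.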

--td

On 10/25/2010 02:38 AM, Jack Bryan wrote:
thanks
I found a problem:

I used:

cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
MPI_Finalize();
cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
return 0;

I can get the output " I am rank 0 (1, 2, ....) I am before MPI_Finalize() "

and
       " I am rank 0 I am after MPI_Finalize() "
But the other processes do not print "I am rank ... I am after MPI_Finalize()".

It is weird. The processes have reached the point just before MPI_Finalize(), so why do they hang there?

Are there better ways to check this?

Any help is appreciated.

thanks

Jack

Oct. 25 2010

------------------------------------------------------------------------
From: solarbik...@gmail.com
Date: Sun, 24 Oct 2010 19:47:54 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

How do you know all processes call MPI_Finalize? Did you have all of them print something before they call MPI_Finalize? I think what Gustavo is getting at is that some MPI calls within your snippets may hang your program, so some of your processes never call MPI_Finalize.

On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustud...@hotmail.com> wrote:

    Thanks,

    But my code is too long to post.

    What are the common causes of this kind of problem?

    Any help is appreciated.

    Jack

    Oct. 24 2010

    > From: g...@ldeo.columbia.edu
    > Date: Sun, 24 Oct 2010 18:09:52 -0400
    > To: us...@open-mpi.org
    > Subject: Re: [OMPI users] Open MPI program cannot complete
    >
    > Hi Jack
    >
    > Your code snippet is too terse; it doesn't show the MPI calls.
    > It is hard to guess what the problem is this way.
    >
    > Gus Correa
    > On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
    >
    > > Thanks for the reply.
    > > But I use MPI_Waitall() to make sure that all MPI communications
    > > have been done before a process calls MPI_Finalize() and returns.
    > >
    > > Any help is appreciated.
    > >
    > > thanks
    > >
    > > Jack
    > >
    > > Oct. 24 2010
    > >
    > > > From: g...@ldeo.columbia.edu
    > > > Date: Sun, 24 Oct 2010 17:31:11 -0400
    > > > To: us...@open-mpi.org
    > > > Subject: Re: [OMPI users] Open MPI program cannot complete
    > > >
    > > > Hi Jack
    > > >
    > > > It may depend on "do some things".
    > > > Does it involve MPI communication?
    > > >
    > > > Also, why not put MPI_Finalize();return 0 outside the ifs?
    > > >
    > > > Gus Correa
    > > >
    > > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
    > > >
    > > > > Hi
    > > > >
    > > > > I have a problem with Open MPI.
    > > > >
    > > > > My program has 5 processes.
    > > > >
    > > > > All of them can run MPI_Finalize() and return 0.
    > > > >
    > > > > But, the whole program cannot be completed.
    > > > >
    > > > > In the MPI cluster job queue, it is still in running status.
    > > > >
    > > > > If I use 1 process to run it, no problem.
    > > > >
    > > > > Why ?
    > > > >
    > > > > My program:
    > > > >
    > > > > int main (int argc, char **argv)
    > > > > {
    > > > >     int myRank, mySize;
    > > > >
    > > > >     MPI_Init(&argc, &argv);
    > > > >     MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    > > > >     MPI_Comm_size(MPI_COMM_WORLD, &mySize);
    > > > >     MPI_Comm world;
    > > > >     world = MPI_COMM_WORLD;
    > > > >
    > > > >     if (myRank == 0)
    > > > >     {
    > > > >         // do some things.
    > > > >     }
    > > > >
    > > > >     if (myRank != 0)
    > > > >     {
    > > > >         // do some things.
    > > > >         MPI_Finalize();
    > > > >         return 0;
    > > > >     }
    > > > >     if (myRank == 0)
    > > > >     {
    > > > >         MPI_Finalize();
    > > > >         return 0;
    > > > >     }
    > > > >
    > > > > }
    > > > >
    > > > > Also, some output files contain wrong codes and are not readable.
    > > > > In the 1-process case, the program prints correct results to these output files.
    > > > >
    > > > > Any help is appreciated.
    > > > >
    > > > > thanks
    > > > >
    > > > > Jack
    > > > >
    > > > > Oct. 24 2010
    > > > >
    > > > > _______________________________________________
    > > > > users mailing list
    > > > > us...@open-mpi.org
    > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users




--
David Zhang
University of California, San Diego



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com


