I was using the latest trunk. Now that you raised the issue about the code ... I read it. You're right, for the server process (rank n-1 on MPI_COMM_WORLD) there are 2 calls to MPI_Comm_free for the counter_comm, and [obviously] the second one *should* fail. I'll take a look in the Open MPI code base to see where the problem is coming from.

  Thanks for the hint,
    george.

On Jun 21, 2007, at 1:42 AM, Anthony Chan wrote:


Forgot to mention, to get Jeff's program to work, I modified
MPE_Counter_free() to check for MPI_COMM_NULL, i.e.

if ( *counter_comm != MPI_COMM_NULL ) {    /* new line */
    MPI_Comm_rank( *counter_comm, &myid );
    ...
    MPI_Comm_free( counter_comm );
}                                          /* new line */

With above modification and MPI_Finalize in the main(). I was able to run
the program with OpenMPI (as well as mpich2).  Hope this helps.

A.Chan

On Thu, 21 Jun 2007, Anthony Chan wrote:


Hi George,

Just out of curiosity, what version of OpenMPI that you used works fine with Jeff's program (after adding MPI_Finalize)? The program aborts with either mpich2-1.0.5p4 or OpenMPI-1.2.3 on a AMD x86_64 box(Ubuntu 7.04)
because MPI_Comm_rank() is called with MPI_COMM_NULL.

With OpenMPI:
~/openmpi/install_linux64_123_gcc4_thd/bin/mpiexec -n 2 a.out
...
[octagon.mcs.anl.gov:23279] *** An error occurred in MPI_Comm_rank
[octagon.mcs.anl.gov:23279] *** on communicator MPI_COMM_WORLD
[octagon.mcs.anl.gov:23279] *** MPI_ERR_COMM: invalid communicator
[octagon.mcs.anl.gov:23279] *** MPI_ERRORS_ARE_FATAL (goodbye)

OpenMPI hangs at abort that I need to kill the mpiexec process by hand.
You can reproduce the hang with the following test program with
OpenMPI-1.2.3.

/homes/chan/tmp/tmp6> cat test_comm_rank.c
#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int myrank;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_NULL, &myrank );
    printf( "myrank = %d\n", myrank );
    MPI_Finalize();
    return 0;
}

Since mpiexec hangs, so it may be a bug somewhere in 1.2.3 release.

A.Chan



On Wed, 20 Jun 2007, George Bosilca wrote:

Jeff,

With the proper MPI_Finalize added at the end of the main function,
your program orks fine with the current version of Open MPI up to 32
processors. Here is the output I got for 4 processors:

I am 2 of 4 WORLD procesors
I am 3 of 4 WORLD procesors
I am 0 of 4 WORLD procesors
I am 1 of 4 WORLD procesors
Initial inttemp 1
Initial inttemp 0
final inttemp 0,0
0, WORLD barrier leaving routine
final inttemp 1,0
1, WORLD barrier leaving routine
Initial inttemp 2
final inttemp 2,0
2, WORLD barrier leaving routine
SERVER Got a DONE flag
Initial inttemp 3
final inttemp 3,0
3, WORLD barrier leaving routine

This output seems to indicate that the program is running to
completion and it does what you expect it to do.

Btw, what version of Open MPI are you using and on what kind of
hardware ?

   george.

On Jun 20, 2007, at 6:31 PM, Jeffrey L. Tilson wrote:

Hi,
ANL suggested I post this question to you. This is my second
posting......but now with the proper attachments.

From: Jeffrey Tilson <jltil...@nc.rr.com>
Date: June 20, 2007 5:17:50 PM PDT
To: mpich2-ma...@mcs.anl.gov, Jeffrey Tilson <jtil...@renci.org>
Subject: MPI question/problem


Hello All,
This will probably turn out to be my fault as I haven't used MPI in
a few years.

I am attempting to use an MPI implementation of a "nxtval" (see the
MPI book). I am using the client-server scenario. The MPI book
specifies the three functions required. Two are collective and one
is not. Only the  two collectives are tested in the supplied code.
All three of the MPI functions are reproduced in the attached code,
however.  I wrote a tiny application to create and free a counter
object and it fails.

I need to know if this is a bug in the MPI book and a
misunderstanding on my part.

The complete code is attached. I was using openMPI/intel to compile
and run.

The error I get is:

[compute-0-1.local:22637] *** An error occurred in MPI_Comm_rank
[compute-0-1.local:22637] *** on communicator MPI_COMM_WORLD
[compute-0-1.local:22637] *** MPI_ERR_COMM: invalid communicator
[compute-0-1.local:22637] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 22635 on node
"compute-0-1.local" exited on signal 15.

I've attempted to google my way to understanding but with little
success. If someone could point me to
a sample application that actually uses these functions, I would
appreciate it.

Sorry if this is the wrong list, it is not an MPICH question and I
wasn't sure where to turn.

Thanks,
--jeff

------------------------------------------------------------------- ---
--

/* A beginning piece of code to perform large-scale web
construction.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

typedef struct {
char description[1024];
double startwtime;
double endwtime;
double difftime;
} Timer;

/* prototypes */
int MPE_Counter_nxtval(MPI_Comm , int *);
int MPE_Counter_free( MPI_Comm *, MPI_Comm * );
void MPE_Counter_create( MPI_Comm , MPI_Comm *, MPI_Comm *);
/* End prototypes */

/* Globals */
        int          rank,numsize;

int main( argc, argv )
int argc;
char **argv;
{

        int i,j;
        MPI_Status   status;
        MPI_Request  r;
        MPI_Comm  smaller_comm,  counter_comm;


        int numtimings=0;
        int inttemp;
        int value=-1;
        int server;

//Init parallel environment

        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &numsize );

        printf("I am %i of %i WORLD procesors\n",rank,numsize);
        server = numsize -1;

MPE_Counter_create( MPI_COMM_WORLD, &smaller_comm, &counter_comm );
        printf("Initial inttemp %i\n",rank);

        inttemp = MPE_Counter_free( &smaller_comm, &counter_comm );
        printf("final inttemp %i,%i\n",rank,inttemp);

        printf("%i, WORLD barrier leaving routine\n",rank);
        MPI_Barrier( MPI_COMM_WORLD );
}

//// Add new MPICH based shared counter.
//// grabbed from http://www-unix.mcs.anl.gov/mpi/usingmpi/ examples/
advanced/nxtval_create_c.htm

/* tag values */
#define REQUEST 0
#define GOAWAY  1
#define VALUE   2
#define MPE_SUCCESS 0

void MPE_Counter_create( MPI_Comm oldcomm, MPI_Comm * smaller_comm,
MPI_Comm * counter_comm )
{
    int counter = 0;
    int message, done = 0, myid, numprocs, server, color,ranks[1];
    MPI_Status status;
    MPI_Group oldgroup, smaller_group;

    MPI_Comm_size(oldcomm, &numprocs);
    MPI_Comm_rank(oldcomm, &myid);
    server = numprocs-1;     /*   last proc is server */
    MPI_Comm_dup( oldcomm, counter_comm ); /* make one new comm */
        if (myid == server) color = MPI_UNDEFINED;
        else color =0;
    MPI_Comm_split( oldcomm, color, myid, smaller_comm);

    if (myid == server) {       /* I am the server */
        while (!done) {
            MPI_Recv(&message, 1, MPI_INT, MPI_ANY_SOURCE,
MPI_ANY_TAG,
                     *counter_comm, &status );
            if (status.MPI_TAG == REQUEST) {
                MPI_Send(&counter, 1, MPI_INT, status.MPI_SOURCE,
VALUE,
                         *counter_comm );
                counter++;
            }
            else if (status.MPI_TAG == GOAWAY) {
                printf("SERVER Got a DONE flag\n");
                done = 1;
            }
            else {
                fprintf(stderr, "bad tag sent to MPE counter\n");
                MPI_Abort(*counter_comm, 1);
           }
        }
        MPE_Counter_free( smaller_comm, counter_comm );
    }
}

/*******************************/
int MPE_Counter_free( MPI_Comm *smaller_comm, MPI_Comm *
counter_comm )
{
    int myid, numprocs;

    MPI_Comm_rank( *counter_comm, &myid );
    MPI_Comm_size( *counter_comm, &numprocs );

    if (myid == 0)
MPI_Send(NULL, 0, MPI_INT, numprocs-1, GOAWAY, *counter_comm);

    MPI_Comm_free( counter_comm );

    if (*smaller_comm != MPI_COMM_NULL) {
        MPI_Comm_free( smaller_comm );
        }
    return 0;
}

/************************/
int MPE_Counter_nxtval(MPI_Comm counter_comm, int * value)
{
    int server,numprocs;
    MPI_Status status;

    MPI_Comm_size( counter_comm, &numprocs );
    server = numprocs-1;
    MPI_Send(NULL, 0, MPI_INT, server, REQUEST, counter_comm );
    MPI_Recv(value, 1, MPI_INT, server, VALUE, counter_comm,
&status );
    return 0;
}


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to