Re: [OMPI users] questions about attribute caching

2018-12-16 Thread 邹海峰
Thank you for your patience.

Best wishes!


Gilles Gouaillardet wrote on Sunday, December 16, 2018 at 11:24 PM:

> Almost :-)
>
> seqBegin() and seqEnd() take a communicator as a parameter.
> So attribute caching is good for performance (only) if more than one
> sequence *per communicator* is used in the application.
>
> Cheers,
>
> Gilles
>
> On Sun, Dec 16, 2018 at 11:38 PM 邹海峰  wrote:
> >
> > Thank you very much for the reply.
> >
> > According to your explanation and the content from the website, if there
> is only one sequential execution in the program, then it doesn't matter
> whether the attribute is used. But if there are multiple sequential
> executions, each process only needs to call MPI_Comm_dup() once with the
> help of the attribute, instead of duplicating MPI_COMM_WORLD every time a
> sequential execution begins. Am I right?
> >
> > Best wishes!
> >
> Gilles Gouaillardet wrote on Sunday, December 16, 2018 at 8:14 AM:
> >>
> >> Hi,
> >>
> >> Your understanding is incorrect :
> >> "Attributes are local to the process and specific to the communicator
> >> to which they are attached."
> >> (per
> https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-1.1/node119.htm
> )
> >>
> >> Think of it this way: an attribute is often a pointer, and really bad
> >> things can happen if rank 0 uses a pointer that is only valid on rank 1,
> >> so if attributes were global, they would be virtually unusable.
> >>
> >> note the comment before the barrier is incorrect. this is a simple
> >> barrier and all MPI tasks will block until all of them invoke seqEnd()
> >>
> >> The main goal of using attributes in this example is to invoke
> >> MPI_Comm_dup() once per communicator (instead of once per sequence,
> >> since this is an expensive operation).
> >>
> >>
> >> Cheers,
> >>
> >> Gilles
> >> On Sun, Dec 16, 2018 at 1:04 AM 邹海峰  wrote:
> >> >
> >> > Hi there,
> >> >
> >> > At first, I thought an attribute was just like a global variable
> attached to a specific communicator: I could define and set the value on
> one process, then get and modify the value on another process, as long as
> those processes belong to the same communicator. But when I was reading
> chapter 6 of the book Using MPI: Portable Parallel Programming with the
> Message-Passing Interface, I was confused by the usage of attribute
> caching.
> >> >
> >> > The purpose of the code is to make the execution sequential. The main
> part is
> >> >
> >> >   seqBegin( MPI_COMM_WORLD );
> >> >   printf( "My rank is %d\n", wrank );
> >> >   fflush( stdout );
> >> >   seqEnd( MPI_COMM_WORLD );
> >> >
> >> > which is simple to understand: the program will print the ranks in
> order. The definition of the function "seqBegin()" is
> >> >
> >> > static int seqKeyval = MPI_KEYVAL_INVALID;
> >> >
> >> > void seqBegin( MPI_Comm comm )
> >> > {
> >> >   MPI_Comm lcomm;
> >> >   int  flag, mysize, myrank;
> >> >   seqInfo  *info;
> >> >   if (seqKeyval == MPI_KEYVAL_INVALID) {
> >> > MPI_Comm_create_keyval( MPI_NULL_COPY_FN, seqDelFn, &seqKeyval,
> NULL );
> >> >   }
> >> >   MPI_Comm_get_attr( comm, seqKeyval, &info, &flag );
> >> >   if (!flag) {
> >> > info = (seqInfo *)malloc( sizeof(seqInfo) );
> >> > MPI_Comm_dup( comm, &info->lcomm );
> >> > MPI_Comm_rank( info->lcomm, &myrank );
> >> > MPI_Comm_size( info->lcomm, &mysize );
> >> > info->prevRank = myrank - 1;
> >> > if (info->prevRank < 0)   info->prevRank = MPI_PROC_NULL;
> >> > info->nextRank = myrank + 1;
> >> > if (info->nextRank >= mysize) info->nextRank = MPI_PROC_NULL;
> >> > if (verbose) {
> >> >   printf( "seqbegin: prev = %d, next = %d\n",
> >> >   info->prevRank, info->nextRank );
> >> > }
> >> > MPI_Comm_set_attr( comm, seqKeyval, info );
> >> >   }
> >> >   MPI_Recv( NULL, 0, MPI_INT, info->prevRank, 0, info->lcomm,
> >> > MPI_STATUS_IGNORE );
> >> > }
> >> >
> >> > and the definition of the function "seqEnd()" is
> >> >
> >> > void seqEnd( MPI_Comm comm )
> >> > {
> >> >   seqInfo *info;
> >> >   int flag;
> >> >
> >> >   /* Sanity check */
> >> >   if (seqKeyval == MPI_KEYVAL_INVALID)
> >> > MPI_Abort( MPI_COMM_WORLD, 1 );
> >> >   MPI_Comm_get_attr( comm, seqKeyval, &info, &flag );
> >> >   if (!info || !flag)
> >> > MPI_Abort( MPI_COMM_WORLD, 1 );
> >> >   if (verbose) {
> >> > printf( "seqend: prev = %d, next = %d\n",
> >> > info->prevRank, info->nextRank );
> >> >   }
> >> >   MPI_Send( NULL, 0, MPI_INT, info->nextRank, 0, info->lcomm );
> >> >
> >> >   /* Make everyone wait until all have completed their send */
> >> >   MPI_Barrier( info->lcomm );
> >> > }
> >> >
> >> > Other details are omitted. In fact, all the code can be found at
> https://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-usingmpi/libraries/index.html
> , which is provided by the authors of the book.
> >> >
> >> > The program uses send and recv to block the execution. Only when a
> process has received the message from the previous process can it continue
> to execute; otherwise it is blocked, which results in sequential
> execution. The part I don't understand is in the function "seqBegin()".
> If my understanding of attributes were right, only one process would
> enter the if condition and set the value of the attribute, and the other
> processes would just get the value. Here comes the question: since the
> other processes don't set the value, how can they get prevRank and
> nextRank of their own?
> >> >
> >> > The code executes as expected, but I still can't get the rationale
> behind it, and there is little reference material about attribute
> caching, so I come here for help. Thank you very much!
> >> >
> >> > Best Wishes!


[OMPI users] Limit to number of asynchronous sends/receives?

2018-12-16 Thread Adam Sylvester
I'm running OpenMPI 2.1.0 on RHEL 7 using TCP communication.  For the
specific run that's crashing on me, I'm running with 17 ranks (on 17
different physical machines).  I've got a stage in my application where
ranks need to transfer chunks of data where the size of each chunk is
trivial (on the order of 100 MB) compared to the overall imagery.  However,
the chunks are spread out across many buffers in a way that makes the
indexing complicated (and the memory is not all within a single buffer)...
the simplest way to express the data movement in code is by a large number
of MPI_Isend() and MPI_Irecv() calls followed of course by an eventual
MPI_Waitall().  This works fine for many cases, but I've run into a case
now where the chunks are imbalanced such that a few ranks have a total of
~450 MPI_Request objects (I do a single MPI_Waitall() with all requests at
once) and the remaining ranks have < 10 MPI_Requests.  In this scenario, I
get a seg fault inside PMPI_Waitall().

Is there an implementation limit as to how many asynchronous requests are
allowed?  Is there a way this can be queried either via a #define value or
runtime call?  I probably won't go this route, but when initially compiling
OpenMPI, is there a configure option to increase it?

I've done a fair amount of debugging and am pretty confident this is where
the error is occurring as opposed to indexing out of bounds somewhere, but
if there is no such limit in OpenMPI, that would be useful to know too.

Thanks.
-Adam
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
