Re: [OMPI users] questions about attribute caching

2018-12-15 Thread Gilles Gouaillardet
Hi,

Your understanding is incorrect:
"Attributes are local to the process and specific to the communicator
to which they are attached."
(per 
https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-1.1/node119.htm)

Keep in mind that an attribute is often a pointer, and really bad things can
happen if rank 0 dereferences a pointer that is only valid on rank 1;
if attributes were shared globally across ranks, they would be virtually unusable.

Note that the comment before the barrier is incorrect: it is a plain
barrier, and all MPI tasks will block until every one of them has invoked seqEnd().

The main goal of using attributes in this example is to invoke
MPI_Comm_dup() only once per communicator rather than once per sequence,
since duplicating a communicator is an expensive operation.
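
To see why, here is a minimal sketch (not from the original exchange, just an
illustration) in which every rank attaches its own heap pointer to
MPI_COMM_WORLD. The attribute is initially unset on every rank, and each rank
only ever sees the value it cached itself:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char **argv )
{
    int  rank, keyval, flag;
    int *mine, *seen;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* the keyval itself is process-local state, just like seqKeyval */
    MPI_Comm_create_keyval( MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                            &keyval, NULL );

    MPI_Comm_get_attr( MPI_COMM_WORLD, keyval, &seen, &flag );
    printf( "rank %d before set: flag = %d\n", rank, flag ); /* flag is 0 everywhere */

    mine  = (int *)malloc( sizeof(int) );  /* pointer only meaningful in this process */
    *mine = 100 + rank;
    MPI_Comm_set_attr( MPI_COMM_WORLD, keyval, mine );

    MPI_Comm_get_attr( MPI_COMM_WORLD, keyval, &seen, &flag );
    printf( "rank %d after set: flag = %d, value = %d\n", rank, flag, *seen );

    free( mine );
    MPI_Comm_free_keyval( &keyval );
    MPI_Finalize();
    return 0;
}

Every rank prints flag = 0 before its own MPI_Comm_set_attr(), which is
exactly why every process takes the "if (!flag)" branch in seqBegin() the
first time and fills in its own prevRank/nextRank.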


Cheers,

Gilles
On Sun, Dec 16, 2018 at 1:04 AM 邹海峰  wrote:
>
> Hi there,
>
> At first, I thought an attribute was just like a global variable attached to a
> specific communicator: I could define and set its value on one process, then
> get and modify it on another process, as long as those processes belong to
> the same communicator. But when I was reading chapter 6 of the book Using
> MPI: Portable Parallel Programming with the Message-Passing Interface, I was
> confused by the usage of attribute caching.
>
> The purpose of the code is to make the execution sequential. The main part of the program is:
>
>   seqBegin( MPI_COMM_WORLD );
>   printf( "My rank is %d\n", wrank );
>   fflush( stdout );
>   seqEnd( MPI_COMM_WORLD );
>
> which is simple to understand. The program will print the ranks in order. The
> definition of the function "seqBegin()" is
>
> static int seqKeyval = MPI_KEYVAL_INVALID;
>
> void seqBegin( MPI_Comm comm )
> {
>   MPI_Comm lcomm;
>   int      flag, mysize, myrank;
>   seqInfo  *info;
>   if (seqKeyval == MPI_KEYVAL_INVALID) {
>     MPI_Comm_create_keyval( MPI_NULL_COPY_FN, seqDelFn, &seqKeyval, NULL );
>   }
>   MPI_Comm_get_attr( comm, seqKeyval, &info, &flag );
>   if (!flag) {
>     info = (seqInfo *)malloc( sizeof(seqInfo) );
>     MPI_Comm_dup( comm, &info->lcomm );
>     MPI_Comm_rank( info->lcomm, &myrank );
>     MPI_Comm_size( info->lcomm, &mysize );
>     info->prevRank = myrank - 1;
>     if (info->prevRank < 0)       info->prevRank = MPI_PROC_NULL;
>     info->nextRank = myrank + 1;
>     if (info->nextRank >= mysize) info->nextRank = MPI_PROC_NULL;
>     if (verbose) {
>       printf( "seqbegin: prev = %d, next = %d\n",
>               info->prevRank, info->nextRank );
>     }
>     MPI_Comm_set_attr( comm, seqKeyval, info );
>   }
>   MPI_Recv( NULL, 0, MPI_INT, info->prevRank, 0, info->lcomm,
>             MPI_STATUS_IGNORE );
> }
>
> and the definition of the function "seqEnd()" is
>
> void seqEnd( MPI_Comm comm )
> {
>   seqInfo *info;
>   int flag;
>
>   /* Sanity check */
>   if (seqKeyval == MPI_KEYVAL_INVALID)
>     MPI_Abort( MPI_COMM_WORLD, 1 );
>   MPI_Comm_get_attr( comm, seqKeyval, &info, &flag );
>   if (!info || !flag)
>     MPI_Abort( MPI_COMM_WORLD, 1 );
>   if (verbose) {
>     printf( "seqend: prev = %d, next = %d\n",
>             info->prevRank, info->nextRank );
>   }
>   MPI_Send( NULL, 0, MPI_INT, info->nextRank, 0, info->lcomm );
>
>   /* Make everyone wait until all have completed their send */
>   MPI_Barrier( info->lcomm );
> }
>
> Other details are omitted. In fact, all of the code can be found at
> https://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-usingmpi/libraries/index.html
> which is provided by the author of the book.
>
> The program uses send and recv to block the execution. A process can continue
> to execute only once it has received the message from the previous process;
> otherwise it is blocked, which results in the sequential execution. The part
> I don't understand is in the function "seqBegin()". If my understanding of
> attributes is right, only one process will enter the if branch and set the
> value of the attribute, while the other processes just get the value. Here
> comes the question: since the other processes don't set the value, how can
> they get prevRank and nextRank values of their own?
>
> The code executes as expected, but I still can't see the rationale behind
> this, and there is little reference material about attribute caching, so I
> have come here for help. Thank you very much!
>
> Best Wishes!
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Unable to build Open MPI with external PMIx library support

2018-12-15 Thread Howard Pritchard
Hi Eduardo

Could you post the config.log for the build with internal PMIx, so we can
figure that out first?

Howard

Eduardo Rothe via users  wrote on Fri, Dec 14,
2018 at 09:41:

> Open MPI: 4.0.0
> PMIx: 3.0.2
> OS: Debian 9
>
> I'm building a Debian package for Open MPI, and I either get the following
> error messages while configuring:
>
>   undefined reference to symbol 'dlopen@@GLIBC_2.2.5'
>   undefined reference to symbol 'lt_dlopen'
>
> when using the configure option:
>
>   ./configure --with-pmix=/usr/lib/x86_64-linux-gnu/pmix
>
> or otherwise, if I use the following configure options:
>
>   ./configure --with-pmix=external
> --with-pmix-libdir=/usr/lib/x86_64-linux-gnu/pmix
>
> I have a successful compile, but when running mpirun I get the following
> message:
>
> --
> We were unable to find any usable plugins for the BFROPS framework. This
> PMIx
> framework requires at least one plugin in order to operate. This can be
> caused
> by any of the following:
>
> * we were unable to build any of the plugins due to some combination
>   of configure directives and available system support
>
> * no plugin was selected due to some combination of MCA parameter
>   directives versus built plugins (i.e., you excluded all the plugins
>   that were built and/or could execute)
>
> * the PMIX_INSTALL_PREFIX environment variable, or the MCA parameter
>   "mca_base_component_path", is set and doesn't point to any location
>   that includes at least one usable plugin for this framework.
>
> Please check your installation and environment.
> --
>
> What I find most strange is that I get the same error message (unable to find
> any usable plugins for the BFROPS framework) even if I don't configure
> external PMIx support!
>
> Can someone please give me a hint about what's going on?
>
> Cheers!
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] questions about attribute caching

2018-12-15 Thread 邹海峰
Hi there,

At first, I thought an attribute was just like a global variable attached to a
specific communicator: I could define and set its value on one process, then
get and modify it on another process, as long as those processes belong to
the same communicator. But when I was reading chapter 6 of the book *Using
MPI: Portable Parallel Programming with the Message-Passing Interface*, I was
confused by the usage of attribute caching.

The purpose of the code is to make the execution sequential. The main part of
the program is:

  seqBegin( MPI_COMM_WORLD );
  printf( "My rank is %d\n", wrank );
  fflush( stdout );
  seqEnd( MPI_COMM_WORLD );

which is simple to understand. The program will print the ranks in order.
The definition of the function "seqBegin()" is

static int seqKeyval = MPI_KEYVAL_INVALID;

void seqBegin( MPI_Comm comm )
{
  MPI_Comm lcomm;
  int      flag, mysize, myrank;
  seqInfo  *info;
  if (seqKeyval == MPI_KEYVAL_INVALID) {
    MPI_Comm_create_keyval( MPI_NULL_COPY_FN, seqDelFn, &seqKeyval, NULL );
  }
  MPI_Comm_get_attr( comm, seqKeyval, &info, &flag );
  if (!flag) {
    info = (seqInfo *)malloc( sizeof(seqInfo) );
    MPI_Comm_dup( comm, &info->lcomm );
    MPI_Comm_rank( info->lcomm, &myrank );
    MPI_Comm_size( info->lcomm, &mysize );
    info->prevRank = myrank - 1;
    if (info->prevRank < 0)       info->prevRank = MPI_PROC_NULL;
    info->nextRank = myrank + 1;
    if (info->nextRank >= mysize) info->nextRank = MPI_PROC_NULL;
    if (verbose) {
      printf( "seqbegin: prev = %d, next = %d\n",
              info->prevRank, info->nextRank );
    }
    MPI_Comm_set_attr( comm, seqKeyval, info );
  }
  MPI_Recv( NULL, 0, MPI_INT, info->prevRank, 0, info->lcomm,
            MPI_STATUS_IGNORE );
}

and the definition of the function "seqEnd()" is

void seqEnd( MPI_Comm comm )
{
  seqInfo *info;
  int flag;

  /* Sanity check */
  if (seqKeyval == MPI_KEYVAL_INVALID)
    MPI_Abort( MPI_COMM_WORLD, 1 );
  MPI_Comm_get_attr( comm, seqKeyval, &info, &flag );
  if (!info || !flag)
    MPI_Abort( MPI_COMM_WORLD, 1 );
  if (verbose) {
    printf( "seqend: prev = %d, next = %d\n",
            info->prevRank, info->nextRank );
  }
  MPI_Send( NULL, 0, MPI_INT, info->nextRank, 0, info->lcomm );

  /* Make everyone wait until all have completed their send */
  MPI_Barrier( info->lcomm );
}

Other details are omitted. In fact, all of the code can be found at
https://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-usingmpi/libraries/index.html
which is provided by the author of the book.
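
For reference, the snippets above rely on a few supporting declarations that
are omitted here; a sketch of what they presumably look like, inferred only
from the fields the functions use (the actual definitions in the book's
sources may differ), is:

#include <mpi.h>
#include <stdlib.h>

typedef struct {
  MPI_Comm lcomm;      /* private duplicate of the user's communicator */
  int      prevRank;   /* rank to wait for (MPI_PROC_NULL on rank 0) */
  int      nextRank;   /* rank to release (MPI_PROC_NULL on the last rank) */
} seqInfo;

static int verbose = 0; /* set non-zero to print the prev/next pairs */

/* delete callback: frees the cached seqInfo when the attribute is
   deleted or the communicator is freed */
static int seqDelFn( MPI_Comm comm, int keyval, void *attrVal, void *extraState )
{
  seqInfo *info = (seqInfo *)attrVal;
  MPI_Comm_free( &info->lcomm );
  free( info );
  return MPI_SUCCESS;
}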

The program uses send and recv to block the execution. A process can continue
to execute only once it has received the message from the previous process;
otherwise it is blocked, which results in the sequential execution. The part
I don't understand is in the function "seqBegin()". If my understanding of
attributes is right, only one process will enter the if branch and set the
value of the attribute, while the other processes just get the value. Here
comes the question: since the other processes don't set the value, how can
they get prevRank and nextRank values of their own?

The code executes as expected, but I still can't see the rationale behind
this, and there is little reference material about attribute caching, so I
come here for help. Thank you very much!

Best Wishes!
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] CUDA GPU Direct RDMA support

2018-12-15 Thread Weicheng Xue
Akshay,

 Thank you very much!

Best regards,

Weicheng Xue

On Fri, Dec 14, 2018 at 8:06 PM Akshay Venkatesh 
wrote:

> These two links will likely be able to help you:
>
> http://www.mellanox.com/page/products_dyn?product_family=295&mtag=gpudirect
> https://github.com/Mellanox/nv_peer_memory
>
> On Fri, Dec 14, 2018 at 4:50 PM Weicheng Xue  wrote:
>
>> Hi all,
>>
>>  I am having a GPU Direct issue. I loaded gcc/5.2.0,
>> openmpi-gdr/2.0.0, and cuda/8.0.61, and I am running on two Nvidia P100 GPUs.
>> The program I am testing is a very simple test code in which one GPU sends a
>> buffer directly to another GPU. After running with "mpirun -mca
>> btl_openib_want_cuda_gdr 1 -np 2 ./executable", the terminal returned "You
>> requested to run with CUDA GPU Direct RDMA support but this OFED
>> installation does not have that support.  Contact Mellanox to figure out
>> how to get an OFED stack with that support." I am wondering how to fix this
>> issue. Thank you very much!
>>
>> Best Regards,
>>
>> Weicheng Xue
>
>
>
> --
> -Akshay
> NVIDIA
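
For reference, here is a minimal sketch of the kind of device-to-device test
described above (not the poster's actual code), assuming a CUDA-aware Open MPI
build in which device pointers can be handed straight to MPI_Send/MPI_Recv:

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main( int argc, char **argv )
{
    int    rank, n = 1 << 20;
    float *dbuf;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    cudaSetDevice( rank );                           /* one GPU per rank */
    cudaMalloc( (void **)&dbuf, n * sizeof(float) );
    cudaMemset( dbuf, 0, n * sizeof(float) );

    if (rank == 0)        /* pass the device pointer directly to MPI */
        MPI_Send( dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD );
    else if (rank == 1)
        MPI_Recv( dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );

    printf( "rank %d done\n", rank );
    cudaFree( dbuf );
    MPI_Finalize();
    return 0;
}

Whether the transfer actually goes over GPUDirect RDMA rather than being
staged through host memory depends on the OFED stack and the nv_peer_memory
module from the links above being installed.
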
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users