Re: [OMPI users] gpudirect p2p (again)?

2012-07-09 Thread Rolf vandeVaart
Yes, this feature is in Open MPI 1.7.  It is implemented in the "smcuda" btl.
If you configure as outlined in the FAQ, things should just work: the smcuda
btl will be selected, and P2P will be used between GPUs on the same node.
Note that P2P is only used for transfers of buffers larger than 4K.
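
To make this concrete, here is a minimal sketch (an illustrative test program,
not an example from the FAQ): with a CUDA-aware build, each rank passes device
pointers directly to MPI calls, and a transfer larger than 4K between two
ranks on the same node should take the smcuda P2P path.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaSetDevice(rank);           /* simplest mapping: one GPU per rank */

    const int n = 1 << 20;         /* 1 MiB, well above the 4K cutoff */
    void *buf = NULL;
    cudaMalloc(&buf, n);           /* device memory, handed straight to MPI */

    if (rank == 0) {
        MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}

Running something like "mpirun -np 2 ./p2p_test" on a node with two GPUs
should then exercise the P2P path.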

Rolf

>-----Original Message-----
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Crni Gorac
>Sent: Monday, July 09, 2012 1:25 PM
>To: us...@open-mpi.org
>Subject: [OMPI users] gpudirect p2p (again)?
>
>Trying to examine CUDA support in Open MPI, using the current feature
>series (v1.7).  There was a question on this mailing list back in October 2011
>(http://www.open-mpi.org/community/lists/users/2011/10/17539.php)
>about Open MPI being able to use P2P transfers when the two MPI
>processes involved in a transfer happen to execute on the same
>machine, and the answer was that this feature was being implemented.  So my
>question is: what is the current status here; is this feature supported now?
>
>Thanks.



[OMPI users] gpudirect p2p (again)?

2012-07-09 Thread Crni Gorac
Trying to examine CUDA support in Open MPI, using the current
feature series (v1.7).  There was a question on this mailing list back
in October 2011
(http://www.open-mpi.org/community/lists/users/2011/10/17539.php)
about Open MPI being able to use P2P transfers when the two MPI
processes involved in a transfer happen to execute on the
same machine, and the answer was that this feature was being
implemented.  So my question is: what is the current status here; is
this feature supported now?

Thanks.


Re: [OMPI users] gpudirect p2p?

2011-10-16 Thread Chris Cooper
Ah I see.  Thanks for the info!

On Sat, Oct 15, 2011 at 12:06 AM, Rolf vandeVaart
<rvandeva...@nvidia.com> wrote:
>>-----Original Message-----
>>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>>On Behalf Of Chris Cooper
>>Sent: Friday, October 14, 2011 1:28 AM
>>To: us...@open-mpi.org
>>Subject: [OMPI users] gpudirect p2p?
>>
>>Hi,
>>
>>Are the recent peer-to-peer capabilities of CUDA leveraged by Open MPI when,
>>e.g., you're running a rank per GPU on the same workstation?
>
> Currently, no.  I am actively working on adding that capability.
>
>>
>>It seems in my testing that I only get on the order of about 1 GB/s, as per
>>http://www.open-mpi.org/community/lists/users/2011/03/15823.php,
>>whereas NVIDIA's simpleP2P test indicates ~6 GB/s.
>>
>>Also, I ran into a problem just trying to test.  It seems you have to do
>>cudaSetDevice/cuCtxCreate with the appropriate GPU id, which I wanted to
>>derive from the rank.  However, you don't know the rank until after
>>MPI_Init(), and you need to initialize CUDA before that.  Is there a
>>standard way to do it?  I have a workaround at the moment.
>>
>
> The recommended way is to put the GPUs in exclusive mode first:
>
> # nvidia-smi -c 1
>
> Then have this kind of snippet at the beginning of the program.  (This uses
> the driver API; it should probably use the runtime API instead.)
>
> /* Driver-API device selection: claim the first free GPU.  In exclusive
>  * mode, context creation fails on a device another process already owns. */
> CUresult res;
> CUdevice cuDev;
> CUcontext cuContext;
> int cuDevCount, device;
>
> res = cuInit(0);
> if (CUDA_SUCCESS != res) {
>     exit(1);
> }
>
> if (CUDA_SUCCESS != cuDeviceGetCount(&cuDevCount)) {
>     exit(2);
> }
> for (device = 0; device < cuDevCount; device++) {
>     if (CUDA_SUCCESS != (res = cuDeviceGet(&cuDev, device))) {
>         exit(3);
>     }
>     if (CUDA_SUCCESS != cuCtxCreate(&cuContext, 0, cuDev)) {
>         /* Another process must have grabbed it.  Go to the next one. */
>     } else {
>         break;
>     }
> }
>
>
>
>>Thanks,
>>Chris



Re: [OMPI users] gpudirect p2p?

2011-10-14 Thread Rolf vandeVaart
>-----Original Message-----
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Chris Cooper
>Sent: Friday, October 14, 2011 1:28 AM
>To: us...@open-mpi.org
>Subject: [OMPI users] gpudirect p2p?
>
>Hi,
>
>Are the recent peer-to-peer capabilities of CUDA leveraged by Open MPI when,
>e.g., you're running a rank per GPU on the same workstation?

Currently, no.  I am actively working on adding that capability. 

>
>It seems in my testing that I only get on the order of about 1 GB/s, as per
>http://www.open-mpi.org/community/lists/users/2011/03/15823.php,
>whereas NVIDIA's simpleP2P test indicates ~6 GB/s.
>
>Also, I ran into a problem just trying to test.  It seems you have to do
>cudaSetDevice/cuCtxCreate with the appropriate GPU id, which I wanted to
>derive from the rank.  However, you don't know the rank until after
>MPI_Init(), and you need to initialize CUDA before that.  Is there a
>standard way to do it?  I have a workaround at the moment.
>

The recommended way is to put the GPUs in exclusive mode first:

# nvidia-smi -c 1

Then have this kind of snippet at the beginning of the program.  (This uses
the driver API; it should probably use the runtime API instead.)

/* Driver-API device selection: claim the first free GPU.  In exclusive
 * mode, context creation fails on a device another process already owns. */
CUresult res;
CUdevice cuDev;
CUcontext cuContext;
int cuDevCount, device;

res = cuInit(0);
if (CUDA_SUCCESS != res) {
    exit(1);
}

if (CUDA_SUCCESS != cuDeviceGetCount(&cuDevCount)) {
    exit(2);
}
for (device = 0; device < cuDevCount; device++) {
    if (CUDA_SUCCESS != (res = cuDeviceGet(&cuDev, device))) {
        exit(3);
    }
    if (CUDA_SUCCESS != cuCtxCreate(&cuContext, 0, cuDev)) {
        /* Another process must have grabbed it.  Go to the next one. */
    } else {
        break;
    }
}
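
For reference, the same selection loop might look like this with the runtime
API (a sketch, assuming CUDA 4.x behavior where the first context-creating
call, e.g. cudaFree(0), fails on a busy exclusive-mode device;
pick_free_device is a hypothetical helper name):

#include <cuda_runtime.h>
#include <stdlib.h>

/* Return the ordinal of the first GPU this process can claim. */
int pick_free_device(void)
{
    int count, dev;
    if (cudaSuccess != cudaGetDeviceCount(&count)) {
        exit(2);
    }
    for (dev = 0; dev < count; dev++) {
        if (cudaSuccess != cudaSetDevice(dev)) {
            continue;
        }
        if (cudaSuccess == cudaFree(0)) {   /* forces context creation */
            return dev;                     /* this GPU is now ours */
        }
        cudaGetLastError();                 /* clear the error, try the next */
    }
    exit(3);                                /* no free device found */
}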



>Thanks,
>Chris



[OMPI users] gpudirect p2p?

2011-10-14 Thread Chris Cooper
Hi,

Are the recent peer-to-peer capabilities of CUDA leveraged by Open MPI
when, e.g., you're running a rank per GPU on the same workstation?

It seems in my testing that I only get on the order of about 1 GB/s, as
per http://www.open-mpi.org/community/lists/users/2011/03/15823.php,
whereas NVIDIA's simpleP2P test indicates ~6 GB/s.

Also, I ran into a problem just trying to test.  It seems you have to
do cudaSetDevice/cuCtxCreate with the appropriate GPU id, which I
wanted to derive from the rank.  However, you don't know the rank
until after MPI_Init(), and you need to initialize CUDA before that.
Is there a standard way to do it?  I have a workaround at the moment.

Thanks,
Chris
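
One workaround along these lines (a sketch, not from the thread: it assumes
Open MPI's launcher, which exports the node-local rank to the process
environment as OMPI_COMM_WORLD_LOCAL_RANK before main() runs) is to select
the GPU from that variable ahead of MPI_Init():

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* The local rank is available in the environment before MPI_Init(),
     * so the GPU can be chosen first.  The variable name assumes Open
     * MPI's launcher; other MPI implementations use different names. */
    const char *lr = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    int local_rank = lr ? atoi(lr) : 0;

    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(ndev > 0 ? local_rank % ndev : 0);

    MPI_Init(&argc, &argv);
    /* ... rest of the program ... */
    MPI_Finalize();
    return 0;
}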