hao <jczh...@mcs.anl.gov>; Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] CUDA mpi question
I was pointed to "2.7. Synchronization and Memory Ordering" of
https://docs.nvidia.com/pdf/GPUDirect_RDMA.pdf
}
>> for (int i = 0; i < num_threads; i++) {
>>   if (pthread_join(threads[i], NULL)) {
>>     fprintf(stderr, "Error joining thread\n");
>>     return 2;
>>   }
>> }
si...@icl.utk.edu>
Cc: Zhang, Junchao <jczh...@mcs.anl.gov>; Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] CUDA mpi question
I was pointed to "2.7. Synchronization and Memory Ordering" of
https://docs.nvidia.com/pdf/GPUDirect_RDMA.pdf. It is on topic. But
unfortunately it is too short and I could not understand it.
I also checked cudaStreamAddCallback/cudaLaunchHostFunc, which say the host
function "must not make
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote:
> On Wed, Nov 27, 2019 at 3:16 PM George Bosilca
> wrote:
>
>> Short and portable answer: you need to sync before the Isend or you will
>> send garbage data.
>>
> Ideally, I want to formulate my code into a series of asynchronous "kernel
>
On Wed, Nov 27, 2019 at 3:16 PM George Bosilca <bosi...@icl.utk.edu> wrote:
> Short and portable answer: you need to sync before the Isend or you will send
> garbage data.
Ideally, I want to formulate my code into a series of asynchronous "kernel
launch, kernel launch, ..." without
Short and portable answer: you need to sync before the Isend or you will
send garbage data.
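
A minimal sketch of that portable pattern, for illustration only. It assumes a CUDA-aware MPI build, at least two ranks, and a hypothetical kernel `fill_kernel` standing in for Kernel1 and Kernel2 from the question; the buffer name `work` and the sizes are also made up, and error checking is omitted for brevity.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel standing in for Kernel1/Kernel2 in the question.
__global__ void fill_kernel(double *buf, int n, double value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    double *sbuf = NULL;  // device buffer passed directly to MPI (needs CUDA-aware MPI)
    double *work = NULL;  // separate device buffer used by the second kernel
    cudaMalloc((void **)&sbuf, n * sizeof(double));
    cudaMalloc((void **)&work, n * sizeof(double));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    if (size >= 2 && rank == 0) {
        // "Kernel1": produces the data to be sent.
        fill_kernel<<<blocks, threads, 0, stream>>>(sbuf, n, 1.0);

        // The kernel launch is asynchronous and MPI knows nothing about
        // this stream, so wait for the kernel before handing sbuf to MPI.
        cudaStreamSynchronize(stream);

        MPI_Request req;
        MPI_Isend(sbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);

        // "Kernel2": writes only to work, never to sbuf, so it may
        // overlap with the send that is still in flight.
        fill_kernel<<<blocks, threads, 0, stream>>>(work, n, 2.0);

        MPI_Wait(&req, MPI_STATUS_IGNORE);  // sbuf may be reused after this
        cudaStreamSynchronize(stream);
    } else if (size >= 2 && rank == 1) {
        // Receive straight into device memory.
        MPI_Recv(sbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaStreamDestroy(stream);
    cudaFree(work);
    cudaFree(sbuf);
    MPI_Finalize();
    return 0;
}
```

The cost of this approach is that the host blocks at cudaStreamSynchronize(), so the launch sequence is no longer fully asynchronous.
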
Assuming you are willing to go for a less portable solution you can get the
OMPI streams and add your kernels inside, so that the sequential order will
guarantee correctness of your isend. We have 2 hidden
Hi,
Suppose I have this piece of code and I use CUDA-aware MPI:
cudaMalloc(&sbuf,sz);
Kernel1<<<...,stream>>>(...,sbuf);
MPI_Isend(sbuf,...);
Kernel2<<<...,stream>>>();
Do I need to call cudaStreamSynchronize(stream) before MPI_Isend() to make
sure data in sbuf is ready to