Re: [OMPI users] CUDA mpi question

2019-11-28 Thread Justin Luitjens via users
I was pointed to "2.7. Synchronization and Memory Ordering" of https://docs.nvidia.com/pdf/GPUDirect_RDMA.pdf …

Re: [OMPI users] CUDA mpi question

2019-11-28 Thread George Bosilca via users
for (int i = 0; i < num_threads; i++) {
    if (pthread_join(threads[i], NULL)) {
        fprintf(stderr, "Error joining thread\n");
        return 2;
…
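The preview cuts off the surrounding code; below is a minimal self-contained sketch of the kind of multithreaded MPI test this fragment plausibly comes from (thread_fn, NUM_THREADS, and the MPI_THREAD_MULTIPLE setup are assumptions, not taken from the message):

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static void *thread_fn(void *arg) {
    /* each thread could issue its own MPI calls here,
       which requires MPI_THREAD_MULTIPLE support */
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, thread_fn, NULL)) {
            fprintf(stderr, "Error creating thread\n");
            return 1;
        }
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_join(threads[i], NULL)) {   /* as in the quoted fragment */
            fprintf(stderr, "Error joining thread\n");
            return 2;
        }
    }

    MPI_Finalize();
    return 0;
}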

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread Zhang, Junchao via users
I was pointed to "2.7. Synchronization and Memory Ordering" of https://docs.nvidia.com/pdf/GPUDirect_RDMA.pdf …

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread Zhang, Junchao via users
I was pointed to "2.7. Synchronization and Memory Ordering" of https://docs.nvidia.com/pdf/GPUDirect_RDMA.pdf. It is on topic, but unfortunately it is too short and I could not understand it. I also checked cudaStreamAddCallback/cudaLaunchHostFunc, whose documentation says the host function "must not make any CUDA API calls."
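For context, here is a minimal sketch (not from the thread) of what cudaLaunchHostFunc offers in this situation: the enqueued host function must not make CUDA API calls, and a CUDA-aware MPI_Isend may make such calls internally, so the callback below only sets a flag that the host polls before posting the send; mark_ready, isend_after_stream, and the parameters are illustrative names:

#include <atomic>
#include <cuda_runtime.h>
#include <mpi.h>

/* runs on a CUDA-internal thread once prior work in the stream is done;
   no CUDA (and, to be safe, no MPI) calls are made in here */
static void CUDART_CB mark_ready(void *arg) {
    static_cast<std::atomic<int> *>(arg)->store(1);
}

void isend_after_stream(cudaStream_t stream, float *sbuf, int n,
                        int dest, int tag, MPI_Request *req) {
    std::atomic<int> ready{0};
    cudaLaunchHostFunc(stream, mark_ready, &ready);  /* ordered after the kernel */
    while (!ready.load()) { /* spin, or overlap useful CPU work */ }
    MPI_Isend(sbuf, n, MPI_FLOAT, dest, tag, MPI_COMM_WORLD, req);
}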

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread George Bosilca via users
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote:
> On Wed, Nov 27, 2019 at 3:16 PM George Bosilca wrote:
>> Short and portable answer: you need to sync before the Isend or you will send garbage data.
> Ideally, I want to formulate my code into a series of asynchronous "kernel launch, kernel launch, ..." without …
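One way to keep the chain mostly asynchronous, sketched here under the assumption that Kernel2 does not touch sbuf (nothing in the preview confirms the names or sizes): record an event after the producing kernel, let the independent kernel run on a second stream, and block the host only on the event before the Isend:

#include <cuda_runtime.h>
#include <mpi.h>

__global__ void Kernel1(float *sbuf) { sbuf[threadIdx.x] = 1.0f; }  /* placeholder producer */
__global__ void Kernel2(void) { }                                   /* placeholder, independent of sbuf */

void overlap_send(float *sbuf, int n, int dest, int tag,
                  cudaStream_t s1, cudaStream_t s2, MPI_Request *req) {
    cudaEvent_t ev;
    cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);

    Kernel1<<<1, n, 0, s1>>>(sbuf);
    cudaEventRecord(ev, s1);        /* marks completion of Kernel1 only */

    Kernel2<<<1, n, 0, s2>>>();     /* independent work overlaps the wait */

    cudaEventSynchronize(ev);       /* host blocks for Kernel1, not Kernel2 */
    MPI_Isend(sbuf, n, MPI_FLOAT, dest, tag, MPI_COMM_WORLD, req);

    cudaEventDestroy(ev);
}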

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread Zhang, Junchao via users
On Wed, Nov 27, 2019 at 3:16 PM George Bosilca wrote:
> Short and portable answer: you need to sync before the Isend or you will send garbage data.
Ideally, I want to formulate my code into a series of asynchronous "kernel launch, kernel launch, ..." without …

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread George Bosilca via users
Short and portable answer: you need to sync before the Isend or you will send garbage data. Assuming you are willing to go for a less portable solution, you can get the OMPI streams and add your kernels inside, so that the sequential order will guarantee the correctness of your Isend. We have 2 hidden …
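Spelled out in code, the portable version of this answer is just the following sketch (kernel bodies, counts, and names are placeholders; the point is the cudaStreamSynchronize between the producing kernel and the Isend):

#include <cuda_runtime.h>
#include <mpi.h>

__global__ void Kernel1(float *sbuf) { sbuf[threadIdx.x] = 1.0f; }  /* placeholder producer */
__global__ void Kernel2(void) { }                                   /* placeholder, leaves sbuf alone */

void send_portably(float *sbuf, int n, int dest, int tag,
                   cudaStream_t stream, MPI_Request *req) {
    Kernel1<<<1, n, 0, stream>>>(sbuf);
    cudaStreamSynchronize(stream);   /* sbuf is fully written after this returns */
    MPI_Isend(sbuf, n, MPI_FLOAT, dest, tag, MPI_COMM_WORLD, req);
    Kernel2<<<1, n, 0, stream>>>();  /* fine while the send is in flight,
                                        as long as it does not modify sbuf */
}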

[OMPI users] CUDA mpi question

2019-11-27 Thread Zhang, Junchao via users
Hi, Suppose I have this piece of code and I use CUDA-aware MPI:

cudaMalloc(&sbuf, sz);
Kernel1<<<..., stream>>>(..., sbuf);
MPI_Isend(sbuf, ...);
Kernel2<<<..., stream>>>();

Do I need to call cudaStreamSynchronize(stream) before MPI_Isend() to make sure the data in sbuf is ready to send?
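Filled out into a self-contained two-rank skeleton (counts, tags, and the trivial kernel bodies are placeholders, not from the message), the snippet in question reads:

#include <cuda_runtime.h>
#include <mpi.h>

__global__ void Kernel1(float *sbuf) { sbuf[threadIdx.x] = 1.0f; }  /* placeholder producer */
__global__ void Kernel2(void) { }                                   /* placeholder, leaves sbuf alone */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 256;
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    if (rank == 0) {
        float *sbuf;
        cudaMalloc(&sbuf, n * sizeof(float));
        MPI_Request req;
        Kernel1<<<1, n, 0, stream>>>(sbuf);
        /* the question: is cudaStreamSynchronize(stream) required here
           before handing sbuf to a CUDA-aware MPI_Isend? */
        MPI_Isend(sbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &req);
        Kernel2<<<1, n, 0, stream>>>();
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        cudaFree(sbuf);
    } else if (rank == 1) {
        float *rbuf;
        cudaMalloc(&rbuf, n * sizeof(float));  /* device buffer: needs CUDA-aware MPI */
        MPI_Recv(rbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaFree(rbuf);
    }

    cudaStreamDestroy(stream);
    MPI_Finalize();
    return 0;
}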