Another question: if MPI_Allgatherv(const void *sendbuf, int sendcount,
MPI_Datatype sendtype, void *recvbuf, const int recvcounts[], const int
displs[], MPI_Datatype recvtype, MPI_Comm comm) is CUDA-aware, should the
recvcounts and displs arrays be in CPU (host) memory or GPU (device) memory?

Also, is the list of supported APIs up to date? I could not find
MPI_Neighbor_xxx or MPI_Reduce_local in it.

