I saw your update. In PetscCUDAInitialize we have:

    /* First get the device count */
    err = cudaGetDeviceCount(&devCount);
    /* next determine the rank and then set the device via a mod */
    ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
    device = rank % devCount;
    err = cudaSetDevice(device);

If we rely on the first CUDA call to do the initialization, how would CUDA know
about this MPI information (the rank-to-device mapping)?
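
To make the concern concrete, here is a minimal sketch of just the mapping step,
with MPI and CUDA stripped out so it can be compiled anywhere. The helper name
pick_device is hypothetical and not part of PETSc; it only mirrors the
"rank % devCount" line from the excerpt.

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc function): assign an MPI rank to a
   device index round-robin, as "device = rank % devCount" does.
   devCount must be positive, since a modulo by zero is undefined in C. */
static int pick_device(int rank, int devCount)
{
  assert(rank >= 0);
  assert(devCount > 0);
  return rank % devCount;
}
```

With 4 devices, ranks 0 through 7 map to devices 0,1,2,3,0,1,2,3. This
assignment depends on the MPI rank, which the CUDA runtime has no way to learn
on its own.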
--Junchao Zhang
On Wed, Sep 18, 2019 at 11:42 PM Smith, Barry F. wrote:
Fixed the docs. Thanks for pointing out the lack of clarity.
> On Sep 18, 2019, at 11:25 PM, Zhang, Junchao via petsc-dev wrote:
>
> Barry,
>
> I saw you added these in init.c:
>
> + -cuda_initialize - do the initialization in PetscInitialize()
>
> Notes:
>
> Initializing cuBLAS takes about 1/2 second, therefore it is done by default in
> PetscInitialize() before logging begins.
>
> But I did not understand the other case: with -cuda_initialize 0, when will
> CUDA be initialized?
> --Junchao Zhang