Run ldd on the executable from both linkings of your code and compare the output.

  My guess is that without PETSc it links the static versions of the needed 
libraries, and with PETSc the shared ones. And, in typical fashion, the shared 
libraries sit on some super slow file system, so they take a long time to be 
loaded and linked in on demand.
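
One way to probe this guess is to record wall-clock time next to CPU time around the first CUDA call. The following is only a sketch under stated assumptions: the helper names (wall_seconds, cpu_seconds, timed_call, busy_loop) are hypothetical and not part of PETSc or CUDA, and the busy loop merely stands in for the call being measured. clock(), as used in ex_simple.c, reports processor time, so time spent blocked on file I/O may not be fully visible in it, while a monotonic wall clock includes it.

```c
/* Sketch: report wall-clock and CPU time side by side around one call.
   All helper names here are hypothetical, chosen for illustration. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (includes time spent blocked). */
double wall_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* CPU seconds consumed by the process, the quantity clock() reports. */
double cpu_seconds(void)
{
  return (double)clock() / CLOCKS_PER_SEC;
}

/* Run fn(arg) once, print both elapsed times, and return the wall time. */
double timed_call(const char *label, void (*fn)(void *), void *arg)
{
  double w0 = wall_seconds(), c0 = cpu_seconds();
  fn(arg);
  double w1 = wall_seconds(), c1 = cpu_seconds();
  printf("%s: wall %.6f s, cpu %.6f s\n", label, w1 - w0, c1 - c0);
  return w1 - w0;
}

/* Placeholder workload (assumption): a short busy loop standing in for
   the first CUDA call, e.g. cudaFree(0). */
void busy_loop(void *arg)
{
  volatile long i, n = *(long *)arg;
  for (i = 0; i < n; i++) { }
}
```

In ex_simple.c, the cudaFree(0) call could be wrapped in a callback of this shape. If wall time were much larger than CPU time, that would point toward waiting (for example on a slow file system) rather than computation; if the two were close, the time would more likely be going into actual work.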

   Still a performance bug in Summit. 

   Barry


> On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev 
> <[email protected]> wrote:
> 
> Hi all,
> 
> Previously I noticed that the first call to a CUDA function such as 
> cudaMalloc or cudaFree in PETSc takes a long time (7.5 seconds) on Summit. 
> I then prepared a simple example (attached) to help OLCF reproduce the 
> problem. It turned out that the problem was caused by PETSc. The 7.5-second 
> overhead can be observed only when the PETSc lib is linked. If I do not link 
> PETSc, it runs normally. Does anyone have any idea why this happens and how 
> to fix it?
> 
> Hong (Mr.)
> 
> bash-4.2$ cat ex_simple.c
> #include <time.h>
> #include <cuda_runtime.h>
> #include <stdio.h>
> 
> int main(int argc,char **args)
> {
>  clock_t start,s1,s2,s3;
>  double  cputime;
>  double   *init,tmp[100] = {0};
> 
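>  /* clock() reports CPU time used by the process, not wall-clock time */
>  start = clock();
>  cudaFree(0); /* first CUDA call; forces CUDA initialization */
>  s1 = clock();
>  cudaMalloc((void **)&init,100*sizeof(double));
>  s2 = clock();
>  cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>  s3 = clock();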
>  printf("free time =%lf malloc time =%lf copy time =%lf\n",
>         ((double) (s1 - start)) / CLOCKS_PER_SEC,
>         ((double) (s2 - s1)) / CLOCKS_PER_SEC,
>         ((double) (s3 - s2)) / CLOCKS_PER_SEC);
> 
>  return 0;
> }
> 
> 
