Run ldd on the executable from both linkings of your code (with and without PETSc) and compare the output.
My guess is that without PETSc it is linking the static version of the needed libraries, and with PETSc the shared. And, in typical fashion, the shared libraries are off on some super slow file system, so they take a long time to be loaded and linked in on demand.

Still a performance bug in Summit.

   Barry

> On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev <[email protected]> wrote:
>
> Hi all,
>
> Previously I have noticed that the first call to a CUDA function such as
> cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on Summit.
> Then I prepared a simple example as attached to help OLCF reproduce the
> problem. It turned out that the problem was caused by PETSc. The 7.5-second
> overhead can be observed only when the PETSc lib is linked. If I do not link
> PETSc, it runs normally. Does anyone have any idea why this happens and how
> to fix it?
>
> Hong (Mr.)
>
> bash-4.2$ cat ex_simple.c
> #include <time.h>
> #include <cuda_runtime.h>
> #include <stdio.h>
>
> int main(int argc,char **args)
> {
>   clock_t start,s1,s2,s3;
>   double  cputime;
>   double  *init,tmp[100] = {0};
>
>   start = clock();
>   cudaFree(0);
>   s1 = clock();
>   cudaMalloc((void **)&init,100*sizeof(double));
>   s2 = clock();
>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>   s3 = clock();
>   printf("free time =%lf malloc time =%lf copy time =%lf\n",
>          ((double) (s1 - start)) / CLOCKS_PER_SEC,
>          ((double) (s2 - s1)) / CLOCKS_PER_SEC,
>          ((double) (s3 - s2)) / CLOCKS_PER_SEC);
>
>   return 0;
> }
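If slow on-demand loading from the file system is the cause, it may help to time the first CUDA call with a wall clock as well. clock() reports processor time, while a monotonic clock also counts time spent waiting on I/O. Below is a minimal sketch, assuming a POSIX system; wall_seconds and placeholder_call are hypothetical names used only for illustration and are not part of PETSc or CUDA.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: run fn(arg) once and return the elapsed
   wall-clock time in seconds, measured with a monotonic clock. */
double wall_seconds(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Placeholder workload (an assumption for this sketch). In ex_simple.c
   the body would instead be the first CUDA call, e.g. cudaFree(0). */
void placeholder_call(void *arg)
{
    (void)arg; /* unused */
}
```

Calling wall_seconds(placeholder_call, NULL) in both builds, with the placeholder body replaced by cudaFree(0), and comparing the result against the clock() figure would show whether the 7.5 seconds is mostly waiting rather than computing.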
