> On Jul 6, 2021, at 8:30 AM, Vijay S Kumar <[email protected]> wrote:
>
> Hello all,
>
> By way of background, we have a PETSc-based solver that we run on our
> in-house Cray system. We are carrying out performance analysis using
> profilers in the CrayPat suite that provide more fine-grained
> performance-related information than the PETSc -log_view summary.
>
> When instrumented using CrayPat perftools, it turns out that the MPI
> initialization (MPI_Init) internally invoked by PetscInitialize is not
> picked up by the profiler. That is, simply specifying the following:
>
>    ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr;
>
> results in the following runtime error:
>
>    CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
>    Attempting to use an MPI routine before initializing MPICH
This is certainly unexpected behavior: PETSc is "just" an MPI application; it
does not do anything special for CrayPat. We do not expect that one would need
to call MPI_Init() outside of PETSc in order to use a performance tool. Perhaps
PETSc is not being configured/compiled with the correct flags for the CrayPat
performance tools, or its shared library is not being built appropriately. If
CrayPat uses the PMPI_xxx wrapper model for MPI profiling, these kinds of
difficulties can arise when the correct profiling wrapper functions are not
inserted during the build process. I would try running a standard PETSc program
in a debugger with breakpoints on MPI_Init() (and possibly others) to
investigate exactly what is happening and perhaps why. You can send the
configure.log and make.log that were generated to [email protected].

  Barry

>
> To circumvent this, we had to explicitly call MPI_Init prior to
> PetscInitialize:
>
>    MPI_Init(&argc,&argv);
>    ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr;
>
> However, the side effect of this workaround seems to be several downstream
> runtime (assertion) errors from VecAssemblyBegin/End and MatAssemblyBegin/End
> statements:
>
>    CrayPat/X:  Version 7.1.1 Revision 7c0ddd79b  08/19/19 16:58:46
>    main.x: ../rtsum.c:5662: __pat_trsup_trace_waitsome_rtsum: Assertion
>    `recv_count != MPI_UNDEFINED' failed.
>
>    [email protected]:769
>    VecAssemblyEnd@0x2aaab951b3ba
>    VecAssemblyEnd_MPI_BTS@0x2aaab950b179
>    MPI_Waitsome@0x43a238
>    __pat_trsup_trace_waitsome_rtsum@0x5f1a17
>    __GI___assert_fail@0x2aaabc61e7d1
>    __assert_fail_base@0x2aaabc61e759
>    __GI_abort@0x2aaabc627740
>    __GI_raise@0x2aaabc626160
>
> Interestingly, we do not see such errors when there is no explicit MPI_Init
> and no instrumentation for performance.
>
> We are looking for someone to help shed more light on why PETSc Mat/Vec
> AssemblyEnd statements lead to such MPI-level assertion errors in cases where
> MPI_Init is explicitly called.
> (Or, alternatively, is there a way to call PetscInitialize in a manner that
> ensures that the MPI initialization is picked up by the profilers in
> question?)
>
> We would highly appreciate any help/pointers,
>
> Thanks!
> Vijay
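
For reference, below is a minimal, illustrative sketch (not the poster's actual
code) of the explicit-MPI_Init pattern discussed above. When MPI_Init is called
before PetscInitialize, PETSc detects that MPI is already running and neither
initializes nor finalizes MPI itself, so the application is then responsible
for calling MPI_Finalize after PetscFinalize:

    /* Illustrative sketch only: start MPI explicitly so that a PMPI-based
       profiler sees the MPI_Init call made directly by the application. */
    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      MPI_Init(&argc, &argv);          /* MPI started by the application */

      ierr = PetscInitialize(&argc, &argv, (char *)0, NULL);
      if (ierr) return ierr;

      /* ... Vec/Mat assembly and solves would go here ... */

      ierr = PetscFinalize();
      if (ierr) return ierr;

      MPI_Finalize();                  /* required: PETSc did not start MPI,
                                          so it will not finalize it */
      return 0;
    }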
