Hi Junchao,
   If you are talking about this commit of yours
https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb
"Recycle keyvals and fix bugs in MPI_Comm creation", I think I got it. It
fixes the serial one but the parallel one is still crashing.

Thanks,
Sam

On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <[email protected]> wrote:

> Hi Junchao,
> I am not ready to upgrade petsc yet (due to the lengthy technical and
> legal approval process of our internal policy). Can you send me the diff
> file so I can apply it to petsc 3.11.3?
>
> Thanks,
> Sam
>
> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <[email protected]>
> wrote:
>
>> Sam,
>> Please discard the original patch I sent you. A better fix is already in
>> maint/master. A test is at src/sys/tests/ex53.c
>> I modified that test at the end with
>>
>> for (i=0; i<500; i++) {
>>   ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>   ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>   ierr = SlepcFinalize();if (ierr) return ierr;
>>   ierr = PetscFinalize();if (ierr) return ierr;
>> }
>>
>>
>> then I ran it with multiple mpi ranks and it ran correctly. So try your
>> program with petsc master first. If that does not work, see if you can
>> come up with a test example for us.
>>
>> Thanks.
>> --Junchao Zhang
>>
>>
>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <[email protected]> wrote:
>>
>>> One workaround for me is to call PetscInitialize once for my entire
>>> program and skip PetscFinalize (since I don't have a good place to call
>>> PetscFinalize before ending the program).
>>>
>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <[email protected]> wrote:
>>>
>>>> I get the crash after calling Initialize/Finalize multiple times.
>>>> Junchao fixed the bug for serial but parallel still crashes.
>>>>
>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>>
>>>>> Ah, so you get the crash the second time you call
>>>>> PetscInitialize()? That is a problem because we do intend to support that
>>>>> capability (but you must call PetscFinalize() each time also).
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <[email protected]> wrote:
>>>>>
>>>>> Hi Barry,
>>>>> Thanks for the quick response.
>>>>> I will call PetscInitialize once and skip the PetscFinalize for now
>>>>> to avoid the crash. The crash is actually in PetscInitialize, not
>>>>> PetscFinalize.
>>>>>
>>>>> Thanks,
>>>>> Sam
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> Sam,
>>>>>>
>>>>>> You can skip PetscFinalize() so long as you only call
>>>>>> PetscInitialize() once. It is not desirable in general to skip the
>>>>>> finalize because PETSc can't free all its data structures and you
>>>>>> cannot see the PETSc logging information with -log_view, but in terms
>>>>>> of the code running correctly you do not need to call PetscFinalize.
>>>>>>
>>>>>> If your code crashes in PetscFinalize() please send the full error
>>>>>> output and we can try to help you debug it.
>>>>>>
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <[email protected]> wrote:
>>>>>>
>>>>>> To clarify, we have an mpi wrapper (so we can switch to a different mpi
>>>>>> at runtime). I compile petsc using our mpi wrapper.
>>>>>> If I just call PETSc initialize once without calling finalize, it is
>>>>>> ok. My question to you is: can I skip finalize?
>>>>>> Our program calls mpi_finalize at the end anyway.
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junchao,
>>>>>>> Attached please find the configure.log.
>>>>>>> I also attach the pinit.c which contains your patch (I am
>>>>>>> currently using 3.11.3. I've applied your patch to 3.11.3). Your patch
>>>>>>> fixes the serial version. The error now is about the parallel.
>>>>>>> Here is the error log:
>>>>>>>
>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>> with errorcode 56.
>>>>>>>
>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>> You may or may not see output from other processes, depending on
>>>>>>> exactly when Open MPI kills them.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>>
>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>> The MPI_Comm_create_keyval() error was fixed in maint/master.
>>>>>>>> From the error message, it seems you need to configure --with-log=1
>>>>>>>> Otherwise, please send your full error stack trace and
>>>>>>>> configure.log.
>>>>>>>> Thanks.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>> I now encountered the same error with parallel. I am wondering
>>>>>>>>> if there is a need for parallel fix as well.
>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>
>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Junchao,
>>>>>>>>>> Your patch works.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Junchao,
>>>>>>>>>>>>
>>>>>>>>>>>> This is a good bug fix. It solves the problem when PETSc
>>>>>>>>>>>> initialize is called many times.
>>>>>>>>>>>>
>>>>>>>>>>>> There is another fix you can do to limit PETSc mpiuni
>>>>>>>>>>>> running out of attributes inside a single PETSc run:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>> {
>>>>>>>>>>>>
>>>>>>>>>>>>   if (num_attr >= MAX_ATTR){
>>>>>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>>
>>>>>>>>>>> attr_keyval[i].extra_state is provided by user (could be NULL).
>>>>>>>>>>> We can not rely on it.
>>>>>>>>>>>
>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>         attr_keyval[i].del = delete_fn;
>>>>>>>>>>>>         *keyval = i;
>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>       }
>>>>>>>>>>>>     }
>>>>>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>   }
>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>   attr_keyval[num_attr].del = delete_fn;
>>>>>>>>>>>>   *keyval = num_attr++;
>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> This will work if the user creates tons of attributes but is
>>>>>>>>>>>> constantly deleting some as they create new ones, so long as the
>>>>>>>>>>>> number outstanding at one time is < MAX_ATTR.
>>>>>>>>>>>>
>>>>>>>>>>>> Barry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I don't understand what your session means.
>>>>>>>>>>>> Let's try this patch
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for
>>>>>>>>>>>>> entire session”
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Assuming finalizer is only needed once for entire session(?),
>>>>>>>>>>>>>> I can put initializer into the static block to call it once but
>>>>>>>>>>>>>> where do I call finalizer?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first try to
>>>>>>>>>>>>>>> call PETSc Initialize/Finalize only once to see if it fixes the
>>>>>>>>>>>>>>> error.
>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> //calling slepc
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SlepcFinalize();
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PetscFinalize();
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>> When I called SLEPc multiple times, I eventually got the
>>>>>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I debugged: it is because of the following in
>>>>>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>>>>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void
>>>>>>>>>>>>>>>>> *extra_state)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the
>>>>>>>>>>>>>>>>> same logic.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is this a bug or did I not use it correctly?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>>>>>
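
For anyone reading this thread later, here is a minimal stand-alone sketch of the keyval recycling idea discussed above. It is not the code from the PETSc commit and not the real mpiuni implementation; it is only an illustration under stated assumptions. The names KeyvalSlot, keyval_create, keyval_free, keyval_reset and the error codes are hypothetical, and MAX_ATTR is set to a tiny value just to make the behaviour easy to see. It uses a separate "active" flag instead of testing extra_state, since (as Junchao pointed out) extra_state is supplied by the user and may be NULL. Starting the counter at 1 simply mirrors the reset value in the patch quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTR    4   /* deliberately tiny, for illustration only */
#define KEYVAL_OK   0
#define KEYVAL_FULL 1   /* hypothetical error code: no free slot */
#define KEYVAL_BAD  2   /* hypothetical error code: invalid keyval */

typedef int (*DeleteFn)(int keyval, void *attr, void *extra_state);

typedef struct {
  void     *extra_state; /* user data; NULL is a legal value */
  DeleteFn  del;
  int       active;      /* 1 while the keyval is in use */
} KeyvalSlot;

static KeyvalSlot attr_keyval[MAX_ATTR];
static int        num_attr = 1; /* high-water mark; slot 0 is left unused */

/* Hand out a keyval, reusing a freed slot before growing the table. */
int keyval_create(DeleteFn del, int *keyval, void *extra_state)
{
  int i, slot = -1;

  for (i = 1; i < num_attr; i++) {
    if (!attr_keyval[i].active) { slot = i; break; }
  }
  if (slot < 0) {
    if (num_attr >= MAX_ATTR) return KEYVAL_FULL;
    slot = num_attr++;
  }
  attr_keyval[slot].extra_state = extra_state;
  attr_keyval[slot].del         = del;
  attr_keyval[slot].active      = 1;
  *keyval = slot;
  return KEYVAL_OK;
}

/* Release a keyval so that its slot can be handed out again. */
int keyval_free(int *keyval)
{
  int k = *keyval;

  if (k < 1 || k >= num_attr || !attr_keyval[k].active) return KEYVAL_BAD;
  attr_keyval[k].extra_state = NULL;
  attr_keyval[k].del         = NULL;
  attr_keyval[k].active      = 0;
  *keyval = -1;
  return KEYVAL_OK;
}

/* Clear the whole table, as a finalize step would before a re-initialize. */
void keyval_reset(void)
{
  int i;

  for (i = 0; i < MAX_ATTR; i++) {
    attr_keyval[i].extra_state = NULL;
    attr_keyval[i].del         = NULL;
    attr_keyval[i].active      = 0;
  }
  num_attr = 1;
}
```

With this layout the table only fills up when MAX_ATTR - 1 keyvals are outstanding at the same time, and calling keyval_reset() at finalize lets a later initialize start from an empty table again.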
